Learn Python Concurrency and Parallelism
TLDR.
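Use multi-threading for I/O- or network-bound tasks. Because of the Python GIL, threads only pay off while a task is waiting, not while it is computing.
Use multi-processing for heavy CPU-bound tasks.
By default, use the standard library module 'concurrent.futures'; it has timeout and exception handling features.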
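A minimal sketch of the timeout and exception handling mentioned above, using a thread pool. The task `fake_io_task`, the helper `run_all`, and the failing input `3` are made up for illustration; a real task would do a network or disk call instead of sleeping.

```python
import concurrent.futures as cf
import time

def fake_io_task(n):
    """Hypothetical stand-in for an I/O-bound call such as a network request."""
    time.sleep(0.05)
    if n == 3:
        raise ValueError(f"task {n} failed")
    return n * 2

def run_all(inputs, timeout=5.0):
    """Run fake_io_task on every input; return (results, errors) keyed by input."""
    results, errors = {}, {}
    with cf.ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(fake_io_task, n): n for n in inputs}
        try:
            # as_completed yields each future as soon as it finishes and raises
            # TimeoutError if the batch is not done within `timeout` seconds.
            for fut in cf.as_completed(futures, timeout=timeout):
                n = futures[fut]
                try:
                    results[n] = fut.result()  # re-raises the task's exception, if any
                except Exception as exc:
                    errors[n] = exc
        except cf.TimeoutError as exc:
            # Record every task that had not finished when the deadline passed.
            for fut, n in futures.items():
                if not fut.done():
                    errors[n] = exc
    return results, errors

results, errors = run_all(range(5))
print("ok:", results)
print("failed:", errors)
```

Collecting failures in a separate dict keeps one bad input from aborting the whole batch. Note that leaving the `with` block still waits for tasks that are already running, so the timeout bounds how long results are collected, not how long a stuck task may run.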
Speaking from experience:
. When an error occurs, 95% of the time you need to fix your code and rerun things.
. When you have to run lots and lots of small CPU-bound tasks, try breaking them down into chunks and feeding each process a chunk, instead of creating a new process for each small task.
. When you crawl data, threads are recommended, and you need to create a basic error handling workflow.
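The chunking advice above could look roughly like this. The job `square_sum` and the helpers `make_chunks` and `run_in_chunks` are hypothetical names chosen for the sketch; the `executor_cls` parameter is also an assumption, added so the pool type can be swapped.

```python
import concurrent.futures as cf

def square_sum(chunk):
    """Hypothetical small CPU-bound job, applied to a whole chunk at once."""
    return sum(x * x for x in chunk)

def make_chunks(items, chunk_size):
    """Split a list into consecutive slices of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def run_in_chunks(items, chunk_size, executor_cls=cf.ProcessPoolExecutor):
    """Send one chunk per task to the pool instead of one item per task."""
    chunks = make_chunks(list(items), chunk_size)
    with executor_cls() as pool:
        # Each worker receives a whole chunk, so the cost of handing work to
        # another process is paid once per chunk rather than once per item.
        return sum(pool.map(square_sum, chunks))
```

With the default process pool, call `run_in_chunks(range(1_000_000), 10_000)` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. For simple cases, the `chunksize` argument of `ProcessPoolExecutor.map` batches items in a similar way without manual slicing.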
Python global interpreter lock:
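In a standard CPython process, multi-threading cannot achieve parallelism for Python code: the GIL lets only one thread execute Python bytecode at a time. Threads can still overlap time spent waiting on I/O, because the GIL is released during blocking I/O calls.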
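One way to see the effect is to time the same CPU-bound work sequentially and in threads. This is only a sketch: `count_down` and the job sizes are arbitrary choices, and the measured times depend on the machine and interpreter build. On a standard build with the GIL, the threaded run is usually not meaningfully faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    """Pure-Python CPU-bound loop; it never waits on I/O."""
    while n > 0:
        n -= 1
    return n

def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def run_sequential(jobs):
    return [count_down(n) for n in jobs]

def run_threaded(jobs):
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(count_down, jobs))

jobs = [200_000] * 4
seq_result, seq_time = timed(lambda: run_sequential(jobs))
thr_result, thr_time = timed(lambda: run_threaded(jobs))
print(f"sequential: {seq_time:.3f}s, threaded: {thr_time:.3f}s")
```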
Deadlock:
Two or more tasks each hold a lock that another task needs in order to complete. None of the lock holders releases its lock, because each one is itself waiting for one or more locks held by the others.
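A common way to avoid the situation described above is to always acquire locks in the same order. The sketch below assumes a made-up `Account` class and `transfer` function; two threads transfer in opposite directions, which could deadlock if each simply locked its source account first.

```python
import threading

class Account:
    """Hypothetical shared resource guarded by its own lock."""
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always lock in one fixed global order (here: by account name), so two
    # opposite transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acc: acc.name)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)

a = Account("a", 100)
b = Account("b", 100)
threads = [
    threading.Thread(target=repeat_transfer, args=(a, b, 1000)),
    threading.Thread(target=repeat_transfer, args=(b, a, 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)
```

Sorting by name is just one possible ordering; any rule works as long as every task follows the same one.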
Race condition:
Parallel tasks read and alter the value of a shared variable in an unpredictable order, so the result depends on timing. (Guarding the variable with a lock can help.)
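A small sketch of the lock-based fix, assuming a hypothetical `Counter` class shared by four threads. The increment is a read-modify-write step; without the lock, two threads could read the same old value and one update could be lost.

```python
import threading

class Counter:
    """Hypothetical shared counter whose updates are guarded by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```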