Concurrency and Parallelism

Documentation

The appropriate choice of tool depends on the task to be executed (CPU-bound vs. I/O-bound) and the preferred style of development (event-driven cooperative multitasking vs. preemptive multitasking).

| Preferred Task Type | Concurrency Type | Switching Decision | Number of Processors | Python Module |
| --- | --- | --- | --- | --- |
| I/O bound | Preemptive multitasking | The operating system decides when to switch tasks, external to Python. Due to the GIL, only one thread can execute Python code at once. | 1 | threading |
| I/O bound | Cooperative multitasking | The tasks decide when to give up control. | 1 | asyncio |
| CPU bound | Multiprocessing | The processes all run at the same time on different processors. Each process has its own GIL, so processes do not lock each other out like threads. | Many | multiprocessing |

Info

Threading is for working in parallel, and async is for waiting in parallel :-)

Global Interpreter Lock (GIL)

Documentation

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This means multiple threads can lock each other out.

Info

The GIL in CPython does not protect your program state. It protects the interpreter's state.

External calls into C code, such as NumPy, SciPy, and pandas methods, release the GIL while they run, so other threads can make progress in the meantime.
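NumPy is not required to see this effect; blocking calls in the standard library, such as time.sleep(), also release the GIL. A minimal sketch showing that threads waiting on such calls overlap rather than run sequentially:

```python
import threading
import time

def wait_a_bit() -> None:
    # time.sleep() releases the GIL, so other threads keep running.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=wait_a_bit) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2 s sleeps overlap instead of taking 0.8 s sequentially.
print(f"elapsed: {elapsed:.2f}s")
```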

Concurrency

Preemptive multitasking

Documentation

The operating system actually knows about each thread and can interrupt it at any time to start running a different thread. This is called preemptive multitasking since the operating system can pre-empt your thread to make the switch.

Preemptive multitasking is handy in that the code in the thread doesn't need to do anything to make the switch. It can also be difficult because of that "at any time" phrase. This switch can happen in the middle of a single Python statement, even a trivial one like x = x + 1.
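For example, two threads both executing x = x + 1 can be interleaved between the read and the write, losing an update; a threading.Lock makes the read-modify-write atomic. A minimal sketch:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, a thread switch could occur between reading
        # counter and writing it back, losing updates.
        with lock:
            counter = counter + 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- guaranteed only because of the lock
```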

Cooperative multitasking

Documentation

The tasks must cooperate by announcing when they are ready to be switched out. That means that the code in the task has to change slightly to make this happen.

The benefit of doing this extra work up front is that you always know where your task will be swapped out. It will not be swapped out in the middle of a Python statement unless that statement is marked.

threading — Thread-based parallelism

Uses preemptive multitasking. Threads run on a single processor, and therefore only one thread executes Python code at a time.

threading — interface to OS-level threads. Note that CPU-bound work is mostly serialized by the GIL, so don't expect speedup in your calculations. Use it when you need to invoke blocking APIs in parallel, and when you require precise control over thread creation.

Avoid creating too many threads (e.g. thousands), as they are not free. If possible, don't create threads yourself, use concurrent.futures instead.
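A minimal sketch of invoking blocking calls in parallel with raw threads, collecting results through a thread-safe queue; fetch and the URLs are hypothetical placeholders standing in for a real blocking API:

```python
import queue
import threading
import time

def fetch(url, results):
    # Hypothetical stand-in for a blocking API call (e.g. an HTTP request).
    time.sleep(0.1)
    results.put((url, "ok"))

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = queue.Queue()  # thread-safe channel back to the main thread
threads = [threading.Thread(target=fetch, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
responses = dict(results.get() for _ in urls)
print(responses)
```

In practice, concurrent.futures.ThreadPoolExecutor wraps this pattern (pooling, result collection) for you, which is why it is the recommended entry point.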

asyncio

Uses cooperative multitasking.

Uses a single thread of execution and async system calls across the board. Ideally it has no blocking calls at all; the only blocking part is the asyncio.run() entry point.

asyncio code is typically written using coroutines, which use await to suspend until something interesting happens. Suspending is different from blocking in that it allows the event loop thread to continue to other things while you're waiting. It has many advantages compared to thread-based solutions, such as being able to spawn thousands of cheap "tasks" without bogging down the system, and being able to cancel tasks or easily wait for multiple things at once.
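A minimal sketch of coroutines waiting in parallel; fetch is a hypothetical placeholder for real async I/O:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # await suspends this coroutine; the event loop runs others meanwhile.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main() -> list:
    # Thousands of such tasks would be cheap; three suffice to show the idea.
    return await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )

results = asyncio.run(main())
print(results)  # ['a: done', 'b: done', 'c: done']
```

All three coroutines wait concurrently, so the total runtime is roughly one delay, not three.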

Info

asyncio should be the tool of choice for servers and for clients connecting to multiple servers.

asyncio can await functions executed in thread or process pools provided by concurrent.futures, so it can serve as glue between all those different models.
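A minimal sketch of that glue, running an ordinary blocking function in a concurrent.futures thread pool via loop.run_in_executor; blocking_io is a hypothetical placeholder (a ProcessPoolExecutor could be substituted for CPU-bound work):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io() -> str:
    # Ordinary blocking function; calling it directly in a coroutine
    # would stall the whole event loop.
    time.sleep(0.1)
    return "io done"

async def main() -> str:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # The coroutine awaits the future; the event loop stays responsive.
        return await loop.run_in_executor(pool, blocking_io)

result = asyncio.run(main())
print(result)  # io done
```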

Parallelism

multiprocessing — Process-based parallelism

With multiprocessing, Python creates new processes. A process here can be thought of as almost a completely different program, though technically they’re usually defined as a collection of resources where the resources include memory, file handles and things like that. One way to think about it is that each process runs in its own Python interpreter.

Multiple processes work in parallel, so you can actually speed up calculations using this method. The disadvantage is that you can't share in-memory data structures without using multiprocessing-specific tools.
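A minimal sketch of spreading a CPU-bound function across processes with multiprocessing.Pool; square is a trivial stand-in for real computation:

```python
from multiprocessing import Pool

def square(n: int) -> int:
    # CPU-bound work runs in a separate process with its own GIL.
    return n * n

if __name__ == "__main__":
    # The __main__ guard is required so worker processes can safely
    # re-import this module on platforms that spawn rather than fork.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```

Note that square is defined at module level: worker processes must be able to import it by name.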

concurrent.futures — Launching parallel tasks

A modern interface to threading and multiprocessing, which provides convenient thread/process pools it calls executors.

The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or in separate processes, using ProcessPoolExecutor.
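A minimal sketch using ThreadPoolExecutor with submit() and as_completed(); fetch is a hypothetical blocking call, and swapping in ProcessPoolExecutor would give the same interface for CPU-bound work:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(name: str) -> str:
    # Hypothetical stand-in for a blocking I/O call.
    time.sleep(0.1)
    return name.upper()

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future; as_completed() yields them as they finish.
    futures = [executor.submit(fetch, n) for n in ("a", "b", "c")]
    results = sorted(f.result() for f in as_completed(futures))

print(results)  # ['A', 'B', 'C']
```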

Info

concurrent.futures should be the tool of choice when considering thread or process based parallelism.

Processes & Threads

Processes:

  1. Independent instances of programs, scheduled by the OS across CPU cores.
  2. Creating and managing processes adds overhead.
  3. Each process works with its own memory area.
  4. Communication between processes requires more complex mechanisms.
  5. Suitable for running routines with different operation logic and for parallel execution of CPU-bound tasks.

Threads:

  1. "Cheap" execution units within the process.
  2. Share the memory area with the parent process.
  3. Simple communication between threads.
  4. Parallelism within a single (sub) program.
  5. Threads handle IO-bound tasks well.
  6. The GIL affects how threads operate and their performance.