Introduction to multithreading, multiprocessing and async
Martelli’s model of scalability
| Number of cores | Description |
|---|---|
| 1 | Single thread and single process |
| 2–8 | Multiple threads and multiple processes |
| >8 | Distributed processing |
Martelli’s observation was that over time the second category becomes less and less important: individual cores become ever more powerful, and data sets ever larger.
Global Interpreter Lock (GIL)
CPython has a lock on its internally shared global state, the GIL. As a result, no more than one thread can execute Python bytecode at the same time. For I/O-heavy applications the GIL is not a big problem; for CPU-heavy applications, however, threading can even slow the program down. To get more CPU cycles, multiprocessing is therefore the interesting option.
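This effect can be shown with a small timing sketch (the function names here are illustrative, not from any library): four sleeping tasks overlap in a thread pool because `time.sleep` releases the GIL, while four pure-Python loops are serialised by it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n):
    """Pure-Python loop: holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i
    return total

def io_bound(delay):
    """time.sleep releases the GIL, so threads can overlap."""
    time.sleep(delay)
    return delay

# I/O-bound work benefits from threads: four 0.1 s sleeps
# finish in roughly 0.1 s, not 0.4 s.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(io_bound, [0.1] * 4))
io_elapsed = time.perf_counter() - start

# CPU-bound work does not: the GIL serialises the four loops.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_bound, [200_000] * 4))
cpu_elapsed = time.perf_counter() - start

print(f"I/O-bound with threads: {io_elapsed:.2f}s")
print(f"CPU-bound with threads: {cpu_elapsed:.2f}s")
```

The exact timings depend on the machine; the point is only that the I/O-bound batch takes about as long as a single sleep, while the CPU-bound batch gains nothing from the extra threads.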
Literate programming and Martelli’s model of scalability determined the design decisions on Python’s performance for a long time. Little has changed in this assessment to this day: contrary to intuitive expectations, more CPUs and threads in Python initially lead to less efficient applications. The Gilectomy project, which was supposed to remove the GIL, ran into a further problem: the Python C API exposes too many implementation details. Because of this, performance improvements would quickly lead to incompatible changes, which seems unacceptable, especially in a language as popular as Python.
Overview
| Criterion | Multithreading | Multiprocessing | asyncio |
|---|---|---|---|
| Separation | Threads share one state. However, sharing state can lead to race conditions, i.e. the result of an operation can depend on the timing of individual operations. | The processes are independent of each other. If they are to communicate with each other, interprocess communication (IPC), object pickling and other overhead become necessary. | Almost all asyncio objects are not thread-safe. With `run_coroutine_threadsafe`, however, a coroutine can also be submitted to the event loop from another thread. |
| Switch | Threads switch preemptively, i.e. no explicit code needs to be added to cause a task switch. However, such a switch is possible at any time; accordingly, critical sections must be protected with locks. | As soon as you get a process assigned, significant progress should be made. So you should not make too many roundtrips back and forth. | asyncio switches tasks cooperatively, i.e. `await` must be added explicitly to cause a task switch. You therefore control when task switches take place. |
| Tooling | Threads require very little tooling: `Lock` and `Queue`. Locks are difficult to understand in non-trivial examples. For complex applications, it is therefore better to use atomic message queues or asyncio. | Simple tooling with `map` and `imap_unordered`, among others, to test individual processes in a single thread before switching to multiprocessing. If IPC or object pickling is used, the tooling becomes more complex. | At least for complex systems, more extensive tooling is required: an event loop and non-blocking versions of almost all operations. However, the task switches are visible in the code at the `await` points. |
| Performance | Multithreading produces the best results for I/O-heavy tasks. The performance limit for threads is one CPU minus task-switching and synchronisation overhead. | The processes can be distributed across several CPUs and should therefore be used for CPU-heavy tasks. However, additional effort may be required for synchronising the processes. | Calling a plain Python function has more overhead than resuming a generator or an awaitable, so asyncio can use a single CPU efficiently. For CPU-intensive tasks, however, multiprocessing is more suitable. |
Summary
There is no one ideal implementation of concurrency – each of the approaches presented here has specific advantages and disadvantages. So before you decide which approach to follow, you should analyse the performance problems carefully and then choose the most suitable solution. In our projects we often use several approaches, depending on the part whose performance is to be optimised.
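As a minimal illustration of the cooperative approach from the table above (the task names are hypothetical): task switches happen only at the `await` points, and `asyncio.gather` returns results in the order of its arguments, regardless of which task finishes first.

```python
import asyncio

async def fetch(name, delay):
    # await is the only place where a task switch can occur
    await asyncio.sleep(delay)
    return name

async def main():
    # Both tasks run concurrently in a single thread
    return await asyncio.gather(
        fetch("first", 0.2),
        fetch("second", 0.1),
    )

results = asyncio.run(main())
print(results)  # → ['first', 'second'] (gather preserves argument order)
```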