Introduction to multithreading, multiprocessing and async

Martelli’s model of scalability

Number of cores:

1: Single thread and single process
2–8: Multiple threads and multiple processes
>8: Distributed processing

Martelli’s observation was that, over time, the second category becomes less and less important: individual cores keep getting more powerful, while really large data sets keep getting larger.

Global Interpreter Lock (GIL)

CPython has a lock on its internally shared global state, the Global Interpreter Lock (GIL). As a result, no more than one thread can execute Python bytecode at any one time.

The GIL is not a big problem for I/O-heavy applications; for CPU-heavy applications, however, threading can even slow the program down. Multiprocessing is therefore the more promising way to obtain more CPU cycles.
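A minimal sketch of this effect (the function name `count_down`, the value of `N` and the thread count are arbitrary illustration choices): the threaded result is correct, but because the GIL lets only one thread execute bytecode at a time, the pure-Python loop is typically no faster in two threads than run sequentially.

```python
import threading

def count_down(n):
    """Pure-Python CPU-bound loop; the GIL is held while its bytecode runs."""
    while n > 0:
        n -= 1
    return n

results = []

def worker(n):
    results.append(count_down(n))

# Two threads share the CPU-bound work. The answer is correct, but the
# GIL interleaves both threads on one core, so this is usually no faster
# than calling count_down(2 * N) once.
N = 200_000
threads = [threading.Thread(target=worker, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 0]
```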

Literate programming and Martelli’s model of scalability shaped design decisions about Python’s performance for a long time, and little has changed in this assessment to this day: contrary to intuitive expectations, more CPUs and threads initially lead to less efficient Python applications. The Gilectomy project, which set out to remove the GIL, also ran into a further problem: the Python C API exposes too many implementation details. Because of this, performance improvements would quickly lead to incompatible changes, which seems unacceptable, especially in a language as popular as Python.

Shared state

Threads share one state. However, sharing state can lead to race conditions: the result of an operation can depend on the timing of individual operations.
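A minimal sketch of protecting such a shared state (the counter, iteration count and thread count are arbitrary illustration values): the read-modify-write `counter += 1` is not atomic, so without the lock concurrent updates could be lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:        # protects the read-modify-write below;
            counter += 1  # without it, concurrent updates could be lost

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 -- deterministic only because of the lock
```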

Processes are independent of each other. If they are to communicate with each other, interprocess communication (IPC), object pickling and other overhead become necessary.
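A small sketch of this overhead (the worker function and message contents are made up for illustration): every object sent through a multiprocessing queue is pickled on one side and unpickled on the other.

```python
from multiprocessing import Process, Queue

def worker(q):
    # The dict is pickled in the child process and unpickled in the parent.
    q.put({"status": "done", "value": 42})

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get(timeout=5))  # {'status': 'done', 'value': 42}
    p.join()
```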

Almost all asyncio objects are not thread-safe. With run_coroutine_threadsafe(), however, coroutines can also be submitted to a running event loop from other threads.
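A sketch of the thread-safe hand-off (the coroutine `add` and the thread setup are illustration choices): one thread runs the event loop, and another thread submits a coroutine to it and waits for the result.

```python
import asyncio
import threading

async def add(a, b):
    return a + b

# Run an event loop in a background thread ...
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# ... and submit a coroutine to it from the main thread; the returned
# concurrent.futures.Future can be awaited on with a blocking call.
future = asyncio.run_coroutine_threadsafe(add(2, 3), loop)
result = future.result(timeout=5)
print(result)  # 5

loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
```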


Task switching

Threads switch preemptively, i.e. no explicit code needs to be added to cause a task switch. However, such a switch is possible at any time; accordingly, critical sections must be protected with locks.

Once a process has been assigned work, it should make significant progress on it; you should therefore not make too many small round trips back and forth between processes.

asyncio switches cooperatively, i.e. yield or await must be specified explicitly to cause a switch. The overhead of these switches is therefore very low.
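A minimal sketch of cooperative switching (the coroutine names and loop counts are arbitrary): each `await` is an explicit switch point, so two coroutines interleave exactly there and nowhere else.

```python
import asyncio

order = []

async def worker(name):
    for i in range(2):
        order.append(f"{name}{i}")
        await asyncio.sleep(0)  # explicit await: the only point where a switch happens

async def main():
    # gather schedules both coroutines as tasks on the same event loop
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())
print(order)  # ['a0', 'b0', 'a1', 'b1'] -- the coroutines alternate at each await
```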


Tooling

Threads require very little tooling: Lock and Queue. However, locks are difficult to reason about in non-trivial examples; for complex applications it is therefore better to use atomic message queues or asyncio.

Multiprocessing offers simple tooling with map and imap_unordered, among others, so that individual processes can be tested in a single thread before switching to multiprocessing.
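A sketch of this workflow (the worker function `square` and the pool size are illustration choices): the same function is first checked with the built-in map() in a single thread, then handed to a process pool.

```python
from multiprocessing import Pool

def square(n):
    return n * n

# First verify the worker function single-threaded with the built-in map() ...
assert list(map(square, range(5))) == [0, 1, 4, 9, 16]

# ... then switch to a process pool. imap_unordered yields results in
# completion order, so we sort before comparing.
if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(sorted(pool.imap_unordered(square, range(5))))  # [0, 1, 4, 9, 16]
```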

If IPC or object pickling is used, the tooling becomes more complex.

At least for complex systems, asyncio leads to the goal more easily than multithreading with locks.

However, asyncio requires a large set of tools: futures, event loops and non-blocking versions of almost everything.


Performance

Multithreading produces the best results for I/O-heavy tasks. The performance limit for threads is one CPU, minus the overhead of task switches and synchronisation.

Processes can be distributed across several CPUs and should therefore be used for CPU-heavy tasks. However, additional effort and synchronisation of the processes may be required.

Calling a plain Python function has more overhead than resuming a generator or awaitable; asyncio can therefore utilise the CPU efficiently.

For CPU-intensive tasks, however, multiprocessing is more suitable.


There is no one ideal implementation of concurrency – each of the approaches presented has specific advantages and disadvantages. So before you decide which approach to follow, you should analyse your performance problems carefully and then choose the most suitable solution. In our projects, we often use several approaches at once, depending on the part whose performance is to be optimised.