iPython Profiler¶
IPython provides access to a wide range of functions to measure times and create profiles. The following magic IPython commands are explained here:
Command |
Description |
|---|---|
|
Time to execute a single statement |
|
Average time it took to execute a single statement repeatedly |
|
Run code with the profiler |
|
Run code with the line-by-line profiler |
|
Measure the memory usage of a single statement |
|
Executes the code with the line-by-line memory profiler |
The last four commands are not contained in IPython itself, but in the modules line_profiler and memory_profiler.
See also:
%timeit and %time¶
We saw the %timeit line and %%timeit cell magic in the introduction of the magic functions in IPython magic commands. They can be used to measure the timing of the repeated execution of code snippets:
[1]:
%timeit sum(range(100))
320 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Note that %timeit executes the execution multiple times in a loop. If the number of loops is not specified with -n, %timeit automatically adjusts the number so that sufficient measurement accuracy is achieved:
[2]:
%%timeit
total = 0
for i in range(1000):
for j in range(1000):
total += i * (-1) ** j
71.2 ms ± 826 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sometimes repeating an operation is not the best option, for example when we have a list that we want to sort. Here we may be misled by repeated surgery. Sorting a presorted list is much faster than sorting an unsorted list, so repeating it distorts the result:
[3]:
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
214 μs ± 2.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Then the %time function might be a better choice. %time should also be the better choice for long-running commands, when short system-related delays are unlikely to affect the result:
[4]:
import random
L = [random.random() for i in range(100000)]
%time L.sort()
CPU times: user 15.8 ms, sys: 314 μs, total: 16.1 ms
Wall time: 16.4 ms
Sorting an already sorted list:
[5]:
%time L.sort()
CPU times: user 4.17 ms, sys: 1 μs, total: 4.17 ms
Wall time: 4.18 ms
Note how much faster the pre-sorted list is to be sorted, but also note how much longer the timing with %time takes compared to %timeit, even for the pre-sorted list. This is due to the fact that %timeit is doing some clever things to keep system calls from interfering with the timing. This prevents, for example, the garbage collection of Python objects that are no longer used and that could otherwise affect the time measurement. Because of this, the %timeit results are usually
noticeably faster than the %time results.
Profiling for scripts: %prun¶
A program is made up of many individual instructions, and sometimes it is more important to measure those instructions in context than to measure them yourself. Python includes a built-in Code-Profiler. However, IPython offers a much more convenient way to use this profiler in the form of the magic function %prun.
As an example, let’s define a simple function that does some calculations:
[6]:
def sum_of_lists(N):
total = 0
for i in range(5):
L = [j ^ (j >> i) for j in range(N)]
total += sum(L)
return total
[7]:
%prun sum_of_lists(1000000)
674 function calls (663 primitive calls) in 0.283 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.128 0.128 0.137 0.137 3519952779.py:1(sum_of_lists)
14 0.072 0.005 0.076 0.005 socket.py:626(send)
2 0.033 0.016 0.097 0.048 iostream.py:276(<lambda>)
5 0.017 0.003 0.017 0.003 {built-in method builtins.sum}
1 0.012 0.012 0.012 0.012 {method 'execute' of 'sqlite3.Connection' objects}
2 0.008 0.004 0.008 0.004 {method 'recv' of '_socket.socket' objects}
2/1 0.007 0.003 0.000 0.000 {method 'control' of 'select.kqueue' objects}
1 0.006 0.006 0.143 0.143 <string>:1(<module>)
5 0.000 0.000 0.000 0.000 attrsettr.py:66(_get_attr_opt)
3 0.000 0.000 0.112 0.037 base_events.py:1947(_run_once)
15 0.000 0.000 0.000 0.000 enum.py:1581(__or__)
66 0.000 0.000 0.000 0.000 enum.py:1574(_get_value)
2 0.000 0.000 0.097 0.048 zmqstream.py:583(_handle_events)
2 0.000 0.000 0.059 0.029 socket.py:703(send_multipart)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
143/139 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
5 0.000 0.000 0.000 0.000 attrsettr.py:43(__getattr__)
29 0.000 0.000 0.000 0.000 enum.py:691(__call__)
1 0.000 0.000 0.020 0.020 history.py:833(_writeout_input_cache)
2 0.000 0.000 0.000 0.000 socket.py:774(recv_multipart)
1 0.000 0.000 0.000 0.000 inspect.py:3116(_bind)
5/2 0.000 0.000 0.097 0.049 events.py:87(_run)
1 0.000 0.000 0.013 0.013 selector_events.py:129(_read_from_self)
4/2 0.000 0.000 0.097 0.048 {method 'run' of '_contextvars.Context' objects}
29 0.000 0.000 0.000 0.000 enum.py:1145(__new__)
5 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1390(_handle_fromlist)
1 0.000 0.000 0.020 0.020 history.py:845(writeout_cache)
2/1 0.000 0.000 0.005 0.005 selectors.py:540(select)
7 0.000 0.000 0.000 0.000 enum.py:1592(__and__)
1 0.000 0.000 0.000 0.000 kernelbase.py:302(poll_control_queue)
3 0.000 0.000 0.000 0.000 zmqstream.py:663(_rebuild_io_state)
3 0.000 0.000 0.000 0.000 zmqstream.py:686(_update_handler)
6 0.000 0.000 0.000 0.000 typing.py:426(inner)
3 0.000 0.000 0.000 0.000 queue.py:115(empty)
1 0.000 0.000 0.000 0.000 queues.py:209(put_nowait)
1 0.000 0.000 0.000 0.000 decorator.py:199(fix)
2 0.000 0.000 0.059 0.029 iostream.py:278(_really_send)
2 0.000 0.000 0.000 0.000 {method '__exit__' of 'sqlite3.Connection' objects}
1 0.000 0.000 0.000 0.000 inspect.py:2930(apply_defaults)
2 0.000 0.000 0.097 0.048 iostream.py:157(_handle_event)
6 0.000 0.000 0.000 0.000 traitlets.py:676(__get__)
1 0.000 0.000 0.000 0.000 traitlets.py:1527(_notify_observers)
3 0.000 0.000 0.000 0.000 base_events.py:848(_call_soon)
1 0.000 0.000 0.000 0.000 queues.py:186(put)
10 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
2 0.000 0.000 0.000 0.000 traitlets.py:3631(set)
1 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects}
1 0.000 0.000 0.020 0.020 history.py:55(only_when_enabled)
2 0.000 0.000 0.000 0.000 traitlets.py:689(set)
2 0.000 0.000 0.000 0.000 traitlets.py:3474(validate)
2 0.000 0.000 0.000 0.000 traitlets.py:718(_validate)
1 0.000 0.000 0.000 0.000 _base.py:537(set_result)
1 0.000 0.000 0.000 0.000 inspect.py:3255(bind)
1 0.000 0.000 0.000 0.000 asyncio.py:225(add_callback)
3 0.000 0.000 0.000 0.000 threading.py:303(__enter__)
1 0.000 0.000 0.143 0.143 {built-in method builtins.exec}
2 0.000 0.000 0.000 0.000 traitlets.py:3624(validate_elements)
4 0.000 0.000 0.000 0.000 base_events.py:768(time)
6 0.000 0.000 0.000 0.000 traitlets.py:629(get)
5 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
16 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 queues.py:225(get)
2 0.000 0.000 0.097 0.048 zmqstream.py:556(_run_callback)
2 0.000 0.000 0.000 0.000 queues.py:322(_consume_expired)
1 0.000 0.000 0.000 0.000 queues.py:173(qsize)
4 0.000 0.000 0.000 0.000 typing.py:1443(__hash__)
1 0.000 0.000 0.000 0.000 concurrent.py:182(future_set_result_unless_cancelled)
6 0.000 0.000 0.000 0.000 {built-in method builtins.next}
2 0.000 0.000 0.000 0.000 traitlets.py:727(_cross_validate)
3 0.000 0.000 0.000 0.000 events.py:36(__init__)
2 0.000 0.000 0.097 0.048 zmqstream.py:624(_handle_recv)
1 0.000 0.000 0.000 0.000 traitlets.py:1512(_notify_trait)
2 0.000 0.000 0.000 0.000 typing.py:1665(__subclasscheck__)
2 0.000 0.000 0.000 0.000 {method 'set_result' of '_asyncio.Future' objects}
2 0.000 0.000 0.000 0.000 iostream.py:213(_is_master_process)
1 0.000 0.000 0.020 0.020 decorator.py:229(fun)
1 0.000 0.000 0.000 0.000 inspect.py:2877(args)
2 0.000 0.000 0.000 0.000 traitlets.py:708(__set__)
2 0.000 0.000 0.000 0.000 traitlets.py:2304(validate)
3 0.000 0.000 0.000 0.000 zmqstream.py:542(sending)
2 0.000 0.000 0.000 0.000 base_events.py:819(call_soon)
7 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}
8 0.000 0.000 0.000 0.000 {method 'popleft' of 'collections.deque' objects}
1 0.000 0.000 0.000 0.000 traitlets.py:1523(notify_change)
3 0.000 0.000 0.000 0.000 queue.py:267(_qsize)
3 0.000 0.000 0.000 0.000 selector_events.py:740(_process_events)
24 0.000 0.000 0.000 0.000 typing.py:2366(cast)
1 0.000 0.000 0.000 0.000 futures.py:393(_call_set_state)
5 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 threading.py:315(_acquire_restore)
2 0.000 0.000 0.000 0.000 iostream.py:216(_check_mp_mode)
1 0.000 0.000 0.097 0.097 asyncio.py:200(_handle_events)
7 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.lock' objects}
1 0.000 0.000 0.000 0.000 queues.py:317(__put_internal)
1 0.000 0.000 0.000 0.000 history.py:839(_writeout_output_cache)
10 0.000 0.000 0.000 0.000 inspect.py:2787(kind)
5 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
1 0.000 0.000 0.000 0.000 queues.py:312(_put)
2 0.000 0.000 0.000 0.000 {built-in method _abc._abc_subclasscheck}
5 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
3 0.000 0.000 0.000 0.000 threading.py:306(__exit__)
1 0.000 0.000 0.000 0.000 threading.py:312(_release_save)
1 0.000 0.000 0.000 0.000 queues.py:256(get_nowait)
1 0.000 0.000 0.000 0.000 threading.py:428(notify_all)
1 0.000 0.000 0.000 0.000 base_events.py:872(call_soon_threadsafe)
1 0.000 0.000 0.000 0.000 inspect.py:2900(kwargs)
2 0.000 0.000 0.000 0.000 <frozen abc>:121(__subclasscheck__)
1 0.000 0.000 0.000 0.000 {method 'values' of 'mappingproxy' objects}
1 0.000 0.000 0.000 0.000 _base.py:337(_invoke_callbacks)
1 0.000 0.000 0.000 0.000 ioloop.py:742(_run_callback)
3 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.lock' objects}
1 0.000 0.000 0.000 0.000 queues.py:177(empty)
2 0.000 0.000 0.000 0.000 typing.py:1374(__instancecheck__)
2 0.000 0.000 0.000 0.000 traitlets.py:3486(validate_elements)
5 0.000 0.000 0.000 0.000 zmqstream.py:538(receiving)
1 0.000 0.000 0.000 0.000 threading.py:631(clear)
3 0.000 0.000 0.000 0.000 {method 'items' of 'mappingproxy' objects}
1 0.000 0.000 0.000 0.000 selector_events.py:141(_write_to_self)
1 0.000 0.000 0.000 0.000 unix_events.py:83(_process_self_data)
1 0.000 0.000 0.000 0.000 threading.py:398(notify)
1 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.RLock' objects}
4 0.000 0.000 0.000 0.000 {built-in method time.monotonic}
4 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
2 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
3 0.000 0.000 0.000 0.000 {built-in method builtins.max}
1 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
2 0.000 0.000 0.000 0.000 {built-in method _contextvars.copy_context}
4 0.000 0.000 0.000 0.000 inspect.py:3070(parameters)
2 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
1 0.000 0.000 0.000 0.000 base_events.py:1932(_add_callback)
2 0.000 0.000 0.000 0.000 iostream.py:255(closed)
2 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
2 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
2 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects}
2 0.000 0.000 0.000 0.000 {method 'cancelled' of '_asyncio.Future' objects}
1 0.000 0.000 0.000 0.000 {method 'done' of '_asyncio.Future' objects}
1 0.000 0.000 0.000 0.000 threading.py:318(_is_owned)
1 0.000 0.000 0.000 0.000 zmqstream.py:694(<lambda>)
1 0.000 0.000 0.000 0.000 queues.py:309(_get)
3 0.000 0.000 0.000 0.000 base_events.py:550(_check_closed)
5 0.000 0.000 0.000 0.000 base_events.py:2045(get_debug)
4 0.000 0.000 0.000 0.000 inspect.py:2775(name)
1 0.000 0.000 0.000 0.000 {built-in method _asyncio.get_running_loop}
1 0.000 0.000 0.000 0.000 inspect.py:2869(__init__)
1 0.000 0.000 0.000 0.000 queues.py:59(_set_timeout)
1 0.000 0.000 0.000 0.000 {method 'release' of '_thread.lock' objects}
1 0.000 0.000 0.000 0.000 {method '_is_owned' of '_thread.RLock' objects}
1 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.RLock' objects}
1 0.000 0.000 0.000 0.000 locks.py:224(clear)
1 0.000 0.000 0.000 0.000 base_events.py:754(is_closed)
In the notebook the output looks something like this:
14 function calls in 9.597 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5 8.121 1.624 8.121 1.624 <ipython-input-15-f105717832a2>:4(<listcomp>)
5 0.747 0.149 0.747 0.149 {built-in method builtins.sum}
1 0.665 0.665 9.533 9.533 <ipython-input-15-f105717832a2>:1(sum_of_lists)
1 0.065 0.065 9.597 9.597 <string>:1(<module>)
1 0.000 0.000 9.597 9.597 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The result is a table that shows the execution time for each function call, sorted by total time. In this case, most of the time is consumed with list comprehension within sum_of_lists. This gives us clues as to where we could improve the efficiency of the algorithm.
Profiling line by line: %lprun¶
Profiling with %prun is useful, but sometimes a line-by-line profile report is more insightful. This isn’t built into Python or IPython, but there is a package available, line_profiler, that enables this. This can be provided in your kernel with
$ spack env activate python-374
$ spack install py-line-profiler ^python@3.7.4%gcc@9.1.0
Alternatively, you can install line-profiler with other package managers, for example
$ uv add line_profiler
Now you can load IPython with the line_profiler extension:
[8]:
%load_ext line_profiler
The %lprun command profiles each function line by line. In this case, you must explicitly specify which functions are of interest for creating the profile:
[9]:
%lprun -f sum_of_lists sum_of_lists(5000)
The result looks something like this:
Timer unit: 1e-06 s
Total time: 0.015145 s
File: <ipython-input-6-f105717832a2>
Function: sum_of_lists at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def sum_of_lists(N):
2 1 1.0 1.0 0.0 total = 0
3 6 11.0 1.8 0.1 for i in range(5):
4 5 14804.0 2960.8 97.7 L = [j ^ (j >> i) for j in range(N)]
5 5 329.0 65.8 2.2 total += sum(L)
6 1 0.0 0.0 0.0 return total
The time is given in microseconds and we can see which line the function spends most of its time on. We may then be able to modify the script in such a way that the efficiency of the function can be increased.
More information about %lprun and the available options can be found in the IPython help function %lprun?.
Create a storage profile: %memit and %mprun¶
Another aspect of profiling is the amount of memory that an operation uses. This can be evaluated with another IPython extension, the memory_profiler. This can be provided in your kernel with
$ spack env activate python-311
$ spack install py-memory-profiler
Alternatively you can install memory-profiler with other package managers, for example
$ uv add memory_profiler
[10]:
%load_ext memory_profiler
The memory_profiler extension is already loaded. To reload it, use:
%reload_ext memory_profiler
[11]:
%memit sum_of_lists(1000000)
peak memory: 205.31 MiB, increment: 70.73 MiB
We see that this feature occupies approximately 100 MB of memory.
For a line-by-line description of memory usage, we can use the %mprun magic. Unfortunately, this magic only works for functions that are defined in separate modules and not for the notebook itself. So we first use the %%file magic to create a simple module called mprun_demo.py that contains our sum_of_lists function.
[12]:
%%file mprun_demo.py
from memory_profiler import profile
@profile
def my_func():
a = [1] * (10 ** 6)
b = [2] * (2 * 10 ** 7)
del b
return a
Overwriting mprun_demo.py
[13]:
from mprun_demo import my_func
%mprun -f my_func my_func()
Filename: /Users/veit/cusy/trn/Python4DataScience/docs/performance/mprun_demo.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
3 150.2 MiB 150.2 MiB 1 @profile
4 def my_func():
5 157.9 MiB 7.6 MiB 1 a = [1] * (10 ** 6)
6 310.5 MiB 152.6 MiB 1 b = [2] * (2 * 10 ** 7)
7 310.5 MiB 0.0 MiB 1 del b
8 310.5 MiB 0.0 MiB 1 return a
Filename: /Users/veit/sandbox/py313/.venv/lib/python3.13/site-packages/memory_profiler.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
1185 150.2 MiB 150.2 MiB 1 @wraps(wrapped=func)
1186 def wrapper(*args, **kwargs):
1187 150.2 MiB 0.0 MiB 1 prof = get_prof()
1188 310.5 MiB 160.2 MiB 1 val = prof(func)(*args, **kwargs)
1189 310.5 MiB 0.0 MiB 1 show_results_bound(prof)
1190 310.5 MiB 0.0 MiB 1 return val
Here the Increment column shows how much each row affects the total memory consumption: Note that when we calculate b we need about 160 MB of additional memory; however, this is not released again by deleting b.
More information about %memit, %mprun and their options can be found in the IPython help with %memit?.
pyheatmagic¶
pyheatmagic is an extension that allows the IPython magic command %%heat to display Python code as a heatmap with Py-Heat.
It can be easily installed in the kernel with
$ uv add py-heat-magic Installing py-heat-magic… …
Loading the extension in IPython¶
[14]:
%load_ext heat
Display the heat map¶
[15]:
%%heat
def powfun(a, b):
"""Method to raise a to power b using pow() function."""
return pow(a, b)
def powop(a, b):
"""Method to raise a to power b using ** operator."""
return a**b
def powmodexp(a, b):
"""Method to raise a to power b using modular exponentiation."""
base = a
res = 1
while b > 0:
if b & 1:
res *= base
base *= base
b >>= 1
return res
def main():
"""Test function."""
a, b = 2377757, 773
pow_function = powfun(a, b)
pow_operator = powop(a, b)
pow_modular_exponentiation = powmodexp(a, b)
if __name__ == "__main__":
main()
Alternatively, the heatmap can also be saved as a file, for example with
%%heat -o pow-heatmap.png