Array-oriented programming – vectorisation

Using NumPy arrays allows you to express many types of data processing tasks as concise array expressions that would otherwise require writing for-loops. This practice of replacing loops with array expressions is called vectorisation. Vectorised array operations are generally much faster than their pure Python equivalents because the loop over the elements runs in compiled code rather than in the Python interpreter.
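As a minimal sketch of the idea, the following compares an explicit loop with the equivalent array expression. The temperature values and the names `celsius`, `fahrenheit_loop` and `fahrenheit_vec` are made up for illustration and do not appear elsewhere in this section.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
celsius = np.array([0.0, 12.5, 25.0, 37.5, 100.0])

# Loop version: one Python-level iteration per element
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorised version: a single array expression, no explicit loop
fahrenheit_vec = celsius * 9 / 5 + 32

# Both approaches should give the same values
print(np.allclose(fahrenheit_loop, fahrenheit_vec))
```

The vectorised line states the formula once for the whole array, which is usually both shorter and easier to read than the loop.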

[1]:

import numpy as np

First we create a NumPy array with one hundred thousand integers:

[2]:

myarray = np.arange(100000)

Then we square all the elements in this array with numpy.square:

[3]:

%time np.square(myarray)

CPU times: user 185 μs, sys: 994 μs, total: 1.18 ms
Wall time: 107 μs

[3]:

array([         0,          1,          4, ..., 9999400009, 9999600004,
       9999800001])

For comparison, we now measure the time for ten runs of squaring the same array with the `**` operator, which NumPy also applies element-wise:

[4]:

%time for _ in range(10): myarray2 = myarray ** 2

CPU times: user 2.17 ms, sys: 7.11 ms, total: 9.28 ms
Wall time: 1.22 ms
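The two spellings used so far, `np.square` and `** 2`, should produce identical arrays. The following sketch checks this. The explicit `dtype=np.int64` is an assumption added here so that the largest square, 9999800001, does not overflow on platforms where the default integer type is 32 bits wide. The names `squared_func` and `squared_op` are illustrative.

```python
import numpy as np

# Assumption: force 64-bit integers so that 99999 ** 2 fits without overflow
myarray = np.arange(100000, dtype=np.int64)

squared_func = np.square(myarray)  # function form
squared_op = myarray ** 2          # operator form

# Element-wise comparison of the two results
print(np.array_equal(squared_func, squared_op))
print(squared_op.dtype)
```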

Finally, we compare this with squaring every value of a Python list in a list comprehension, again ten times:

[5]:

mylist = list(range(100000))
%time for _ in range(10): mylist2 = [x ** 2 for x in mylist]

CPU times: user 116 ms, sys: 365 ms, total: 480 ms
Wall time: 47.5 ms
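In the measurements above, ten list comprehension runs took 47.5 ms of wall time compared with 1.22 ms for the array, a factor of roughly 40. Since `%time` is an IPython magic and measures only a single execution of the statement, the comparison can also be sketched with the standard library's `timeit` module, which works in plain Python scripts. The helper `best_time` and the variables `t_array` and `t_list` are hypothetical names introduced for this sketch, and the numbers it prints will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100000
# Assumption: 64-bit integers, so the squares do not overflow
myarray = np.arange(n, dtype=np.int64)
mylist = list(range(n))


def best_time(func, number=10, repeat=5):
    """Return the fastest total time in seconds for `number` calls of `func`."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


# Ten runs each, as in the cells above; the minimum over five repeats
# reduces the influence of other processes on the measurement
t_array = best_time(lambda: myarray ** 2)
t_list = best_time(lambda: [x ** 2 for x in mylist])

print(f"NumPy array: {t_array * 1e3:.2f} ms for 10 runs")
print(f"Python list: {t_list * 1e3:.2f} ms for 10 runs")
print(f"Speed-up:    {t_list / t_array:.0f}x")
```

Taking the minimum of several repeats is a common convention for micro-benchmarks, because slower runs mostly reflect interference from the rest of the system rather than the code being measured.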