Conditional logic as array operations – where

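The numpy.where function is a vectorised version of the conditional expression x if condition else y: for every element, it chooses between two values depending on a condition.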
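To make the analogy concrete, here is a minimal sketch with small fixed arrays. The names cond, x and y and their values are chosen only for illustration. It compares np.where with the equivalent element-by-element conditional expression:

```python
import numpy as np

# Small, fixed inputs (illustrative values) so the result is reproducible
cond = np.array([True, False, True])
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Element-wise: take x[i] where cond[i] is True, otherwise y[i]
vectorised = np.where(cond, x, y)

# The same selection written with Python's conditional expression
looped = [xi if ci else yi for xi, yi, ci in zip(x, y, cond)]

print(vectorised)                      # [ 1 20  3]
print(looped == vectorised.tolist())   # True
```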

In the following example, we first create a Boolean array and two arrays of random values:

[1]:
import numpy as np
[2]:
cond = np.array([False,  True, False,  True, False, False, False])
data1 = np.random.randn(1, 7)
data2 = np.random.randn(1, 7)

Now we want to take the value from data1 wherever the corresponding value in cond is True, and the value from data2 otherwise. Using Python’s conditional expression in a list comprehension, this could look like this:

[3]:
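# zip() iterates over the rows of a 2-D array, so we take the single row
# of data1 and data2 to pair each value with the corresponding entry of cond
result = [(x if c else y) for x, y, c in zip(data1[0], data2[0], cond)]

np.array(result)
[3]:
array([-1.44855826,  0.52391667, -0.20317678,  0.23328353,  0.40381171,
       -0.53214436, -0.39467458])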

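However, this approach has two problems:

  • with large arrays it is slow, because every element is processed by the Python interpreter rather than by NumPy’s compiled code

  • it does not work with multidimensional arrays, because zip only iterates along the first axis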
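Both problems can be reproduced with a short sketch. The 2 × 2 arrays cond2d, a and b are hypothetical values chosen for illustration. The timing part only prints what it measures: the numbers depend on the machine, and no particular speed-up is claimed here.

```python
import timeit

import numpy as np

# Hypothetical 2-D inputs, chosen only for illustration
cond2d = np.array([[True, False], [False, True]])
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

# zip() walks along the first axis only, so c is a whole row of cond2d,
# and "if c" asks for the truth value of a two-element array
failed = False
try:
    [(x if c else y) for x, y, c in zip(a, b, cond2d)]
except ValueError as err:
    failed = True
    print("list comprehension failed:", err)

# np.where selects element by element, for any number of dimensions
print(np.where(cond2d, a, b))
# [[ 1 20]
#  [30  4]]

# Rough timing on a larger 1-D input; results vary between machines
n = 200_000
big_cond = np.random.rand(n) > 0.5
big_x = np.random.randn(n)
big_y = np.random.randn(n)
t_loop = timeit.timeit(
    lambda: [(x if c else y) for x, y, c in zip(big_x, big_y, big_cond)],
    number=1,
)
t_where = timeit.timeit(lambda: np.where(big_cond, big_x, big_y), number=1)
print(f"list comprehension: {t_loop:.4f} s, np.where: {t_where:.4f} s")
```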

With np.where you can avoid both problems with a single function call:

[4]:
result = np.where(cond, data1, data2)

result
[4]:
array([[-1.44855826,  0.52391667, -0.20317678,  0.23328353,  0.40381171,
        -0.53214436, -0.39467458]])
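The double bracket in this output comes from broadcasting. np.where broadcasts its three arguments against each other, so a cond of shape (7,) combined with data1 and data2 of shape (1, 7) yields a result of shape (1, 7). A minimal sketch that rebuilds the example with fresh random values (so the numbers differ from the output above) and checks which array each element comes from:

```python
import numpy as np

cond = np.array([False, True, False, True, False, False, False])  # shape (7,)
data1 = np.random.randn(1, 7)                                       # shape (1, 7)
data2 = np.random.randn(1, 7)

result = np.where(cond, data1, data2)
print(cond.shape, data1.shape, result.shape)  # (7,) (1, 7) (1, 7)

# Where cond is True the value comes from data1, elsewhere from data2
assert np.array_equal(result[0, cond], data1[0, cond])
assert np.array_equal(result[0, ~cond], data2[0, ~cond])
```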

The second and third arguments of np.where do not have to be arrays; one or both of them can also be scalars. A typical use of where in data analysis is to create a new array from the values of another array. Suppose you have a matrix of randomly generated data and want to make all negative values positive:

[5]:
data = np.random.randn(4, 4)

data
[5]:
array([[ 0.09739726,  1.0954641 ,  1.21257909, -0.06470122],
       [ 0.65963544,  1.23582335,  0.47142984,  1.10924854],
       [-0.11219385, -0.59830829,  0.1750536 ,  1.22600517],
       [ 0.97477413, -0.5904872 ,  0.26752476,  0.19260319]])
[6]:
data < 0
[6]:
array([[False, False, False,  True],
       [False, False, False, False],
       [ True,  True, False, False],
       [False,  True, False, False]])
[7]:
np.where(data < 0, data * -1, data)
[7]:
array([[0.09739726, 1.0954641 , 1.21257909, 0.06470122],
       [0.65963544, 1.23582335, 0.47142984, 1.10924854],
       [0.11219385, 0.59830829, 0.1750536 , 1.22600517],
       [0.97477413, 0.5904872 , 0.26752476, 0.19260319]])
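As noted above, the second and third arguments can be scalars. The following sketch shows both variants and checks that the sign-flipping call gives the same result as np.abs, an alternative not mentioned in the text. It uses NumPy’s seeded generator np.random.default_rng(0) only to make the run reproducible; the variable names signs, clipped and flipped are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
data = rng.standard_normal((4, 4))

# Both branches scalars: 2 for positive values, -2 otherwise
signs = np.where(data > 0, 2, -2)

# One scalar, one array: replace only the negative values with 0
clipped = np.where(data < 0, 0, data)

# Flipping the sign of the negative values matches np.abs
flipped = np.where(data < 0, data * -1, data)
print(np.array_equal(flipped, np.abs(data)))  # True
```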