Conditional logic as array operations – where

The numpy.where function is a vectorised version of the ternary expression x if condition else y.

In the following example, we first create a Boolean array and two arrays with values:

[1]:
import numpy as np
[2]:
cond = np.array([False, True, False, True, False, False, False])
data1 = np.random.randn(1, 7)
data2 = np.random.randn(1, 7)

Now we want to take the values from data1 if the corresponding value in cond is True and otherwise take the value from data2. With Python’s if-else, this could look like this:

[3]:
# data1 and data2 have shape (1, 7), so iterate over their single row
result = [(x if c else y) for x, y, c in zip(data1[0], data2[0], cond)]

np.array(result)
[3]:
array([ 0.0753595 , -0.97727968,  1.36375888,  1.5042741 ,  1.58394917,
       -0.67041886, -1.30890145])

However, this approach has two problems:

  • the comprehension runs element by element in the Python interpreter, so it is slow for large arrays

  • it does not work with multidimensional arrays

With np.where you can work around these problems in a single function call:

[4]:
result = np.where(cond, data1, data2)

result
[4]:
array([[ 0.0753595 , -0.97727968,  1.36375888,  1.5042741 ,  1.58394917,
        -0.67041886, -1.30890145]])
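
Both problems can be made visible with a short sketch. It does not reuse the arrays from the cells above: it builds its own data with a seeded generator, and the names rng, a, b, mask, with_comprehension and with_where are illustrative assumptions. The timings depend on the machine, so no expected numbers are given.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Problem 1: speed on a large one-dimensional array
n = 100_000
a = rng.standard_normal(n)
b = rng.standard_normal(n)
mask = a > b


def with_comprehension():
    # element-by-element selection in the Python interpreter
    return [(x if c else y) for x, y, c in zip(a, b, mask)]


def with_where():
    # the same selection as a single vectorised call
    return np.where(mask, a, b)


t_loop = timeit.timeit(with_comprehension, number=3)
t_where = timeit.timeit(with_where, number=3)
print(f"comprehension: {t_loop:.4f} s, np.where: {t_where:.4f} s")

# Problem 2: two-dimensional input
a2 = a[:6].reshape(2, 3)
b2 = b[:6].reshape(2, 3)
mask2 = mask[:6].reshape(2, 3)

try:
    [(x if c else y) for x, y, c in zip(a2, b2, mask2)]
    comprehension_failed = False
except ValueError:
    # each c is a whole row of Booleans, whose truth value is ambiguous
    comprehension_failed = True

picked = np.where(mask2, a2, b2)
print(comprehension_failed, picked.shape)  # True (2, 3)
```

On two-dimensional input the comprehension iterates over rows rather than elements, which is why it fails, whereas np.where selects element by element regardless of the number of dimensions.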

The second and third arguments of np.where do not have to be arrays; one or both can also be scalars. A typical use of where in data analysis is to create a new array of values based on another array. Suppose you have a matrix of randomly generated data and you want to replace every negative value with its positive counterpart:

[5]:
data = np.random.randn(4, 4)

data
[5]:
array([[-2.13569944,  0.21406577, -0.44948598,  0.07841356],
       [ 0.94045485, -0.47748714, -0.70057099, -1.92553004],
       [-1.65814642,  0.44475682, -1.16289192,  0.96023582],
       [ 0.45396769,  0.64944133, -0.08936879, -1.20179191]])
[6]:
data < 0
[6]:
array([[ True, False,  True, False],
       [False,  True,  True,  True],
       [ True, False,  True, False],
       [False, False,  True,  True]])
[7]:
np.where(data < 0, data * -1, data)
[7]:
array([[2.13569944, 0.21406577, 0.44948598, 0.07841356],
       [0.94045485, 0.47748714, 0.70057099, 1.92553004],
       [1.65814642, 0.44475682, 1.16289192, 0.96023582],
       [0.45396769, 0.64944133, 0.08936879, 1.20179191]])
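
The scalar case mentioned earlier is not exercised by the cells above, so here is a minimal sketch of it. The array values and the replacement values 2 and -2 are made up for illustration. The last lines also check that, for this particular task, np.where gives the same result as np.abs.

```python
import numpy as np

values = np.array([[-1.5, 0.2], [0.7, -0.3]])  # small hand-written example

# Both replacement values are scalars
signs = np.where(values > 0, 2, -2)
print(signs)
# [[-2  2]
#  [ 2 -2]]

# A scalar combined with an array: only the positive entries are replaced
capped = np.where(values > 0, 2, values)
print(capped)

# Flipping the sign of the negative entries matches the absolute value
flipped = np.where(values < 0, -values, values)
print(np.array_equal(flipped, np.abs(values)))  # True
```

When the goal is only to remove the sign, np.abs is the more direct call; np.where is the better fit when the two branches are different expressions.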