Indexing and slicing

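Indexing is the selection of a subset of your data or of individual elements. In one-dimensional arrays this is straightforward, since they behave much like Python lists. In the two-dimensional array below, a single integer index selects a whole row and a slice selects a range of rows: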

[1]:
import numpy as np
[2]:
rng = np.random.default_rng()
data = rng.normal(size=(10, 3))
data
[2]:
array([[-0.1781624 , -0.8381147 ,  1.40248986],
       [-1.48367758,  0.70035394,  0.60506565],
       [ 2.24316514,  0.38021158,  0.95148769],
       [-0.37414371,  1.03258406, -1.51360252],
       [-1.6251526 ,  0.34516475,  0.6205052 ],
       [ 0.96867556,  0.13047506, -1.80399701],
       [-0.20605706, -1.04783043,  0.69553167],
       [ 1.14186171, -1.01894781, -1.44487713],
       [ 0.29214215,  1.60380789, -1.82980606],
       [-1.87650688, -0.5427789 ,  1.6327612 ]])
[3]:
data[4]
[3]:
array([-1.6251526 ,  0.34516475,  0.6205052 ])
[4]:
data[2:4]
[4]:
array([[ 2.24316514,  0.38021158,  0.95148769],
       [-0.37414371,  1.03258406, -1.51360252]])
[5]:
data[2:4] = rng.normal(size=(2, 3))
[6]:
data
[6]:
array([[-0.1781624 , -0.8381147 ,  1.40248986],
       [-1.48367758,  0.70035394,  0.60506565],
       [-0.07210875, -0.4775101 , -1.09241001],
       [ 2.45845089, -0.26972796, -2.0442523 ],
       [-1.6251526 ,  0.34516475,  0.6205052 ],
       [ 0.96867556,  0.13047506, -1.80399701],
       [-0.20605706, -1.04783043,  0.69553167],
       [ 1.14186171, -1.01894781, -1.44487713],
       [ 0.29214215,  1.60380789, -1.82980606],
       [-1.87650688, -0.5427789 ,  1.6327612 ]])
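The same rules apply to a one-dimensional array, where indexing and slicing look just like list indexing. One difference is that assigning a scalar to a slice broadcasts the value to every element of the slice; a Python list would raise a TypeError here. A minimal sketch, assuming a small array built with np.arange (the name arr is only for illustration):

```python
import numpy as np

# One-dimensional array of the integers 0..9
arr = np.arange(10)

# Indexing and slicing work as with Python lists
print(arr[5])      # a single element
print(arr[5:8])    # elements at positions 5, 6 and 7

# Assigning a scalar to a slice broadcasts it to every element of the slice
arr[5:8] = 12
print(arr)
```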

Note:

Array slices differ from Python list slices in that they are views of the original array. This means that the data is not copied, and any change made through the view is reflected in the original array.

If you want a copy of part of an ndarray, copy it explicitly, for example with data[2:5].copy().
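The difference between a view and an explicit copy can be checked directly. This sketch uses a small stand-in array (arr, view and copied are illustrative names) and np.shares_memory, which reports whether two arrays overlap in memory:

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: writing through it changes the original array
view = arr[2:5]
view[0] = 100
print(arr[2])    # the original now holds 100

# .copy() creates an independent array
copied = arr[2:5].copy()
copied[0] = -1
print(arr[2])    # still 100; the original is unaffected

print(np.shares_memory(arr, view), np.shares_memory(arr, copied))
```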

Indexing with slices alone always returns views with the same number of dimensions as the original array. If you mix integer indices and slices, however, each integer index removes one dimension, so you get lower-dimensional slices. For example, we can select the second row but only its first two columns as follows:

[7]:
data[1, :2]
[7]:
array([-1.48367758,  0.70035394])
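Comparing the shapes makes the dimension rule visible. To keep the values predictable, this sketch uses a stand-in array built with np.arange instead of the random data:

```python
import numpy as np

data = np.arange(30).reshape(10, 3)  # stand-in: 10 rows, 3 columns

row_int = data[1, :2]      # integer + slice: the row axis is dropped
row_slice = data[1:2, :2]  # slice + slice: both axes are kept

print(row_int.shape)    # (2,)
print(row_slice.shape)  # (1, 2)
```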

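A colon on its own selects the whole axis, so you can also slice along the higher axes only. Here all rows are kept and only the first column is selected; because :1 is a slice, the result is still two-dimensional: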

[8]:
data[:, :1]
[8]:
array([[-0.1781624 ],
       [-1.48367758],
       [-0.07210875],
       [ 2.45845089],
       [-1.6251526 ],
       [ 0.96867556],
       [-0.20605706],
       [ 1.14186171],
       [ 0.29214215],
       [-1.87650688]])
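Whether the column axis survives depends on using a slice or an integer for it. A short sketch with a stand-in array built with np.arange:

```python
import numpy as np

data = np.arange(30).reshape(10, 3)  # stand-in for the random data above

first_col_2d = data[:, :1]  # slice keeps the column axis: shape (10, 1)
first_col_1d = data[:, 0]   # integer drops it: shape (10,)

print(first_col_2d.shape, first_col_1d.shape)
```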

Boolean indexing

Let’s consider an example where we have some data in an array and an array of names with duplicates. The data array is the one generated above with the normal method of the generator returned by numpy.random.default_rng:

[9]:
names = np.array(
    [
        "Liam",
        "Olivia",
        "Noah",
        "Liam",
        "Noah",
        "Olivia",
        "Liam",
        "Emma",
        "Oliver",
        "Ava",
    ]
)
[10]:
names
[10]:
array(['Liam', 'Olivia', 'Noah', 'Liam', 'Noah', 'Olivia', 'Liam', 'Emma',
       'Oliver', 'Ava'], dtype='<U6')
[11]:
data
[11]:
array([[-0.1781624 , -0.8381147 ,  1.40248986],
       [-1.48367758,  0.70035394,  0.60506565],
       [-0.07210875, -0.4775101 , -1.09241001],
       [ 2.45845089, -0.26972796, -2.0442523 ],
       [-1.6251526 ,  0.34516475,  0.6205052 ],
       [ 0.96867556,  0.13047506, -1.80399701],
       [-0.20605706, -1.04783043,  0.69553167],
       [ 1.14186171, -1.01894781, -1.44487713],
       [ 0.29214215,  1.60380789, -1.82980606],
       [-1.87650688, -0.5427789 ,  1.6327612 ]])

Suppose each name corresponds to a row in the data array and we want to select all rows with the corresponding name Liam. Like arithmetic operations, comparisons like == are vectorised with arrays. So comparing names with the string Liam results in a Boolean array:

[12]:
names == "Liam"
[12]:
array([ True, False, False,  True, False, False,  True, False, False,
       False])

This Boolean array can be passed when indexing the array:

[13]:
data[names == "Liam"]
[13]:
array([[-0.1781624 , -0.8381147 ,  1.40248986],
       [ 2.45845089, -0.26972796, -2.0442523 ],
       [-0.20605706, -1.04783043,  0.69553167]])

Here, the Boolean array must have the same length as the array axis it indexes.
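A Boolean array is matched against the axis it indexes, so the same mask can be valid for one axis and invalid for another. A sketch with a stand-in array; in current NumPy versions a length mismatch raises an IndexError (the flag mismatch_raised is only for illustration):

```python
import numpy as np

data = np.arange(30).reshape(10, 3)  # stand-in: 10 rows, 3 columns

# A mask of length 10 matches axis 0 (the rows)
row_mask = np.zeros(10, dtype=bool)
row_mask[[0, 3, 6]] = True
print(data[row_mask].shape)

# A mask of length 3 matches axis 1 (the columns)
col_mask = np.array([True, False, True])
print(data[:, col_mask].shape)

# ...but not axis 0, which has 10 rows
mismatch_raised = False
try:
    data[col_mask]
except IndexError as exc:
    mismatch_raised = True
    print("IndexError:", exc)
```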

Note:

Selecting data from an array by Boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the result contains exactly the same values as the original.
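The copy only applies to reading. Assigning to a Boolean-indexed expression writes into the original array. A sketch using shortened stand-ins for names and data:

```python
import numpy as np

# Small stand-ins for the names and data arrays above
names = np.array(["Liam", "Olivia", "Noah", "Liam"])
data = np.arange(12.0).reshape(4, 3)

selected = data[names == "Liam"]  # Boolean indexing returns a new array
selected[:] = 0                   # changes only the copy
print(data[0])                    # original row 0 is unchanged
print(np.shares_memory(data, selected))

# Assigning through a Boolean index, by contrast, writes into data itself
data[names == "Liam"] = 0
print(data)
```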

In the following example, I select the rows where names == 'Liam' and also index the columns:

[14]:
data[names == "Liam", 2:]
[14]:
array([[ 1.40248986],
       [-2.0442523 ],
       [ 0.69553167]])

To select everything except Liam, you can either use != or negate the condition with ~:

[15]:
names != "Liam"
[15]:
array([False,  True,  True, False,  True,  True, False,  True,  True,
        True])
[16]:
cond = names == "Liam"
data[~cond]
[16]:
array([[-1.48367758,  0.70035394,  0.60506565],
       [-0.07210875, -0.4775101 , -1.09241001],
       [-1.6251526 ,  0.34516475,  0.6205052 ],
       [ 0.96867556,  0.13047506, -1.80399701],
       [ 1.14186171, -1.01894781, -1.44487713],
       [ 0.29214215,  1.60380789, -1.82980606],
       [-1.87650688, -0.5427789 ,  1.6327612 ]])
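For Boolean arrays, ~ is an elementwise logical NOT, so both ways of excluding Liam select the same rows. A quick check with small stand-in arrays:

```python
import numpy as np

names = np.array(["Liam", "Olivia", "Noah", "Liam", "Emma"])
data = np.arange(15).reshape(5, 3)  # stand-in data, one row per name

cond = names == "Liam"
not_liam_a = data[names != "Liam"]
not_liam_b = data[~cond]

print(np.array_equal(not_liam_a, not_liam_b))  # True
```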

To combine several Boolean conditions, for example to select two of the names, use the Boolean arithmetic operators & (and) and | (or). Wrap each comparison in parentheses, because & and | bind more tightly than ==.

Warning:

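The Python keywords "and" and "or" do not work with Boolean arrays: they try to reduce the whole array to a single truth value and raise a ValueError. Use & and | instead.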

[17]:
mask = (names == "Liam") | (names == "Olivia")
[18]:
mask
[18]:
array([ True,  True, False,  True, False,  True,  True, False, False,
       False])
[19]:
data[mask]
[19]:
array([[-0.1781624 , -0.8381147 ,  1.40248986],
       [-1.48367758,  0.70035394,  0.60506565],
       [ 2.45845089, -0.26972796, -2.0442523 ],
       [ 0.96867556,  0.13047506, -1.80399701],
       [-0.20605706, -1.04783043,  0.69553167]])
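The warning above can be reproduced directly. This sketch uses a shortened stand-in for names and the flag or_raised, which exists only for illustration:

```python
import numpy as np

names = np.array(["Liam", "Olivia", "Noah", "Liam"])

# "or" asks Python for one truth value of the whole array,
# which is ambiguous for more than one element
or_raised = False
try:
    (names == "Liam") or (names == "Olivia")
except ValueError as exc:
    or_raised = True
    print("ValueError:", exc)

# | combines the two Boolean arrays element by element
mask = (names == "Liam") | (names == "Olivia")
print(mask)
```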

Integer Array Indexing

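Integer array indexing lets you select arbitrary elements of an array by passing arrays of integer indices. Each integer array supplies the indices along one dimension: a single array selects rows in the given order, and one array per dimension selects individual elements.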
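Both forms are shown below on a stand-in array built with np.arange, where the element at (i, j) equals 3 * i + j, so the selected values are easy to verify:

```python
import numpy as np

data = np.arange(30).reshape(10, 3)  # stand-in for the random data above

# A single integer array selects whole rows, in the given order
rows = data[[4, 0, 2]]

# One integer array per axis pairs the indices up element by element:
# here the elements at (1, 0), (5, 2) and (7, 1)
elements = data[[1, 5, 7], [0, 2, 1]]

print(rows)
print(elements)
```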