{ "cells": [ { "cell_type": "markdown", "id": "9bdc9aaa", "metadata": {}, "source": [ "# Indexing and slicing\n", "\n", "Indexing is the selection of a subset of your data or individual elements. This is very easy in one-dimensional arrays; they behave similarly to Python lists:" ] }, { "cell_type": "code", "execution_count": 1, "id": "30333f50", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "id": "7b193366", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.84744338, 0.72444063, -0.1683032 ],\n", " [-0.35206446, 1.2773733 , 0.42331498],\n", " [-0.63538581, 0.28763612, 1.70806247],\n", " [ 0.5533326 , 1.22789904, -0.12317261],\n", " [-0.09251626, -0.00445541, -0.28129234],\n", " [-0.05594547, -0.7873575 , 2.18726381],\n", " [-1.03566553, -0.15670435, -0.58689331],\n", " [-0.00324649, -1.54265608, 2.0516329 ],\n", " [ 1.89165757, -0.80526861, 0.17857485],\n", " [-1.04799141, -0.72118009, 0.26854029]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rng = np.random.default_rng()\n", "data = rng.normal(size=(10, 3))\n", "data" ] }, { "cell_type": "code", "execution_count": 3, "id": "6cd243fc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.09251626, -0.00445541, -0.28129234])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[4]" ] }, { "cell_type": "code", "execution_count": 4, "id": "f333c3fe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.63538581, 0.28763612, 1.70806247],\n", " [ 0.5533326 , 1.22789904, -0.12317261]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[2:4]" ] }, { "cell_type": "code", "execution_count": 5, "id": "81ffc2e6", "metadata": {}, "outputs": [], "source": [ "data[2:4] = rng.normal(size=(2, 3))" ] }, { "cell_type": "code", "execution_count": 6, "id": "e002d2a0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.84744338, 0.72444063, -0.1683032 ],\n", " [-0.35206446, 1.2773733 , 0.42331498],\n", " [-0.98336479, -0.51449641, 1.50154858],\n", " [-0.0927469 , 0.71687669, 1.15388018],\n", " [-0.09251626, -0.00445541, -0.28129234],\n", " [-0.05594547, -0.7873575 , 2.18726381],\n", " [-1.03566553, -0.15670435, -0.58689331],\n", " [-0.00324649, -1.54265608, 2.0516329 ],\n", " [ 1.89165757, -0.80526861, 0.17857485],\n", " [-1.04799141, -0.72118009, 0.26854029]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "id": "84bfdb46", "metadata": {}, "source": [ "
\n", "\n", "**Note:**\n", "\n", "Array slices differ from Python lists in that they are views of the original array. This means that the data is not copied and that any changes to the view are reflected in the original array.\n", "\n", "If you want to make a copy of a part of an `ndarray`, you can copy the array explicitly – for example with `data[2:5].copy()`.\n", "
" ] }, { "cell_type": "markdown", "id": "d869f82e", "metadata": {}, "source": [ "_Slicing_ in this way always results in array views with the same number of dimensions. However, if you mix integer indices and slices, you get slices with lower dimensions. For example, we can select the second row but only the first two columns as follows:" ] }, { "cell_type": "code", "execution_count": 7, "id": "6cf0f124", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.35206446, 1.2773733 ])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[1, :2]" ] }, { "cell_type": "markdown", "id": "3c8bcd97", "metadata": {}, "source": [ "A colon means that the whole axis is taken, so you can also select higher dimensional axes:" ] }, { "cell_type": "code", "execution_count": 8, "id": "470b42cc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.84744338],\n", " [-0.35206446],\n", " [-0.98336479],\n", " [-0.0927469 ],\n", " [-0.09251626],\n", " [-0.05594547],\n", " [-1.03566553],\n", " [-0.00324649],\n", " [ 1.89165757],\n", " [-1.04799141]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[:, :1]" ] }, { "cell_type": "markdown", "id": "c84af141", "metadata": {}, "source": [ "## Boolean indexing\n", "\n", "Let’s consider an example where we have some data in an array and an array of names with duplicates. I will use the `normal` function in `numpy.random.default_rng` here to generate some random normally distributed data:" ] }, { "cell_type": "code", "execution_count": 9, "id": "7b9528b7", "metadata": {}, "outputs": [], "source": [ "names = np.array(\n", " [\n", " \"Liam\",\n", " \"Olivia\",\n", " \"Noah\",\n", " \"Liam\",\n", " \"Noah\",\n", " \"Olivia\",\n", " \"Liam\",\n", " \"Emma\",\n", " \"Oliver\",\n", " \"Ava\",\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "id": "f25fcad6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Liam', 'Olivia', 'Noah', 'Liam', 'Noah', 'Olivia', 'Liam', 'Emma',\n", " 'Oliver', 'Ava'], dtype='\n", "\n", "**Note:**\n", "\n", "Selecting data from an array by Boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the returned array is unchanged.\n", "" ] }, { "cell_type": "markdown", "id": "2e1ed0ee", "metadata": {}, "source": [ "In the following example, I select the rows where `names == 'Liam'` and also index the columns:" ] }, { "cell_type": "code", "execution_count": 14, "id": "80a62412", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.1683032 ],\n", " [ 1.15388018],\n", " [-0.58689331]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[names == \"Liam\", 2:]" ] }, { "cell_type": "markdown", "id": "9e8b26ad", "metadata": {}, "source": [ "To select everything except _Liam_, you can either use `!=` or negate the condition with `~`:" ] }, { "cell_type": "code", "execution_count": 15, "id": "256604ba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, True, True, False, True, True, False, True, True,\n", " True])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names != \"Liam\"" ] }, { "cell_type": "code", "execution_count": 16, "id": "8326e67d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.35206446, 1.2773733 , 0.42331498],\n", " [-0.98336479, -0.51449641, 1.50154858],\n", " [-0.09251626, -0.00445541, -0.28129234],\n", " [-0.05594547, -0.7873575 , 2.18726381],\n", " [-0.00324649, -1.54265608, 2.0516329 ],\n", " [ 1.89165757, -0.80526861, 0.17857485],\n", " [-1.04799141, -0.72118009, 0.26854029]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond = names == \"Liam\"\n", "data[~cond]" ] }, { "cell_type": "markdown", "id": "862f71e6", "metadata": {}, "source": [ "If you select two of the three names to combine several Boolean conditions, you can use the Boolean arithmetic operators `&` (and) and `|` (or).\n", "\n", "
\n", "\n", "**Warning:**\n", "\n", "The Python keywords `and` and `or` do not work with Boolean arrays.\n", "
" ] }, { "cell_type": "code", "execution_count": 17, "id": "eaaa5c90", "metadata": {}, "outputs": [], "source": [ "mask = (names == \"Liam\") | (names == \"Olivia\")" ] }, { "cell_type": "code", "execution_count": 18, "id": "4a00289c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, True, False, True, False, True, True, False, False,\n", " False])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask" ] }, { "cell_type": "code", "execution_count": 19, "id": "c2812827", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.84744338, 0.72444063, -0.1683032 ],\n", " [-0.35206446, 1.2773733 , 0.42331498],\n", " [-0.0927469 , 0.71687669, 1.15388018],\n", " [-0.05594547, -0.7873575 , 2.18726381],\n", " [-1.03566553, -0.15670435, -0.58689331]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[mask]" ] }, { "cell_type": "markdown", "id": "0ef20925", "metadata": {}, "source": [ "## Integer Array Indexing\n", "\n", "Integer array indexing allows you to select any elements in the array based on your N-dimensional index. Each integer array represents a number of indices in that dimension.\n", "\n", "
\n", "\n", "**See also:**\n", "\n", "* [Integer array indexing](https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing)\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }