{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7543a862",
   "metadata": {},
   "source": [
    "# Select and filter data\n",
    "\n",
    "Indexing series `(obj[...])` works analogously to indexing NumPy arrays, except that you can use index values of the series instead of just integers. Here are some examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5359176d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "339875bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = pd.date_range(\"2022-02-02\", periods=7)\n",
    "rng = np.random.default_rng()\n",
    "s = pd.Series(rng.normal(size=7), index=idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ac21c059",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-02    0.852492\n",
       "2022-02-03    2.073257\n",
       "2022-02-04   -0.391827\n",
       "2022-02-05   -1.288372\n",
       "2022-02-06    0.238623\n",
       "2022-02-07    1.752573\n",
       "2022-02-08   -0.155582\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "642b2864",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(2.073256834089322)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[\"2022-02-03\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e1b90b3e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(2.073256834089322)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.iloc[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ccc10714",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-04   -0.391827\n",
       "2022-02-05   -1.288372\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    " s[2:4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0d737dcc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-04   -0.391827\n",
       "2022-02-03    2.073257\n",
       "2022-02-02    0.852492\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[[\"2022-02-04\", \"2022-02-03\", \"2022-02-02\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "003e3a80",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-03    2.073257\n",
       "2022-02-05   -1.288372\n",
       "Freq: 2D, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.iloc[[1, 3]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "83376cb4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-02    0.852492\n",
       "2022-02-03    2.073257\n",
       "2022-02-06    0.238623\n",
       "2022-02-07    1.752573\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[s > 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "295647c8",
   "metadata": {},
   "source": [
    "While you can select data by label in this way, the preferred method for selecting index values is the `loc` operator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "be9a6c3a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-04   -0.391827\n",
       "2022-02-03    2.073257\n",
       "2022-02-02    0.852492\n",
       "dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.loc[[\"2022-02-04\", \"2022-02-03\", \"2022-02-02\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50cc31af",
   "metadata": {},
   "source": [
    "The reason for the preference for `loc` is the different treatment of integers when indexing with `[]`. In regular `[]`-based indexing, integers are treated as labels if the index contains integers, so the behaviour varies depending on the data type of the index. In our example, the expression `s.loc[[3, 2, 1]]` will fail because the index does not contain integers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "94057262",
   "metadata": {},
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "\"None of [Index([3, 2, 1], dtype='int64')] are in the [index]\"",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[11], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43ms\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloc\u001b[49m\u001b[43m[\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m]\u001b[49m\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1191\u001b[0m, in \u001b[0;36m_LocationIndexer.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m   1189\u001b[0m maybe_callable \u001b[38;5;241m=\u001b[39m com\u001b[38;5;241m.\u001b[39mapply_if_callable(key, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj)\n\u001b[1;32m   1190\u001b[0m maybe_callable \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_deprecated_callable_usage(key, maybe_callable)\n\u001b[0;32m-> 1191\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_axis\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmaybe_callable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1420\u001b[0m, in \u001b[0;36m_LocIndexer._getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m   1417\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(key, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mndim\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m key\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m   1418\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot index with multidimensional key\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 1420\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_iterable\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1422\u001b[0m \u001b[38;5;66;03m# nested tuple slicing\u001b[39;00m\n\u001b[1;32m   1423\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_nested_tuple(key, labels):\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1360\u001b[0m, in \u001b[0;36m_LocIndexer._getitem_iterable\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m   1357\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_key(key, axis)\n\u001b[1;32m   1359\u001b[0m \u001b[38;5;66;03m# A collection of keys\u001b[39;00m\n\u001b[0;32m-> 1360\u001b[0m keyarr, indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_listlike_indexer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1361\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_reindex_with_indexers(\n\u001b[1;32m   1362\u001b[0m     {axis: [keyarr, indexer]}, copy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, allow_dups\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m   1363\u001b[0m )\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1558\u001b[0m, in \u001b[0;36m_LocIndexer._get_listlike_indexer\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m   1555\u001b[0m ax \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_get_axis(axis)\n\u001b[1;32m   1556\u001b[0m axis_name \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_get_axis_name(axis)\n\u001b[0;32m-> 1558\u001b[0m keyarr, indexer \u001b[38;5;241m=\u001b[39m \u001b[43max\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_indexer_strict\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis_name\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1560\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m keyarr, indexer\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexes/base.py:6200\u001b[0m, in \u001b[0;36mIndex._get_indexer_strict\u001b[0;34m(self, key, axis_name)\u001b[0m\n\u001b[1;32m   6197\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m   6198\u001b[0m     keyarr, indexer, new_indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_reindex_non_unique(keyarr)\n\u001b[0;32m-> 6200\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_raise_if_missing\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkeyarr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mindexer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis_name\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   6202\u001b[0m keyarr \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtake(indexer)\n\u001b[1;32m   6203\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, Index):\n\u001b[1;32m   6204\u001b[0m     \u001b[38;5;66;03m# GH 42790 - Preserve name from an Index\u001b[39;00m\n",
      "File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexes/base.py:6249\u001b[0m, in \u001b[0;36mIndex._raise_if_missing\u001b[0;34m(self, key, indexer, axis_name)\u001b[0m\n\u001b[1;32m   6247\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m nmissing:\n\u001b[1;32m   6248\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m nmissing \u001b[38;5;241m==\u001b[39m \u001b[38;5;28mlen\u001b[39m(indexer):\n\u001b[0;32m-> 6249\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNone of [\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mkey\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m] are in the [\u001b[39m\u001b[38;5;132;01m{\u001b[39;00maxis_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m]\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m   6251\u001b[0m     not_found \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(ensure_index(key)[missing_mask\u001b[38;5;241m.\u001b[39mnonzero()[\u001b[38;5;241m0\u001b[39m]]\u001b[38;5;241m.\u001b[39munique())\n\u001b[1;32m   6252\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mnot_found\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m not in index\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
      "\u001b[0;31mKeyError\u001b[0m: \"None of [Index([3, 2, 1], dtype='int64')] are in the [index]\""
     ]
    }
   ],
   "source": [
    "s.loc[[3, 2, 1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "390c0838",
   "metadata": {},
   "source": [
    "While the `loc` operator exclusively indexes labels, the `iloc` operator exclusively indexes with integers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "263b4ddf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-05   -1.288372\n",
       "2022-02-04   -0.391827\n",
       "2022-02-03    2.073257\n",
       "Freq: -1D, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.iloc[[3, 2, 1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbe97dd5",
   "metadata": {},
   "source": [
    "You can also slice with labels, but this works differently from normal Python slicing because the endpoint is included:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3d21b345",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-03    2.073257\n",
       "2022-02-04   -0.391827\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.loc[\"2022-02-03\":\"2022-02-04\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99deca60",
   "metadata": {},
   "source": [
    "Setting with these methods changes the corresponding section of the row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "18c03d20",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-02    0.852492\n",
       "2022-02-03    0.000000\n",
       "2022-02-04    0.000000\n",
       "2022-02-05   -1.288372\n",
       "2022-02-06    0.238623\n",
       "2022-02-07    1.752573\n",
       "2022-02-08   -0.155582\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.loc[\"2022-02-03\":\"2022-02-04\"] = 0\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09a14ccb",
   "metadata": {},
   "source": [
    "Indexing in a DataFrame is used to retrieve one or more columns with either a single value or a sequence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3f40a46f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0003</th>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0004</th>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0005</th>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal Octal     Key\n",
       "Code                         \n",
       "U+0000        0   001     NUL\n",
       "U+0001        1   002  Ctrl-A\n",
       "U+0002        2   003  Ctrl-B\n",
       "U+0003        3   004  Ctrl-C\n",
       "U+0004        4   004  Ctrl-D\n",
       "U+0005        5   005  Ctrl-E"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n",
    "    \"Decimal\": [0, 1, 2, 3, 4, 5],\n",
    "    \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n",
    "    \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "df = pd.DataFrame(data, columns=[\"Decimal\", \"Octal\", \"Key\"], index=df[\"Code\"])\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ff8881ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Code\n",
       "U+0000       NUL\n",
       "U+0001    Ctrl-A\n",
       "U+0002    Ctrl-B\n",
       "U+0003    Ctrl-C\n",
       "U+0004    Ctrl-D\n",
       "U+0005    Ctrl-E\n",
       "Name: Key, dtype: object"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Key\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "6701d732",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>2</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0003</th>\n",
       "      <td>3</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0004</th>\n",
       "      <td>4</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0005</th>\n",
       "      <td>5</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal     Key\n",
       "Code                   \n",
       "U+0000        0     NUL\n",
       "U+0001        1  Ctrl-A\n",
       "U+0002        2  Ctrl-B\n",
       "U+0003        3  Ctrl-C\n",
       "U+0004        4  Ctrl-D\n",
       "U+0005        5  Ctrl-E"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[[\"Decimal\", \"Key\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e6af9bbe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal Octal     Key\n",
       "Code                         \n",
       "U+0000        0   001     NUL\n",
       "U+0001        1   002  Ctrl-A"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "9d0deda5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0003</th>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0004</th>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0005</th>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal Octal     Key\n",
       "Code                         \n",
       "U+0003        3   004  Ctrl-C\n",
       "U+0004        4   004  Ctrl-D\n",
       "U+0005        5   005  Ctrl-E"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"Decimal\"] > 2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbd2174b",
   "metadata": {},
   "source": [
    "The line selection syntax `df[:2]` is provided for convenience. Passing a single item or a list to the `[]` operator selects columns."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c4c64e0",
   "metadata": {},
   "source": [
    "Another use case is indexing with a Boolean DataFrame, which is generated by a scalar comparison, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "b839beb6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Code\n",
       "U+0000    False\n",
       "U+0001    False\n",
       "U+0002    False\n",
       "U+0003     True\n",
       "U+0004     True\n",
       "U+0005     True\n",
       "Name: Decimal, dtype: bool"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Decimal\"] > 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "0b7eac34-07cd-4551-bcef-58e2a18e5112",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_90739/3186310461.py:1: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'NA' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.\n",
      "  df[df[\"Decimal\"] > 2] = \"NA\"\n"
     ]
    }
   ],
   "source": [
    "df[df[\"Decimal\"] > 2] = \"NA\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f93d6130",
   "metadata": {},
   "source": [
    "Like Series, DataFrame has special operators `loc` and `iloc` for label-based and integer indexing respectively. Since DataFrame is two-dimensional, you can select a subset of the rows and columns with NumPy-like notation using either axis labels (`loc`) or integers (`iloc`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "435668de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Decimal         2\n",
       "Key        Ctrl-B\n",
       "Name: U+0002, dtype: object"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\"U+0002\", [\"Decimal\", \"Key\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "8483cc54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>003</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Octal     Key\n",
       "Code                \n",
       "U+0002   003  Ctrl-B"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[[2], [1, 2]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "b4cf28cd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Octal     Key\n",
       "Code                \n",
       "U+0000   001     NUL\n",
       "U+0001   002  Ctrl-A"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[[0, 1], [1, 2]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f407c49",
   "metadata": {},
   "source": [
    "Both indexing functions work with slices in addition to individual labels or lists of labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "90ca1775",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Code\n",
       "U+0000       NUL\n",
       "U+0001    Ctrl-A\n",
       "U+0002    Ctrl-B\n",
       "U+0003        NA\n",
       "Name: Key, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:\"U+0003\", \"Key\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "d9e78d4a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Decimal Octal     Key\n",
       "Code                        \n",
       "U+0000       0   001     NUL\n",
       "U+0001       1   002  Ctrl-A\n",
       "U+0002       2   003  Ctrl-B"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:3, :3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "934c211c",
   "metadata": {},
   "source": [
    "So there are many ways to select and rearrange the data contained in a pandas object. In the following, I put together a brief summary of most of these possibilities for DataFrames:\n",
    "\n",
    "Type | Note\n",
    ":--- | :---\n",
    "`df[LABEL]` | selects a single column or a sequence of columns from the DataFrame\n",
    "`df.loc[LABEL]` | selects a single row or a subset of rows from the DataFrame by label\n",
    "`df.loc[:, LABEL]` | selects a single column or a subset of columns from the DataFrame by Label\n",
    "`df.loc[LABEL1, LABEL2]` | selects both rows and columns by label\n",
    "`df.iloc[INTEGER]` | selects a single row or a subset of rows from the DataFrame by integer position\n",
    "`df.iloc[INTEGER1, INTEGER2]` | selects a single column or a subset of columns by integer position\n",
    "`df.at[LABEL1, LABEL2]` | selects a single value by row and column label\n",
    "`df.iat[INTEGER1, INTEGER2]` | selects a scalar value by row and column position (integers)\n",
    "`reindex NEW_INDEX` | selects rows or columns by label\n",
    "`get_value`, `set_value` | deprecated since version 0.21.0: use `.at[]` or `.iat[]` instead."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}