{
"cells": [
{
"cell_type": "markdown",
"id": "7543a862",
"metadata": {},
"source": [
"# Select and filter data\n",
"\n",
"Indexing series `(obj[...])` works analogously to indexing NumPy arrays, except that you can use index values of the series instead of just integers. Here are some examples:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5359176d",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "339875bb",
"metadata": {},
"outputs": [],
"source": [
"idx = pd.date_range(\"2022-02-02\", periods=7)\n",
"rng = np.random.default_rng()\n",
"s = pd.Series(rng.normal(size=7), index=idx)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ac21c059",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-02 0.852492\n",
"2022-02-03 2.073257\n",
"2022-02-04 -0.391827\n",
"2022-02-05 -1.288372\n",
"2022-02-06 0.238623\n",
"2022-02-07 1.752573\n",
"2022-02-08 -0.155582\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "642b2864",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(2.073256834089322)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[\"2022-02-03\"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e1b90b3e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(2.073256834089322)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.iloc[1]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ccc10714",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-04 -0.391827\n",
"2022-02-05 -1.288372\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" s[2:4]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0d737dcc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-04 -0.391827\n",
"2022-02-03 2.073257\n",
"2022-02-02 0.852492\n",
"dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[[\"2022-02-04\", \"2022-02-03\", \"2022-02-02\"]]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "003e3a80",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-03 2.073257\n",
"2022-02-05 -1.288372\n",
"Freq: 2D, dtype: float64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.iloc[[1, 3]]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "83376cb4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-02 0.852492\n",
"2022-02-03 2.073257\n",
"2022-02-06 0.238623\n",
"2022-02-07 1.752573\n",
"dtype: float64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[s > 0]"
]
},
{
"cell_type": "markdown",
"id": "295647c8",
"metadata": {},
"source": [
"While you can select data by label in this way, the preferred method for selecting index values is the `loc` operator:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "be9a6c3a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-04 -0.391827\n",
"2022-02-03 2.073257\n",
"2022-02-02 0.852492\n",
"dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.loc[[\"2022-02-04\", \"2022-02-03\", \"2022-02-02\"]]"
]
},
{
"cell_type": "markdown",
"id": "50cc31af",
"metadata": {},
"source": [
"The reason for the preference for `loc` is the different treatment of integers when indexing with `[]`. In regular `[]`-based indexing, integers are treated as labels if the index contains integers, so the behaviour varies depending on the data type of the index. In our example, the expression `s.loc[[3, 2, 1]]` will fail because the index does not contain integers:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "94057262",
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "\"None of [Index([3, 2, 1], dtype='int64')] are in the [index]\"",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[11], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43ms\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloc\u001b[49m\u001b[43m[\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m]\u001b[49m\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1191\u001b[0m, in \u001b[0;36m_LocationIndexer.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1189\u001b[0m maybe_callable \u001b[38;5;241m=\u001b[39m com\u001b[38;5;241m.\u001b[39mapply_if_callable(key, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj)\n\u001b[1;32m 1190\u001b[0m maybe_callable \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_deprecated_callable_usage(key, maybe_callable)\n\u001b[0;32m-> 1191\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_axis\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmaybe_callable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1420\u001b[0m, in \u001b[0;36m_LocIndexer._getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1417\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(key, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mndim\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m key\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 1418\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot index with multidimensional key\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 1420\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_iterable\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1422\u001b[0m \u001b[38;5;66;03m# nested tuple slicing\u001b[39;00m\n\u001b[1;32m 1423\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_nested_tuple(key, labels):\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1360\u001b[0m, in \u001b[0;36m_LocIndexer._getitem_iterable\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1357\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_key(key, axis)\n\u001b[1;32m 1359\u001b[0m \u001b[38;5;66;03m# A collection of keys\u001b[39;00m\n\u001b[0;32m-> 1360\u001b[0m keyarr, indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_listlike_indexer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1361\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_reindex_with_indexers(\n\u001b[1;32m 1362\u001b[0m {axis: [keyarr, indexer]}, copy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, allow_dups\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m 1363\u001b[0m )\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexing.py:1558\u001b[0m, in \u001b[0;36m_LocIndexer._get_listlike_indexer\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1555\u001b[0m ax \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_get_axis(axis)\n\u001b[1;32m 1556\u001b[0m axis_name \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj\u001b[38;5;241m.\u001b[39m_get_axis_name(axis)\n\u001b[0;32m-> 1558\u001b[0m keyarr, indexer \u001b[38;5;241m=\u001b[39m \u001b[43max\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_indexer_strict\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis_name\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1560\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m keyarr, indexer\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexes/base.py:6200\u001b[0m, in \u001b[0;36mIndex._get_indexer_strict\u001b[0;34m(self, key, axis_name)\u001b[0m\n\u001b[1;32m 6197\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 6198\u001b[0m keyarr, indexer, new_indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_reindex_non_unique(keyarr)\n\u001b[0;32m-> 6200\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_raise_if_missing\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkeyarr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mindexer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis_name\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 6202\u001b[0m keyarr \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtake(indexer)\n\u001b[1;32m 6203\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, Index):\n\u001b[1;32m 6204\u001b[0m \u001b[38;5;66;03m# GH 42790 - Preserve name from an Index\u001b[39;00m\n",
"File \u001b[0;32m~/sandbox/py313/.venv/lib/python3.13/site-packages/pandas/core/indexes/base.py:6249\u001b[0m, in \u001b[0;36mIndex._raise_if_missing\u001b[0;34m(self, key, indexer, axis_name)\u001b[0m\n\u001b[1;32m 6247\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m nmissing:\n\u001b[1;32m 6248\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m nmissing \u001b[38;5;241m==\u001b[39m \u001b[38;5;28mlen\u001b[39m(indexer):\n\u001b[0;32m-> 6249\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNone of [\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mkey\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m] are in the [\u001b[39m\u001b[38;5;132;01m{\u001b[39;00maxis_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m]\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 6251\u001b[0m not_found \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(ensure_index(key)[missing_mask\u001b[38;5;241m.\u001b[39mnonzero()[\u001b[38;5;241m0\u001b[39m]]\u001b[38;5;241m.\u001b[39munique())\n\u001b[1;32m 6252\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mnot_found\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m not in index\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[0;31mKeyError\u001b[0m: \"None of [Index([3, 2, 1], dtype='int64')] are in the [index]\""
]
}
],
"source": [
"s.loc[[3, 2, 1]]"
]
},
{
"cell_type": "markdown",
"id": "390c0838",
"metadata": {},
"source": [
"While the `loc` operator exclusively indexes labels, the `iloc` operator exclusively indexes with integers:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "263b4ddf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-05 -1.288372\n",
"2022-02-04 -0.391827\n",
"2022-02-03 2.073257\n",
"Freq: -1D, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.iloc[[3, 2, 1]]"
]
},
{
"cell_type": "markdown",
"id": "dbe97dd5",
"metadata": {},
"source": [
"You can also slice with labels, but this works differently from normal Python slicing because the endpoint is included:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3d21b345",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-03 2.073257\n",
"2022-02-04 -0.391827\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.loc[\"2022-02-03\":\"2022-02-04\"]"
]
},
{
"cell_type": "markdown",
"id": "99deca60",
"metadata": {},
"source": [
"Setting with these methods changes the corresponding section of the row:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "18c03d20",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-02 0.852492\n",
"2022-02-03 0.000000\n",
"2022-02-04 0.000000\n",
"2022-02-05 -1.288372\n",
"2022-02-06 0.238623\n",
"2022-02-07 1.752573\n",
"2022-02-08 -0.155582\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.loc[\"2022-02-03\":\"2022-02-04\"] = 0\n",
"\n",
"s"
]
},
{
"cell_type": "markdown",
"id": "09a14ccb",
"metadata": {},
"source": [
"Indexing in a DataFrame is used to retrieve one or more columns with either a single value or a sequence:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3f40a46f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Decimal | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0000 | \n",
" 0 | \n",
" 001 | \n",
" NUL | \n",
"
\n",
" \n",
" | U+0001 | \n",
" 1 | \n",
" 002 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
" | U+0002 | \n",
" 2 | \n",
" 003 | \n",
" Ctrl-B | \n",
"
\n",
" \n",
" | U+0003 | \n",
" 3 | \n",
" 004 | \n",
" Ctrl-C | \n",
"
\n",
" \n",
" | U+0004 | \n",
" 4 | \n",
" 004 | \n",
" Ctrl-D | \n",
"
\n",
" \n",
" | U+0005 | \n",
" 5 | \n",
" 005 | \n",
" Ctrl-E | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Decimal Octal Key\n",
"Code \n",
"U+0000 0 001 NUL\n",
"U+0001 1 002 Ctrl-A\n",
"U+0002 2 003 Ctrl-B\n",
"U+0003 3 004 Ctrl-C\n",
"U+0004 4 004 Ctrl-D\n",
"U+0005 5 005 Ctrl-E"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = {\n",
" \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n",
" \"Decimal\": [0, 1, 2, 3, 4, 5],\n",
" \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n",
" \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n",
"}\n",
"\n",
"df = pd.DataFrame(data)\n",
"df = pd.DataFrame(data, columns=[\"Decimal\", \"Octal\", \"Key\"], index=df[\"Code\"])\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "ff8881ff",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Code\n",
"U+0000 NUL\n",
"U+0001 Ctrl-A\n",
"U+0002 Ctrl-B\n",
"U+0003 Ctrl-C\n",
"U+0004 Ctrl-D\n",
"U+0005 Ctrl-E\n",
"Name: Key, dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"Key\"]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "6701d732",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Decimal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0000 | \n",
" 0 | \n",
" NUL | \n",
"
\n",
" \n",
" | U+0001 | \n",
" 1 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
" | U+0002 | \n",
" 2 | \n",
" Ctrl-B | \n",
"
\n",
" \n",
" | U+0003 | \n",
" 3 | \n",
" Ctrl-C | \n",
"
\n",
" \n",
" | U+0004 | \n",
" 4 | \n",
" Ctrl-D | \n",
"
\n",
" \n",
" | U+0005 | \n",
" 5 | \n",
" Ctrl-E | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Decimal Key\n",
"Code \n",
"U+0000 0 NUL\n",
"U+0001 1 Ctrl-A\n",
"U+0002 2 Ctrl-B\n",
"U+0003 3 Ctrl-C\n",
"U+0004 4 Ctrl-D\n",
"U+0005 5 Ctrl-E"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"Decimal\", \"Key\"]]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "e6af9bbe",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Decimal | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0000 | \n",
" 0 | \n",
" 001 | \n",
" NUL | \n",
"
\n",
" \n",
" | U+0001 | \n",
" 1 | \n",
" 002 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Decimal Octal Key\n",
"Code \n",
"U+0000 0 001 NUL\n",
"U+0001 1 002 Ctrl-A"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[:2]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9d0deda5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Decimal | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0003 | \n",
" 3 | \n",
" 004 | \n",
" Ctrl-C | \n",
"
\n",
" \n",
" | U+0004 | \n",
" 4 | \n",
" 004 | \n",
" Ctrl-D | \n",
"
\n",
" \n",
" | U+0005 | \n",
" 5 | \n",
" 005 | \n",
" Ctrl-E | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Decimal Octal Key\n",
"Code \n",
"U+0003 3 004 Ctrl-C\n",
"U+0004 4 004 Ctrl-D\n",
"U+0005 5 005 Ctrl-E"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"Decimal\"] > 2]"
]
},
{
"cell_type": "markdown",
"id": "cbd2174b",
"metadata": {},
"source": [
"The line selection syntax `df[:2]` is provided for convenience. Passing a single item or a list to the `[]` operator selects columns."
]
},
{
"cell_type": "markdown",
"id": "0c4c64e0",
"metadata": {},
"source": [
"Another use case is indexing with a Boolean DataFrame, which is generated by a scalar comparison, for example:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "b839beb6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Code\n",
"U+0000 False\n",
"U+0001 False\n",
"U+0002 False\n",
"U+0003 True\n",
"U+0004 True\n",
"U+0005 True\n",
"Name: Decimal, dtype: bool"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"Decimal\"] > 2"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "0b7eac34-07cd-4551-bcef-58e2a18e5112",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_90739/3186310461.py:1: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'NA' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.\n",
" df[df[\"Decimal\"] > 2] = \"NA\"\n"
]
}
],
"source": [
"df[df[\"Decimal\"] > 2] = \"NA\""
]
},
{
"cell_type": "markdown",
"id": "f93d6130",
"metadata": {},
"source": [
"Like Series, DataFrame has special operators `loc` and `iloc` for label-based and integer indexing respectively. Since DataFrame is two-dimensional, you can select a subset of the rows and columns with NumPy-like notation using either axis labels (`loc`) or integers (`iloc`)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "435668de",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Decimal 2\n",
"Key Ctrl-B\n",
"Name: U+0002, dtype: object"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[\"U+0002\", [\"Decimal\", \"Key\"]]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "8483cc54",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0002 | \n",
" 003 | \n",
" Ctrl-B | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Octal Key\n",
"Code \n",
"U+0002 003 Ctrl-B"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[[2], [1, 2]]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "b4cf28cd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0000 | \n",
" 001 | \n",
" NUL | \n",
"
\n",
" \n",
" | U+0001 | \n",
" 002 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Octal Key\n",
"Code \n",
"U+0000 001 NUL\n",
"U+0001 002 Ctrl-A"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[[0, 1], [1, 2]]"
]
},
{
"cell_type": "markdown",
"id": "5f407c49",
"metadata": {},
"source": [
"Both indexing functions work with slices in addition to individual labels or lists of labels:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "90ca1775",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Code\n",
"U+0000 NUL\n",
"U+0001 Ctrl-A\n",
"U+0002 Ctrl-B\n",
"U+0003 NA\n",
"Name: Key, dtype: object"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[:\"U+0003\", \"Key\"]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d9e78d4a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Decimal | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" | Code | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | U+0000 | \n",
" 0 | \n",
" 001 | \n",
" NUL | \n",
"
\n",
" \n",
" | U+0001 | \n",
" 1 | \n",
" 002 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
" | U+0002 | \n",
" 2 | \n",
" 003 | \n",
" Ctrl-B | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Decimal Octal Key\n",
"Code \n",
"U+0000 0 001 NUL\n",
"U+0001 1 002 Ctrl-A\n",
"U+0002 2 003 Ctrl-B"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:3, :3]"
]
},
{
"cell_type": "markdown",
"id": "934c211c",
"metadata": {},
"source": [
"So there are many ways to select and rearrange the data contained in a pandas object. In the following, I put together a brief summary of most of these possibilities for DataFrames:\n",
"\n",
"Type | Note\n",
":--- | :---\n",
"`df[LABEL]` | selects a single column or a sequence of columns from the DataFrame\n",
"`df.loc[LABEL]` | selects a single row or a subset of rows from the DataFrame by label\n",
"`df.loc[:, LABEL]` | selects a single column or a subset of columns from the DataFrame by Label\n",
"`df.loc[LABEL1, LABEL2]` | selects both rows and columns by label\n",
"`df.iloc[INTEGER]` | selects a single row or a subset of rows from the DataFrame by integer position\n",
"`df.iloc[INTEGER1, INTEGER2]` | selects a single column or a subset of columns by integer position\n",
"`df.at[LABEL1, LABEL2]` | selects a single value by row and column label\n",
"`df.iat[INTEGER1, INTEGER2]` | selects a scalar value by row and column position (integers)\n",
"`reindex NEW_INDEX` | selects rows or columns by label\n",
"`get_value`, `set_value` | deprecated since version 0.21.0: use `.at[]` or `.iat[]` instead."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.13 Kernel",
"language": "python",
"name": "python313"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}