{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9ae6a951",
   "metadata": {},
   "source": [
    "# Introduction to the data structures of pandas\n",
    "\n",
    "To get started with pandas, you should first familiarise yourself with the two most important data structures [Series](#Series) and [DataFrame](#DataFrame)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66e4300b",
   "metadata": {},
   "source": [
    "## Series\n",
    "\n",
    "A series is a one-dimensional array-like object containing a sequence of values (of similar types to the NumPy types) and an associated array of data labels called an index. The simplest series is formed from just an array of data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "7ac75912",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d725a520",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.756390\n",
       "1   -0.720152\n",
       "2   -1.241521\n",
       "3    0.008288\n",
       "4   -1.020880\n",
       "5   -0.669150\n",
       "6   -0.959491\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.default_rng()\n",
    "s = pd.Series(rng.normal(size=7))\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba660e44",
   "metadata": {},
   "source": [
    "The string representation of an interactively displayed series shows the index on the left and the values on the right. Since we have not specified an index for the data, a default index is created consisting of the integers `0` to `N - 1` (where `N` is the length of the data). You can get the array representation and the index object of the series via their [pandas.Series.array](https://pandas.pydata.org/docs/reference/api/pandas.Series.array.html) and [pandas.Series.index](https://pandas.pydata.org/docs/reference/api/pandas.Series.index.html) attributes respectively:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "51b83e88",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<NumpyExtensionArray>\n",
       "[  np.float64(0.7563904474621019),   np.float64(-0.720151807332795),\n",
       "  np.float64(-1.2415207045203973), np.float64(0.008287570553159707),\n",
       "  np.float64(-1.0208804470233657),  np.float64(-0.6691500152824282),\n",
       "  np.float64(-0.9594909838421385)]\n",
       "Length: 7, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5fce1452",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=7, step=1)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "291f954e",
   "metadata": {},
   "source": [
    "Often you will want to create an index that identifies each data point with a label:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d44ab947",
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = pd.date_range(\"2022-01-31\", periods=7)\n",
    "\n",
    "s2 = pd.Series(rng.normal(size=7), index=idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "07e33aa2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    0.181385\n",
       "2022-02-01    0.044108\n",
       "2022-02-02   -1.537827\n",
       "2022-02-03    1.172051\n",
       "2022-02-04   -1.792041\n",
       "2022-02-05    0.335298\n",
       "2022-02-06    1.388968\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "000d67c1",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "\n",
    "**See also:**\n",
    "\n",
    "* [Time series / date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f0ce94d",
   "metadata": {},
   "source": [
    "Compared to NumPy arrays, you can use labels in the index if you want to select individual values or a group of values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b94e8aee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(-1.5378272926290917)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2[\"2022-02-02\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "fe9074dd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-02   -1.537827\n",
       "2022-02-03    1.172051\n",
       "2022-02-04   -1.792041\n",
       "dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2[[\"2022-02-02\", \"2022-02-03\", \"2022-02-04\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44e431b9",
   "metadata": {},
   "source": [
    "Here `['2022-02-02', '2022-02-03', '2022-02-04']` is interpreted as a list of indices, even if it contains strings instead of integers."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6da2e42d",
   "metadata": {},
   "source": [
    "When using NumPy functions or NumPy-like operations, such as filtering with a Boolean array, scalar multiplication or applying mathematical functions, the link between index and value is preserved:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e036f017",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    0.181385\n",
       "2022-02-01    0.044108\n",
       "2022-02-03    1.172051\n",
       "2022-02-05    0.335298\n",
       "2022-02-06    1.388968\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2[s2 > 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "26e4ec6e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    0.032900\n",
       "2022-02-01    0.001945\n",
       "2022-02-02    2.364913\n",
       "2022-02-03    1.373704\n",
       "2022-02-04    3.211413\n",
       "2022-02-05    0.112424\n",
       "2022-02-06    1.929233\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d6a516a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    1.198876\n",
       "2022-02-01    1.045095\n",
       "2022-02-02    0.214847\n",
       "2022-02-03    3.228609\n",
       "2022-02-04    0.166620\n",
       "2022-02-05    1.398356\n",
       "2022-02-06    4.010711\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(s2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31afeb4a",
   "metadata": {},
   "source": [
    "You can also think of a series as a fixed-length _ordered dict_, since it is an assignment of index values to data values. It can be used in many contexts where you could use a _dict_:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f145185d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"2022-02-02\" in s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2dbcfa8f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"2022-02-09\" in s2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c91b138d",
   "metadata": {},
   "source": [
    "### Missing data\n",
    "\n",
    "I will use `NA` and `null` synonymously to indicate missing data. The functions `isna` and `notna` in pandas should be used to identify missing data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c0695271",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    False\n",
       "2022-02-01    False\n",
       "2022-02-02    False\n",
       "2022-02-03    False\n",
       "2022-02-04    False\n",
       "2022-02-05    False\n",
       "2022-02-06    False\n",
       "Freq: D, dtype: bool"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.isna(s2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "5914b0b2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    True\n",
       "2022-02-01    True\n",
       "2022-02-02    True\n",
       "2022-02-03    True\n",
       "2022-02-04    True\n",
       "2022-02-05    True\n",
       "2022-02-06    True\n",
       "Freq: D, dtype: bool"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.notna(s2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b44ea086",
   "metadata": {},
   "source": [
    "Series also has these as instance methods:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b5c3b625",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31    False\n",
       "2022-02-01    False\n",
       "2022-02-02    False\n",
       "2022-02-03    False\n",
       "2022-02-04    False\n",
       "2022-02-05    False\n",
       "2022-02-06    False\n",
       "Freq: D, dtype: bool"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2.isna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e317559a",
   "metadata": {},
   "source": [
    "Dealing with missing data is discussed in more detail in the section [Managing missing data with pandas](../../clean-prep/nulls.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16414832",
   "metadata": {},
   "source": [
    "A useful feature of Series for many applications is the automatic alignment by index labels in arithmetic operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "e818ad58",
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = pd.date_range(\"2022-02-01\", periods=7)\n",
    "\n",
    "s3 = pd.Series(rng.normal(size=7), index=idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "f5178641",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2022-01-31    0.181385\n",
       " 2022-02-01    0.044108\n",
       " 2022-02-02   -1.537827\n",
       " 2022-02-03    1.172051\n",
       " 2022-02-04   -1.792041\n",
       " 2022-02-05    0.335298\n",
       " 2022-02-06    1.388968\n",
       " Freq: D, dtype: float64,\n",
       " 2022-02-01   -0.559119\n",
       " 2022-02-02    0.397340\n",
       " 2022-02-03    0.259035\n",
       " 2022-02-04   -0.138873\n",
       " 2022-02-05    1.520708\n",
       " 2022-02-06   -0.456184\n",
       " 2022-02-07   -0.438187\n",
       " Freq: D, dtype: float64)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2, s3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "0c2bb0a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-01-31         NaN\n",
       "2022-02-01   -0.515012\n",
       "2022-02-02   -1.140487\n",
       "2022-02-03    1.431087\n",
       "2022-02-04   -1.930914\n",
       "2022-02-05    1.856006\n",
       "2022-02-06    0.932785\n",
       "2022-02-07         NaN\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2 + s3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "501b7318",
   "metadata": {},
   "source": [
    "If you have experience with SQL, this is similar to a [JOIN](https://en.wikipedia.org/wiki/Join_(SQL)) operation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13f91d24",
   "metadata": {},
   "source": [
    "Both the Series object itself and its index have a `name` attribute that can be integrated into other areas of the pandas functionality:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "ec61f5ab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "date\n",
       "2022-02-01   -0.559119\n",
       "2022-02-02    0.397340\n",
       "2022-02-03    0.259035\n",
       "2022-02-04   -0.138873\n",
       "2022-02-05    1.520708\n",
       "2022-02-06   -0.456184\n",
       "2022-02-07   -0.438187\n",
       "Freq: D, Name: floats, dtype: float64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s3.name = \"floats\"\n",
    "s3.index.name = \"date\"\n",
    "\n",
    "s3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50d280e5",
   "metadata": {},
   "source": [
    "## DataFrame\n",
    "\n",
    "A DataFrame represents a rectangular data table and contains an ordered, named collection of columns, each of which can have a different value type. The DataFrame has both a row index and a column index.\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "\n",
    "**Note:**\n",
    "\n",
    "Although a DataFrame is two-dimensional, you can also use it to represent higher-dimensional data in a table format with hierarchical indexing using [join](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html), [combine](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine.html) and [Reshaping](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html).\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "e098202c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Code</th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>U+0000</td>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>U+0001</td>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>U+0002</td>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>U+0003</td>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>U+0004</td>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>U+0005</td>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Code  Decimal Octal     Key\n",
       "0  U+0000        0   001     NUL\n",
       "1  U+0001        1   002  Ctrl-A\n",
       "2  U+0002        2   003  Ctrl-B\n",
       "3  U+0003        3   004  Ctrl-C\n",
       "4  U+0004        4   004  Ctrl-D\n",
       "5  U+0005        5   005  Ctrl-E"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n",
    "    \"Decimal\": [0, 1, 2, 3, 4, 5],\n",
    "    \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n",
    "    \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c39e6eac",
   "metadata": {},
   "source": [
    "For large DataFrames, the `head` method selects only the first five rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "8e9bdee3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Code</th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>U+0000</td>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>U+0001</td>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>U+0002</td>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>U+0003</td>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>U+0004</td>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Code  Decimal Octal     Key\n",
       "0  U+0000        0   001     NUL\n",
       "1  U+0001        1   002  Ctrl-A\n",
       "2  U+0002        2   003  Ctrl-B\n",
       "3  U+0003        3   004  Ctrl-C\n",
       "4  U+0004        4   004  Ctrl-D"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "870a9459",
   "metadata": {},
   "source": [
    "You can also specify columns and their order:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "9c069a44",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Code</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>U+0000</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>U+0001</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>U+0002</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>U+0003</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>U+0004</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>U+0005</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Code     Key\n",
       "0  U+0000     NUL\n",
       "1  U+0001  Ctrl-A\n",
       "2  U+0002  Ctrl-B\n",
       "3  U+0003  Ctrl-C\n",
       "4  U+0004  Ctrl-D\n",
       "5  U+0005  Ctrl-E"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(data, columns=[\"Code\", \"Key\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1aeecdb",
   "metadata": {},
   "source": [
    "If you want to pass a column that is not contained in the dict, it will appear without values in the result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "7689f29f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Code</th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Description</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>U+0000</td>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>U+0001</td>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>U+0002</td>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>U+0003</td>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>U+0004</td>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>U+0005</td>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Code  Decimal Octal Description     Key\n",
       "0  U+0000        0   001         NaN     NUL\n",
       "1  U+0001        1   002         NaN  Ctrl-A\n",
       "2  U+0002        2   003         NaN  Ctrl-B\n",
       "3  U+0003        3   004         NaN  Ctrl-C\n",
       "4  U+0004        4   004         NaN  Ctrl-D\n",
       "5  U+0005        5   005         NaN  Ctrl-E"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame(\n",
    "    data, columns=[\"Code\", \"Decimal\", \"Octal\", \"Description\", \"Key\"]\n",
    ")\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f1fd0a6",
   "metadata": {},
   "source": [
    "You can retrieve a column in a DataFrame with a dict-like notation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "515a8b9f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    U+0000\n",
       "1    U+0001\n",
       "2    U+0002\n",
       "3    U+0003\n",
       "4    U+0004\n",
       "5    U+0005\n",
       "Name: Code, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Code\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3d9a460",
   "metadata": {},
   "source": [
    "This way you can also make a column the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "7df680a1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Description</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0003</th>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0004</th>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0005</th>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal Octal Description     Key\n",
       "Code                                     \n",
       "U+0000        0   001         NaN     NUL\n",
       "U+0001        1   002         NaN  Ctrl-A\n",
       "U+0002        2   003         NaN  Ctrl-B\n",
       "U+0003        3   004         NaN  Ctrl-C\n",
       "U+0004        4   004         NaN  Ctrl-D\n",
       "U+0005        5   005         NaN  Ctrl-E"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame(\n",
    "    data, columns=[\"Decimal\", \"Octal\", \"Description\", \"Key\"], index=df[\"Code\"]\n",
    ")\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e63cf2",
   "metadata": {},
   "source": [
    "Rows can be retrieved by position or name with the [pandas.DataFrame.loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e0afbcae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Decimal             1\n",
       "Octal             002\n",
       "Description       NaN\n",
       "Key            Ctrl-A\n",
       "Name: U+0001, dtype: object"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc[\"U+0001\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d0f7ea8",
   "metadata": {},
   "source": [
    "Column values can be changed by assignment. For example, a scalar value or an array of values could be assigned to the empty _Description_ column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "1c4c8b35",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Description</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0000</th>\n",
       "      <td>0</td>\n",
       "      <td>001</td>\n",
       "      <td>Null character</td>\n",
       "      <td>NUL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0001</th>\n",
       "      <td>1</td>\n",
       "      <td>002</td>\n",
       "      <td>Start of Heading</td>\n",
       "      <td>Ctrl-A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0002</th>\n",
       "      <td>2</td>\n",
       "      <td>003</td>\n",
       "      <td>Start of Text</td>\n",
       "      <td>Ctrl-B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0003</th>\n",
       "      <td>3</td>\n",
       "      <td>004</td>\n",
       "      <td>End-of-text character</td>\n",
       "      <td>Ctrl-C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0004</th>\n",
       "      <td>4</td>\n",
       "      <td>004</td>\n",
       "      <td>End-of-transmission character</td>\n",
       "      <td>Ctrl-D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0005</th>\n",
       "      <td>5</td>\n",
       "      <td>005</td>\n",
       "      <td>Enquiry character</td>\n",
       "      <td>Ctrl-E</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Decimal Octal                    Description     Key\n",
       "Code                                                        \n",
       "U+0000        0   001                 Null character     NUL\n",
       "U+0001        1   002               Start of Heading  Ctrl-A\n",
       "U+0002        2   003                  Start of Text  Ctrl-B\n",
       "U+0003        3   004          End-of-text character  Ctrl-C\n",
       "U+0004        4   004  End-of-transmission character  Ctrl-D\n",
       "U+0005        5   005              Enquiry character  Ctrl-E"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2[\"Description\"] = [\n",
    "    \"Null character\",\n",
    "    \"Start of Heading\",\n",
    "    \"Start of Text\",\n",
    "    \"End-of-text character\",\n",
    "    \"End-of-transmission character\",\n",
    "    \"Enquiry character\",\n",
    "]\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddb8d2e7",
   "metadata": {},
   "source": [
    "Assigning a non-existing column creates a new column."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c25da65",
   "metadata": {},
   "source": [
    "Columns can be removed with [pandas.DataFrame.drop](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) and displayed with `pandas.DataFrame.columns`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "809b4260",
   "metadata": {},
   "outputs": [],
   "source": [
    "df3 = df2.drop(columns=[\"Decimal\", \"Octal\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "357cd535",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Decimal', 'Octal', 'Description', 'Key'], dtype='object')"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "2eac8283",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Description', 'Key'], dtype='object')"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a92d82c",
   "metadata": {},
   "source": [
    "Another common form of data is nested dict of dicts:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "af5ceb43",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>U+0006</th>\n",
       "      <th>U+0007</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Decimal</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Octal</th>\n",
       "      <td>006</td>\n",
       "      <td>007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Description</th>\n",
       "      <td>Acknowledge character</td>\n",
       "      <td>Bell character</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Key</th>\n",
       "      <td>Ctrl-F</td>\n",
       "      <td>Ctrl-G</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            U+0006          U+0007\n",
       "Decimal                          6               7\n",
       "Octal                          006             007\n",
       "Description  Acknowledge character  Bell character\n",
       "Key                         Ctrl-F          Ctrl-G"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "u = {\n",
    "    \"U+0006\": {\n",
    "        \"Decimal\": \"6\",\n",
    "        \"Octal\": \"006\",\n",
    "        \"Description\": \"Acknowledge character\",\n",
    "        \"Key\": \"Ctrl-F\",\n",
    "    },\n",
    "    \"U+0007\": {\n",
    "        \"Decimal\": \"7\",\n",
    "        \"Octal\": \"007\",\n",
    "        \"Description\": \"Bell character\",\n",
    "        \"Key\": \"Ctrl-G\",\n",
    "    },\n",
    "}\n",
    "\n",
    "df4 = pd.DataFrame(u)\n",
    "\n",
    "df4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6974f177",
   "metadata": {},
   "source": [
    "You can transpose the DataFrame, i.e. swap the rows and columns, with a similar syntax to a NumPy array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "30e92c1a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Decimal</th>\n",
       "      <th>Octal</th>\n",
       "      <th>Description</th>\n",
       "      <th>Key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>U+0006</th>\n",
       "      <td>6</td>\n",
       "      <td>006</td>\n",
       "      <td>Acknowledge character</td>\n",
       "      <td>Ctrl-F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>U+0007</th>\n",
       "      <td>7</td>\n",
       "      <td>007</td>\n",
       "      <td>Bell character</td>\n",
       "      <td>Ctrl-G</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Decimal Octal            Description     Key\n",
       "U+0006       6   006  Acknowledge character  Ctrl-F\n",
       "U+0007       7   007         Bell character  Ctrl-G"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.T"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9d1179d",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "\n",
    "**Warning:**\n",
    "\n",
    "Note that when transposing, the data types of the columns are discarded if the columns do not all have the same data type, so when transposing and then transposing back, the previous type information may be lost. In this case, the columns become arrays of pure Python objects.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe0b39f3",
   "metadata": {},
   "source": [
    "The keys in the inner dicts are combined to form the index in the result. This is not the case when an explicit index is specified:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "453ed83b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>U+0006</th>\n",
       "      <th>U+0007</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Decimal</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Octal</th>\n",
       "      <td>006</td>\n",
       "      <td>007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Key</th>\n",
       "      <td>Ctrl-F</td>\n",
       "      <td>Ctrl-G</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         U+0006  U+0007\n",
       "Decimal       6       7\n",
       "Octal       006     007\n",
       "Key      Ctrl-F  Ctrl-G"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df5 = pd.DataFrame(u, index=[\"Decimal\", \"Octal\", \"Key\"])\n",
    "df5"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}