{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7261fb06",
   "metadata": {},
   "source": [
    "# Apply\n",
    "\n",
    "The most general `GroupBy` method is `apply`. It splits the object to be processed, calls the passed function on each part and then tries to chain the parts together."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "744d704b",
   "metadata": {},
   "source": [
    "Suppose we want to select the five largest `hit` values by group. To do this, we first write a function that selects the rows with the largest values in a particular column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d764b7b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6344dbd3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                 en         6073.0   7716.0   6547.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "                 en            NaN      NaN      NaN\n",
       "Python Basics    de          427.0    276.0    525.0\n",
       "                 en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"2021-12\": [30134, 6073, 4873, None, 427, 95],\n",
    "        \"2022-01\": [33295, 7716, 3930, None, 276, 226],\n",
    "        \"2022-02\": [19651, 6547, 2573, None, 525, 157],\n",
    "    },\n",
    "    index=[\n",
    "        [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        [\"de\", \"en\", \"de\", \"en\", \"de\", \"en\"],\n",
    "    ],\n",
    ")\n",
    "df.index.names = [\"Title\", \"Language\"]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "32423696",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                 en         6073.0   7716.0   6547.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def top(df, n=5, column=\"2021-12\"):\n",
    "    return df.sort_values(by=column, ascending=False)[:n]\n",
    "\n",
    "\n",
    "top(df, n=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4bfc5a5",
   "metadata": {},
   "source": [
    "If we now group by titles, for example, and call `apply` with this function, we get the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "58ac5a5b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             2021-12  2022-01  2022-02\n",
       "  Title            Language                           \n",
       "0 Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                   en         6073.0   7716.0   6547.0\n",
       "1 PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "                   en            NaN      NaN      NaN\n",
       "2 Python Basics    de          427.0    276.0    525.0\n",
       "                   en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_titles = df.groupby(\"Title\", as_index=False)\n",
    "\n",
    "grouped_titles.apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e959413",
   "metadata": {},
   "source": [
    "What happened here? The upper function is called for each row group of the `DataFrame`, and then the results are concatenated with [pandas.concat](https://pandas.pydata.org/docs/reference/api/pandas.concat.html), labelling the parts with the group names. The result therefore has a hierarchical index whose inner level contains index values from the original `DataFrame`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "275064af",
   "metadata": {},
   "source": [
    "If you pass a function to `apply` that takes other arguments or keywords, you can pass them after the function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a9bba923",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             2021-12  2022-01  2022-02\n",
       "  Title            Language                           \n",
       "0 Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "1 PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "2 Python Basics    de          427.0    276.0    525.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_titles = df.groupby(\"Title\", as_index=False)\n",
    "\n",
    "grouped_titles.apply(top, n=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed60be92",
   "metadata": {},
   "source": [
    "We have now seen the basic usage of `apply`. What happens inside the passed function is very versatile and up to you; it only has to return a pandas object or a single value. In the following, we will therefore mainly show examples that can give you ideas on how to solve various problems with `groupby`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cbddfa4",
   "metadata": {},
   "source": [
    "First, let’s look again at `describe`, called over the `GroupBy` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3a534db6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">2021-12</th>\n",
       "      <th colspan=\"5\" halign=\"left\">2022-01</th>\n",
       "      <th colspan=\"8\" halign=\"left\">2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>...</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <td>2.0</td>\n",
       "      <td>18103.5</td>\n",
       "      <td>17013.696262</td>\n",
       "      <td>6073.0</td>\n",
       "      <td>12088.25</td>\n",
       "      <td>18103.5</td>\n",
       "      <td>24118.75</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>20505.5</td>\n",
       "      <td>...</td>\n",
       "      <td>26900.25</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>13099.0</td>\n",
       "      <td>9265.927261</td>\n",
       "      <td>6547.0</td>\n",
       "      <td>9823.0</td>\n",
       "      <td>13099.0</td>\n",
       "      <td>16375.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <td>1.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>4873.00</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>4873.00</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>...</td>\n",
       "      <td>3930.00</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <td>2.0</td>\n",
       "      <td>261.0</td>\n",
       "      <td>234.759451</td>\n",
       "      <td>95.0</td>\n",
       "      <td>178.00</td>\n",
       "      <td>261.0</td>\n",
       "      <td>344.00</td>\n",
       "      <td>427.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>251.0</td>\n",
       "      <td>...</td>\n",
       "      <td>263.50</td>\n",
       "      <td>276.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>341.0</td>\n",
       "      <td>260.215295</td>\n",
       "      <td>157.0</td>\n",
       "      <td>249.0</td>\n",
       "      <td>341.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                 2021-12                                                    \\\n",
       "                   count     mean           std     min       25%      50%   \n",
       "Title                                                                        \n",
       "Jupyter Tutorial     2.0  18103.5  17013.696262  6073.0  12088.25  18103.5   \n",
       "PyViz Tutorial       1.0   4873.0           NaN  4873.0   4873.00   4873.0   \n",
       "Python Basics        2.0    261.0    234.759451    95.0    178.00    261.0   \n",
       "\n",
       "                                    2022-01           ...                     \\\n",
       "                       75%      max   count     mean  ...       75%      max   \n",
       "Title                                                 ...                      \n",
       "Jupyter Tutorial  24118.75  30134.0     2.0  20505.5  ...  26900.25  33295.0   \n",
       "PyViz Tutorial     4873.00   4873.0     1.0   3930.0  ...   3930.00   3930.0   \n",
       "Python Basics       344.00    427.0     2.0    251.0  ...    263.50    276.0   \n",
       "\n",
       "                 2022-02                                                 \\\n",
       "                   count     mean          std     min     25%      50%   \n",
       "Title                                                                     \n",
       "Jupyter Tutorial     2.0  13099.0  9265.927261  6547.0  9823.0  13099.0   \n",
       "PyViz Tutorial       1.0   2573.0          NaN  2573.0  2573.0   2573.0   \n",
       "Python Basics        2.0    341.0   260.215295   157.0   249.0    341.0   \n",
       "\n",
       "                                    \n",
       "                      75%      max  \n",
       "Title                               \n",
       "Jupyter Tutorial  16375.0  19651.0  \n",
       "PyViz Tutorial     2573.0   2573.0  \n",
       "Python Basics       433.0    525.0  \n",
       "\n",
       "[3 rows x 24 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_titles = df.groupby(\"Title\")\n",
    "\n",
    "result = grouped_titles.describe()\n",
    "\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a471ff4",
   "metadata": {},
   "source": [
    "When you call a method like `describe` within `GroupBy`, it is actually just an abbreviation for:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b3f40be1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>18103.500000</td>\n",
       "      <td>20505.500000</td>\n",
       "      <td>13099.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>17013.696262</td>\n",
       "      <td>18087.084356</td>\n",
       "      <td>9265.927261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>6073.000000</td>\n",
       "      <td>7716.000000</td>\n",
       "      <td>6547.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>12088.250000</td>\n",
       "      <td>14110.750000</td>\n",
       "      <td>9823.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>18103.500000</td>\n",
       "      <td>20505.500000</td>\n",
       "      <td>13099.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>24118.750000</td>\n",
       "      <td>26900.250000</td>\n",
       "      <td>16375.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>30134.000000</td>\n",
       "      <td>33295.000000</td>\n",
       "      <td>19651.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>count</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">Python Basics</th>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>261.000000</td>\n",
       "      <td>251.000000</td>\n",
       "      <td>341.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>234.759451</td>\n",
       "      <td>35.355339</td>\n",
       "      <td>260.215295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>95.000000</td>\n",
       "      <td>226.000000</td>\n",
       "      <td>157.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>178.000000</td>\n",
       "      <td>238.500000</td>\n",
       "      <td>249.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>261.000000</td>\n",
       "      <td>251.000000</td>\n",
       "      <td>341.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>344.000000</td>\n",
       "      <td>263.500000</td>\n",
       "      <td>433.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>427.000000</td>\n",
       "      <td>276.000000</td>\n",
       "      <td>525.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             2021-12       2022-01       2022-02\n",
       "Title                                                           \n",
       "Jupyter Tutorial count      2.000000      2.000000      2.000000\n",
       "                 mean   18103.500000  20505.500000  13099.000000\n",
       "                 std    17013.696262  18087.084356   9265.927261\n",
       "                 min     6073.000000   7716.000000   6547.000000\n",
       "                 25%    12088.250000  14110.750000   9823.000000\n",
       "                 50%    18103.500000  20505.500000  13099.000000\n",
       "                 75%    24118.750000  26900.250000  16375.000000\n",
       "                 max    30134.000000  33295.000000  19651.000000\n",
       "PyViz Tutorial   count      1.000000      1.000000      1.000000\n",
       "                 mean    4873.000000   3930.000000   2573.000000\n",
       "                 std             NaN           NaN           NaN\n",
       "                 min     4873.000000   3930.000000   2573.000000\n",
       "                 25%     4873.000000   3930.000000   2573.000000\n",
       "                 50%     4873.000000   3930.000000   2573.000000\n",
       "                 75%     4873.000000   3930.000000   2573.000000\n",
       "                 max     4873.000000   3930.000000   2573.000000\n",
       "Python Basics    count      2.000000      2.000000      2.000000\n",
       "                 mean     261.000000    251.000000    341.000000\n",
       "                 std      234.759451     35.355339    260.215295\n",
       "                 min       95.000000    226.000000    157.000000\n",
       "                 25%      178.000000    238.500000    249.000000\n",
       "                 50%      261.000000    251.000000    341.000000\n",
       "                 75%      344.000000    263.500000    433.000000\n",
       "                 max      427.000000    276.000000    525.000000"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f = lambda x: x.describe()\n",
    "grouped_titles.apply(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90a07bfc",
   "metadata": {},
   "source": [
    "## Suppression of the group keys\n",
    "\n",
    "In the previous examples, you saw that the resulting object has a hierarchical index formed by the group keys together with the indices of the individual parts of the original object. You can disable this by passing `group_keys=False` to `groupby`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0f51bcb0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "Python Basics    de          427.0    276.0    525.0\n",
       "Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "Python Basics    en           95.0    226.0    157.0\n",
       "PyViz Tutorial   en            NaN      NaN      NaN"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_lang = df.groupby(\"Language\", group_keys=False)\n",
    "\n",
    "grouped_lang.apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2131c5cf",
   "metadata": {},
   "source": [
    "## Quantile and bucket analysis\n",
    "\n",
    "As described in [discretisation and grouping](discretisation.ipynb), pandas has some tools, especially `cut` and `qcut`, to split data into buckets with bins of your choice or by sample quantiles. Combine these functions with `groupby` and you can conveniently perform bucket or quantile analysis on a dataset. Consider a simple random data set and a bucket categorisation of equal length with `cut`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8b54db53",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    (-1.985, -0.309]\n",
       "1    (-3.667, -1.985]\n",
       "2    (-1.985, -0.309]\n",
       "3    (-1.985, -0.309]\n",
       "4    (-1.985, -0.309]\n",
       "5     (-0.309, 1.367]\n",
       "6    (-1.985, -0.309]\n",
       "7      (1.367, 3.043]\n",
       "8    (-1.985, -0.309]\n",
       "9     (-0.309, 1.367]\n",
       "Name: data1, dtype: category\n",
       "Categories (4, interval[float64, right]): [(-3.667, -1.985] < (-1.985, -0.309] < (-0.309, 1.367] < (1.367, 3.043]]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.default_rng()\n",
    "df2 = pd.DataFrame(\n",
    "    {\n",
    "        \"data1\": rng.normal(size=1000),\n",
    "        \"data2\": rng.normal(size=1000)\n",
    "    }\n",
    ")\n",
    "\n",
    "quartiles = pd.cut(df2.data1, 4)\n",
    "\n",
    "quartiles[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab66f0c8",
   "metadata": {},
   "source": [
    "The `category` object returned by `cut` can be passed directly to `groupby`. So we could calculate a set of group statistics for the quartiles as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "6157be57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-3.667, -1.985]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-3.660514</td>\n",
       "      <td>-1.997396</td>\n",
       "      <td>31</td>\n",
       "      <td>-2.422460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-1.729069</td>\n",
       "      <td>2.380827</td>\n",
       "      <td>31</td>\n",
       "      <td>0.193483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-1.985, -0.309]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-1.974745</td>\n",
       "      <td>-0.310654</td>\n",
       "      <td>356</td>\n",
       "      <td>-0.941064</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.772165</td>\n",
       "      <td>3.049964</td>\n",
       "      <td>356</td>\n",
       "      <td>0.006150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-0.309, 1.367]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.306660</td>\n",
       "      <td>1.366418</td>\n",
       "      <td>525</td>\n",
       "      <td>0.398290</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.678267</td>\n",
       "      <td>3.307415</td>\n",
       "      <td>525</td>\n",
       "      <td>-0.013217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(1.367, 3.043]</th>\n",
       "      <th>data1</th>\n",
       "      <td>1.369270</td>\n",
       "      <td>3.042972</td>\n",
       "      <td>88</td>\n",
       "      <td>1.813557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.885412</td>\n",
       "      <td>2.047945</td>\n",
       "      <td>88</td>\n",
       "      <td>0.045750</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             min       max  count      mean\n",
       "data1                                                      \n",
       "(-3.667, -1.985] data1 -3.660514 -1.997396     31 -2.422460\n",
       "                 data2 -1.729069  2.380827     31  0.193483\n",
       "(-1.985, -0.309] data1 -1.974745 -0.310654    356 -0.941064\n",
       "                 data2 -2.772165  3.049964    356  0.006150\n",
       "(-0.309, 1.367]  data1 -0.306660  1.366418    525  0.398290\n",
       "                 data2 -3.678267  3.307415    525 -0.013217\n",
       "(1.367, 3.043]   data1  1.369270  3.042972     88  1.813557\n",
       "                 data2 -2.885412  2.047945     88  0.045750"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def stats(group):\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"min\": group.min(),\n",
    "            \"max\": group.max(),\n",
    "            \"count\": group.count(),\n",
    "            \"mean\": group.mean(),\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "grouped_quart = df2.groupby(quartiles, observed=False)\n",
    "\n",
    "grouped_quart.apply(stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a000b84b",
   "metadata": {},
   "source": [
    "These were buckets of equal length; to calculate buckets of equal size based on sample quantiles, we can use `qcut`. I pass `labels=False` to get only quantile numbers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "70da8f32",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>data1</th>\n",
       "      <td>-3.660514</td>\n",
       "      <td>-0.743539</td>\n",
       "      <td>250</td>\n",
       "      <td>-1.365801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.772165</td>\n",
       "      <td>2.830985</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.040007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.742822</td>\n",
       "      <td>-0.019690</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.345565</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.209969</td>\n",
       "      <td>3.049964</td>\n",
       "      <td>250</td>\n",
       "      <td>0.146497</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.017901</td>\n",
       "      <td>0.627629</td>\n",
       "      <td>250</td>\n",
       "      <td>0.301905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.764695</td>\n",
       "      <td>3.307415</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.077263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">3</th>\n",
       "      <th>data1</th>\n",
       "      <td>0.628714</td>\n",
       "      <td>3.042972</td>\n",
       "      <td>250</td>\n",
       "      <td>1.243782</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.678267</td>\n",
       "      <td>2.260646</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.008129</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  min       max  count      mean\n",
       "data1                                           \n",
       "0     data1 -3.660514 -0.743539    250 -1.365801\n",
       "      data2 -2.772165  2.830985    250 -0.040007\n",
       "1     data1 -0.742822 -0.019690    250 -0.345565\n",
       "      data2 -2.209969  3.049964    250  0.146497\n",
       "2     data1 -0.017901  0.627629    250  0.301905\n",
       "      data2 -2.764695  3.307415    250 -0.077263\n",
       "3     data1  0.628714  3.042972    250  1.243782\n",
       "      data2 -3.678267  2.260646    250 -0.008129"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quartiles_samp = pd.qcut(df2.data1, 4, labels=False)\n",
    "\n",
    "grouped_quart_samp = df2.groupby(quartiles_samp)\n",
    "\n",
    "grouped_quart_samp.apply(stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4e7e305",
   "metadata": {},
   "source": [
    "## Populating data with group-specific values\n",
    "\n",
    "When cleaning missing data, in some cases you will replace data observations with `dropna`, but in other cases you may want to fill the null values (`NA`) with a fixed value or a value derived from the data. `fillna` is the right tool for this; here, for example, I fill the null values with the mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "812aebbf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0         NaN\n",
       "1    0.682415\n",
       "2   -0.463491\n",
       "3         NaN\n",
       "4    0.397419\n",
       "5    0.607853\n",
       "6         NaN\n",
       "7    0.060891\n",
       "dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(rng.normal(size=8))\n",
    "s[::3] = np.nan\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "627d58df",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.257018\n",
       "1    0.682415\n",
       "2   -0.463491\n",
       "3    0.257018\n",
       "4    0.397419\n",
       "5    0.607853\n",
       "6    0.257018\n",
       "7    0.060891\n",
       "dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.fillna(s.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2a06552",
   "metadata": {},
   "source": [
    "Here are some sample data for my tutorials, divided into German and English editions:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88c3bb07",
   "metadata": {},
   "source": [
    "Suppose you want the fill value to vary by group. These values can be predefined, and since the groups have an internal `name` attribute, you can use this with `apply`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3f29009c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">de</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">en</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>3469.0</td>\n",
       "      <td>3469.0</td>\n",
       "      <td>3469.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    2021-12  2022-01  2022-02\n",
       "Language Title            Language                           \n",
       "de       Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "         PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "         Python Basics    de          427.0    276.0    525.0\n",
       "en       Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "         PyViz Tutorial   en         3469.0   3469.0   3469.0\n",
       "         Python Basics    en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fill_values = {\"de\": 10632, \"en\": 3469}\n",
    "\n",
    "fill_func = lambda g: g.fillna(fill_values[g.name])\n",
    "\n",
    "df.groupby(\"Language\").apply(fill_func)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa803757",
   "metadata": {},
   "source": [
    "You can also group the data and use `apply` with a function that calls `fillna` for each data packet:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c737aa0b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">de</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">en</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>3084.0</td>\n",
       "      <td>3971.0</td>\n",
       "      <td>3352.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    2021-12  2022-01  2022-02\n",
       "Language Title            Language                           \n",
       "de       Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "         PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "         Python Basics    de          427.0    276.0    525.0\n",
       "en       Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "         PyViz Tutorial   en         3084.0   3971.0   3352.0\n",
       "         Python Basics    en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fill_mean = lambda g: g.fillna(g.mean())\n",
    "\n",
    "df.groupby(\"Language\").apply(fill_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c569fb0",
   "metadata": {},
   "source": [
    "## Group weighted average\n",
    "\n",
    "Since operations between columns in a `DataFrame` or two `Series` are possible, we can calculate the group-weighted average, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7af3e1cc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>data</th>\n",
       "      <th>weights</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>de</td>\n",
       "      <td>30354</td>\n",
       "      <td>0.283774</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>de</td>\n",
       "      <td>28408</td>\n",
       "      <td>0.416312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>de</td>\n",
       "      <td>21630</td>\n",
       "      <td>0.128192</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>de</td>\n",
       "      <td>75974</td>\n",
       "      <td>0.006033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>en</td>\n",
       "      <td>14334</td>\n",
       "      <td>0.391982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>en</td>\n",
       "      <td>56144</td>\n",
       "      <td>0.028325</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>en</td>\n",
       "      <td>46108</td>\n",
       "      <td>0.326988</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>en</td>\n",
       "      <td>33759</td>\n",
       "      <td>0.714362</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  category   data   weights\n",
       "0       de  30354  0.283774\n",
       "1       de  28408  0.416312\n",
       "2       de  21630  0.128192\n",
       "3       de  75974  0.006033\n",
       "4       en  14334  0.391982\n",
       "5       en  56144  0.028325\n",
       "6       en  46108  0.326988\n",
       "7       en  33759  0.714362"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = pd.DataFrame(\n",
    "    {\n",
    "        \"category\": [\"de\", \"de\", \"de\", \"de\", \"en\", \"en\", \"en\", \"en\"],\n",
    "        \"data\": np.random.randint(100000, size=8),\n",
    "        \"weights\": np.random.rand(8),\n",
    "    }\n",
    ")\n",
    "\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d43fc161",
   "metadata": {},
   "source": [
    "The group average weighted by category would then be:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "733ad0dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "category\n",
       "de    28372.421657\n",
       "en    31746.068449\n",
       "dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_cat = df3.groupby(\"category\")\n",
    "get_wavg = lambda g: np.average(g[\"data\"], weights=g[\"weights\"])\n",
    "\n",
    "grouped_cat.apply(get_wavg, include_groups=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6ae93b7",
   "metadata": {},
   "source": [
    "## Correlation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab9453bf",
   "metadata": {},
   "source": [
    "An interesting task could be to calculate a `DataFrame` consisting of the percentage changes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4793e0b",
   "metadata": {},
   "source": [
    "For this purpose, we first create a function that calculates the pairwise correlation of the `2021-12` column with the subsequent columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "9964169b",
   "metadata": {},
   "outputs": [],
   "source": [
    "corr = lambda x: x.corrwith(x[\"2021-12\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e3c83be",
   "metadata": {},
   "source": [
    "Next, we calculate the percentage change:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "383eedf7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>-0.798467</td>\n",
       "      <td>-0.768253</td>\n",
       "      <td>-0.666836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>-0.197596</td>\n",
       "      <td>-0.490669</td>\n",
       "      <td>-0.606996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>-0.912374</td>\n",
       "      <td>-0.929771</td>\n",
       "      <td>-0.795958</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>-0.777518</td>\n",
       "      <td>-0.181159</td>\n",
       "      <td>-0.700952</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            2021-12   2022-01   2022-02\n",
       "Title            Language                              \n",
       "Jupyter Tutorial en       -0.798467 -0.768253 -0.666836\n",
       "PyViz Tutorial   de       -0.197596 -0.490669 -0.606996\n",
       "                 en        0.000000  0.000000  0.000000\n",
       "Python Basics    de       -0.912374 -0.929771 -0.795958\n",
       "                 en       -0.777518 -0.181159 -0.700952"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pcts = df.pct_change().dropna()\n",
    "\n",
    "pcts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f70c60fb",
   "metadata": {},
   "source": [
    "Finally, we group these percentage changes by year, which can be extracted from each row label with a one-line function that returns the year attribute of each date label:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "c4e9b858",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>de</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.699088</td>\n",
       "      <td>0.99781</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          2021-12   2022-01  2022-02\n",
       "Language                            \n",
       "de            1.0  1.000000  1.00000\n",
       "en            1.0  0.699088  0.99781"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_lang = pcts.groupby(\"Language\")\n",
    "\n",
    "grouped_lang.apply(corr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "19097802",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de    1.000000\n",
       "en    0.699088\n",
       "dtype: float64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_lang.apply(lambda g: g[\"2021-12\"].corr(g[\"2022-01\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0eadd87e",
   "metadata": {},
   "source": [
    "## Performance problems with `apply`\n",
    "\n",
    "Since the `apply` method typically acts on each individual value in a `Series`, the function is called once for each value. If you have thousands of values, the function will be called thousands of times. This ignores the fast vectorisations of pandas unless you are using NumPy functions and slow Python is used. For example, we previously grouped the data by title and then called our `top` method with `apply`. Let’s measure the time for this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "4a1d44c9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "422 μs ± 14.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "grouped_titles.apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec0e168",
   "metadata": {},
   "source": [
    "We can get the same result without applying by passing the `DataFrame` to our `top` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "b7d1f960",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "39.2 μs ± 586 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "top(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "406badf8",
   "metadata": {},
   "source": [
    "This calculation is 18 times faster."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1020ebb",
   "metadata": {},
   "source": [
    "## Optimising `apply` with Cython\n",
    "\n",
    "It is not always easy to find an alternative for `apply`. However, numerical operations like our `top` method can be made faster with [Cython](https://cython.org/). To use Cython in Jupyyter, we use the following [IPython magic](../ipython/magics.ipynb):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "a596fc35",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext Cython"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb4b4b6c",
   "metadata": {},
   "source": [
    "Then we can define our `top` function with Cython:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "f28ec4a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%cython\n",
    "def top_cy(df, n=5, column=\"2021-12\"):\n",
    "    return df.sort_values(by=column, ascending=False)[:n]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "1e95211e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "472 μs ± 27.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "grouped_titles.apply(top_cy)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c57cd1fa",
   "metadata": {},
   "source": [
    "We haven’t really gained much with this yet. Further optimisation possibilities would be to define the type in the Cython code with `cpdef`. For this, however, we would have to modify our method, because then no `DataFrame` can be passed."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}