{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "00236537",
   "metadata": {},
   "source": [
    "# Group operations\n",
    "\n",
    "By `groupby` is meant a process that involves one or more of the following steps:\n",
    "\n",
    "* **Split** divides the data into groups according to certain criteria\n",
    "* **Apply** applies a function independently to each group\n",
    "* **Combine** combines the results in a data structure\n",
    "\n",
    "In the first phase of the process, the data contained in a pandas object, be it a `Series`, a `DataFrame` or something else, is split into groups based on one or more keys. The division is done on a particular axis of an object. For example, a DataFrame can be grouped by its rows (`axis=0`) or its columns (`axis=1`). Then, a function is applied to each group to create a new value. Finally, the results of all these function applications are combined in a result object. The shape of the result object usually depends on what is done with the data.\n",
    "\n",
    "Each grouping key can take many forms, and the keys do not all have to be of the same type:\n",
    "\n",
    "* a list or array of values that have the same length as the axis being grouped\n",
    "* a value that specifies a column name in a DataFrame\n",
    "* a dict or series that is a correspondence between the values on the axis being grouped and the group names\n",
    "* a function that is called on the axis index or the individual labels in the index\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note:**\n",
    "\n",
    "The latter three methods are shortcuts to create an array of values that will be used to divide the object.\n",
    "\n",
    "Don't worry if this all seems abstract. Throughout this chapter I will give many examples of all these methods. For starters, here is a small table dataset as a DataFrame:\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "7f8d4433",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "653a5337",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Jupyter Tutorial</td>\n",
       "      <td>de</td>\n",
       "      <td>19651.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jupyter Tutorial</td>\n",
       "      <td>en</td>\n",
       "      <td>4722.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>4009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>PyViz Tutorial</td>\n",
       "      <td>de</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Python Basics</td>\n",
       "      <td>de</td>\n",
       "      <td>525.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Python Basics</td>\n",
       "      <td>en</td>\n",
       "      <td>157.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Title Language  2021-12  2022-01  2022-02\n",
       "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
       "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
       "2    PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
       "3              None     None      NaN      NaN      NaN\n",
       "4     Python Basics       de    525.0    427.0    276.0\n",
       "5     Python Basics       en    157.0     85.0    226.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"Title\": [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            None,\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        \"Language\": [\"de\", \"en\", \"de\", None, \"de\", \"en\"],\n",
    "        \"2021-12\": [19651, 4722, 2573, None, 525, 157],\n",
    "        \"2022-01\": [30134, 3497, 4873, None, 427, 85],\n",
    "        \"2022-02\": [33295, 4009, 3930, None, 276, 226],\n",
    "    }\n",
    ")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "793372a4",
   "metadata": {},
   "source": [
    "Suppose you want to calculate the sum of column `02/2022` using the labels of Title. There are several ways to do this. One is to access `02/2022` and call `groupby` with the column (a `Series`) in `Title`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8b4b9dbb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x103a3c830>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = df[\"2022-02\"].groupby(df[\"Title\"])\n",
    "\n",
    "grouped"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14141021",
   "metadata": {},
   "source": [
    "This `grouped` variable is now a special `SeriesGroupBy` object. It has not yet calculated anything except some intermediate data about the group key `df['Title']`. The idea is that this object has all the information needed to apply an operation to each of the groups. For example, to calculate the group averages, we can call the `sum` method of the `GroupBy` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "bab6893d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title\n",
       "Jupyter Tutorial    37304.0\n",
       "PyViz Tutorial       3930.0\n",
       "Python Basics         502.0\n",
       "Name: 2022-02, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10512d81",
   "metadata": {},
   "source": [
    "Later I will explain more about what happens when you call `.sum()`. The important thing to note here is that the data (a row) has been aggregated by splitting the data across the group key, creating a new row that is now indexed by the unique values in the `Title` column. The resulting index is `Title` because `groupby(df['Title']` did this."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a22d532",
   "metadata": {},
   "source": [
    "If we had passed multiple arrays as a list instead, we would get something different:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4a117a7a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language  Title           \n",
       "de        Jupyter Tutorial    19651.0\n",
       "          PyViz Tutorial       2573.0\n",
       "          Python Basics         525.0\n",
       "en        Jupyter Tutorial     4722.0\n",
       "          Python Basics         157.0\n",
       "Name: 2021-12, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sums = df[\"2021-12\"].groupby([df[\"Language\"], df[\"Title\"]]).sum()\n",
    "\n",
    "sums"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d6b3003",
   "metadata": {},
   "source": [
    "Here we have grouped the data based on two keys, and the resulting series now has a hierarchical index consisting of the observed unique key pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "55a47106",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Title</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>Python Basics</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Title     Jupyter Tutorial  PyViz Tutorial  Python Basics\n",
       "Language                                                 \n",
       "de                 19651.0          2573.0          525.0\n",
       "en                  4722.0             NaN          157.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sums.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "587c25bf",
   "metadata": {},
   "source": [
    "Often the grouping information is in the same DataFrame as the data you want to edit. In this case, you can pass column names (whether they are strings, numbers or other Python objects) as group keys:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "5b121b4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Language</th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <td>deen</td>\n",
       "      <td>24373.0</td>\n",
       "      <td>33631.0</td>\n",
       "      <td>37304.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <td>de</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <td>deen</td>\n",
       "      <td>682.0</td>\n",
       "      <td>512.0</td>\n",
       "      <td>502.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Language  2021-12  2022-01  2022-02\n",
       "Title                                               \n",
       "Jupyter Tutorial     deen  24373.0  33631.0  37304.0\n",
       "PyViz Tutorial         de   2573.0   4873.0   3930.0\n",
       "Python Basics        deen    682.0    512.0    502.0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Title\").sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43b62319",
   "metadata": {},
   "source": [
    "Here it is noticeable that the result does not contain a `Language` column. Since `df['Language']` is not numeric data, it interferes with the table layout and is therefore automatically excluded from the result. By default, all numeric columns are aggregated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ad498488",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>4009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>525.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>157.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        19651.0  30134.0  33295.0\n",
       "                 en         4722.0   3497.0   4009.0\n",
       "PyViz Tutorial   de         2573.0   4873.0   3930.0\n",
       "Python Basics    de          525.0    427.0    276.0\n",
       "                 en          157.0     85.0    226.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\",\"Language\"]).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0db3a462",
   "metadata": {},
   "source": [
    "Regardless of the goal of using `groupby`, a generally useful `groupby` method is `size`, which returns a series with the group sizes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "491272d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de    3\n",
       "en    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Language\"]).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "977da668",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note:**\n",
    "\n",
    "All missing values in a group key are excluded from the result by default. This behaviour can be disabled by passing `dropna=False` to `groupby`:\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2d4df323",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de     3\n",
       "en     2\n",
       "NaN    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Language\", dropna=False).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "cd057369",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title             Language\n",
       "Jupyter Tutorial  de          1\n",
       "                  en          1\n",
       "PyViz Tutorial    de          1\n",
       "Python Basics     de          1\n",
       "                  en          1\n",
       "NaN               NaN         1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\", \"Language\"], dropna=False).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fa74c34",
   "metadata": {},
   "source": [
    "## Iteration over groups\n",
    "\n",
    "The object returned by `groupby` supports iteration and produces a sequence of 2-tuples containing the group name along with the data packet. Consider the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "279606c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Jupyter Tutorial\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
      "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
      "PyViz Tutorial\n",
      "            Title Language  2021-12  2022-01  2022-02\n",
      "2  PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
      "Python Basics\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "4  Python Basics       de    525.0    427.0    276.0\n",
      "5  Python Basics       en    157.0     85.0    226.0\n"
     ]
    }
   ],
   "source": [
    "for name, group in df.groupby(\"Title\"):\n",
    "    print(name)\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3944439c",
   "metadata": {},
   "source": [
    "With multiple keys, the first element of the tuple is a tuple of key values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c8778dcd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Jupyter Tutorial', 'de')\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
      "('Jupyter Tutorial', 'en')\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
      "('PyViz Tutorial', 'de')\n",
      "            Title Language  2021-12  2022-01  2022-02\n",
      "2  PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
      "('Python Basics', 'de')\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "4  Python Basics       de    525.0    427.0    276.0\n",
      "('Python Basics', 'en')\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "5  Python Basics       en    157.0     85.0    226.0\n"
     ]
    }
   ],
   "source": [
    "for (i1, i2), group in df.groupby([\"Title\", \"Language\"]):\n",
    "    print((i1, i2))\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e84aa84",
   "metadata": {},
   "source": [
    "Next, we want to output a `dict` of the data as a one-liner:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "d4c85b5a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Jupyter Tutorial':               Title Language  2021-12  2022-01  2022-02\n",
       " 0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
       " 1  Jupyter Tutorial       en   4722.0   3497.0   4009.0,\n",
       " 'PyViz Tutorial':             Title Language  2021-12  2022-01  2022-02\n",
       " 2  PyViz Tutorial       de   2573.0   4873.0   3930.0,\n",
       " 'Python Basics':            Title Language  2021-12  2022-01  2022-02\n",
       " 4  Python Basics       de    525.0    427.0    276.0\n",
       " 5  Python Basics       en    157.0     85.0    226.0}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "books = dict(list(df.groupby(\"Title\")))\n",
    "\n",
    "books"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b64c82f4",
   "metadata": {},
   "source": [
    "By default, `groupby` groups on `axis=0`, but you can also group on any of the other axes. For example, we could group the columns of our example `df` here by `dtype` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "b3ffcaba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title        object\n",
       "Language     object\n",
       "2021-12     float64\n",
       "2022-01     float64\n",
       "2022-02     float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f15cb9fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_6958/521057107.py:1: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
      "  grouped = df.groupby(df.dtypes, axis=1)\n"
     ]
    }
   ],
   "source": [
    "grouped = df.groupby(df.dtypes, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "9a12d376",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "float64\n",
      "   2021-12  2022-01  2022-02\n",
      "0  19651.0  30134.0  33295.0\n",
      "1   4722.0   3497.0   4009.0\n",
      "2   2573.0   4873.0   3930.0\n",
      "3      NaN      NaN      NaN\n",
      "4    525.0    427.0    276.0\n",
      "5    157.0     85.0    226.0\n",
      "object\n",
      "              Title Language\n",
      "0  Jupyter Tutorial       de\n",
      "1  Jupyter Tutorial       en\n",
      "2    PyViz Tutorial       de\n",
      "3              None     None\n",
      "4     Python Basics       de\n",
      "5     Python Basics       en\n"
     ]
    }
   ],
   "source": [
    "for dtype, group in grouped:\n",
    "    print(dtype)\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f351b3fb",
   "metadata": {},
   "source": [
    "## Selecting a column or subset of columns\n",
    "\n",
    "Indexing a `GroupBy` object created from a DataFrame with a column name or an array of column names has the effect of subdividing columns for aggregation. This means that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "7278c1ad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10682dc70>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Title\")[\"2021-12\"]\n",
    "df.groupby(\"Title\")[[\"2022-01\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7982339",
   "metadata": {},
   "source": [
    "are simplified spellings for:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "cc63848f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10683aeb0>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"2021-12\"].groupby(df[\"Title\"])\n",
    "df[[\"2022-01\"]].groupby(df[\"Title\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "473b9f20",
   "metadata": {},
   "source": [
    "Especially for large datasets, it may be desirable to aggregate only some columns. For example, to calculate the sum for only column `01/2022` in the previous dataset and get the result as a DataFrame, we could write:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "5cdb6ba9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2022-01</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>3497.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2022-01\n",
       "Title            Language         \n",
       "Jupyter Tutorial de        30134.0\n",
       "                 en         3497.0\n",
       "PyViz Tutorial   de         4873.0\n",
       "Python Basics    de          427.0\n",
       "                 en           85.0"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\", \"Language\"])[[\"2022-01\"]].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b765a64",
   "metadata": {},
   "source": [
    "The object returned by this indexing operation is a grouped DataFrame if a list or array is passed, or a grouped series if only a single column name is passed as a scalar:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "99c22f19",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x1066555b0>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_grouped = df.groupby([\"Title\", \"Language\"])[\"2022-01\"]\n",
    "\n",
    "series_grouped"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "3ac020d4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title             Language\n",
       "Jupyter Tutorial  de          30134.0\n",
       "                  en           3497.0\n",
       "PyViz Tutorial    de           4873.0\n",
       "Python Basics     de            427.0\n",
       "                  en             85.0\n",
       "Name: 2022-01, dtype: float64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_grouped.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cbabf46",
   "metadata": {},
   "source": [
    "## Grouping with `dicts` and `series`\n",
    "\n",
    "Grouping information can also be in a form other than an array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "e5140e19",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.iloc[2:3, [2, 3]] = np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09dfaa32",
   "metadata": {},
   "source": [
    "Suppose I have a group correspondence for the columns and want to group the columns together by group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "4a252010",
   "metadata": {},
   "outputs": [],
   "source": [
    "mapping = {\"2021-12\": \"Dec 2021\", \n",
    "           \"2022-01\": \"Jan 2022\",\n",
    "           \"2022-02\": \"Feb 2022\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "575135cd",
   "metadata": {},
   "source": [
    "Now an array could be constructed from this `dict` to pass to `groupby`, but instead we can just pass the `dict`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "49c9e3b7",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_6958/2568016302.py:1: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
      "  by_column = df.groupby(mapping, axis=1)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Dec 2021</th>\n",
       "      <th>Feb 2022</th>\n",
       "      <th>Jan 2022</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>30134.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>4009.0</td>\n",
       "      <td>3497.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>525.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>427.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>157.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Dec 2021  Feb 2022  Jan 2022\n",
       "0   19651.0   33295.0   30134.0\n",
       "1    4722.0    4009.0    3497.0\n",
       "2       0.0    3930.0       0.0\n",
       "3       0.0       0.0       0.0\n",
       "4     525.0     276.0     427.0\n",
       "5     157.0     226.0      85.0"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_column = df.groupby(mapping, axis=1)\n",
    "\n",
    "by_column.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66651487",
   "metadata": {},
   "source": [
    "The same functionality applies to `Series`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "eb0556f4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2021-12    Dec 2021\n",
       "2022-01    Jan 2022\n",
       "2022-02    Feb 2022\n",
       "dtype: object"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_series = pd.Series(mapping)\n",
    "\n",
    "map_series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "2e9dd6a0",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_6958/3115622287.py:1: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
      "  df.groupby(map_series, axis=1).sum()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Dec 2021</th>\n",
       "      <th>Feb 2022</th>\n",
       "      <th>Jan 2022</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>30134.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>4009.0</td>\n",
       "      <td>3497.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>525.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>427.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>157.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Dec 2021  Feb 2022  Jan 2022\n",
       "0   19651.0   33295.0   30134.0\n",
       "1    4722.0    4009.0    3497.0\n",
       "2       0.0    3930.0       0.0\n",
       "3       0.0       0.0       0.0\n",
       "4     525.0     276.0     427.0\n",
       "5     157.0     226.0      85.0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(map_series, axis=1).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e7fec0f",
   "metadata": {},
   "source": [
    "## Grouping with Functions\n",
    "\n",
    "Using Python functions is a more general method of defining a group assignment compared to a `Dict` or `Series`. Each function passed as a group key is called once per index value, with the return values used as group names. Specifically, consider the example DataFrame from the previous section, which contains the titles as index values. Suppose If you want to group by the length of the names, you can calculate an array with the lengths of the strings, but it is easier to pass the `len` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "e55f29d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(\n",
    "    [\n",
    "        [19651, 30134, 33295],\n",
    "        [4722, 3497, 4009],\n",
    "        [2573, 4873, 3930],\n",
    "        [525, 427, 276],\n",
    "        [157, 85, 226],\n",
    "    ],\n",
    "    index=[\n",
    "        \"Jupyter Tutorial\",\n",
    "        \"Jupyter Tutorial\",\n",
    "        \"PyViz Tutorial\",\n",
    "        \"Python Basics\",\n",
    "        \"Python Basics\",\n",
    "    ],\n",
    "    columns=[\"2021-12\", \"2022-01\", \"2022-02\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "7c33d027",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    2021-12  2022-01  2022-02\n",
       "13        2        2        2\n",
       "14        1        1        1\n",
       "16        2        2        2"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(len).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2481357a",
   "metadata": {},
   "source": [
    "Mixing functions with arrays, dicts or series is no problem, as everything is converted internally into arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "f9d15089",
   "metadata": {},
   "outputs": [],
   "source": [
    "languages = [\"de\", \"en\", \"de\", \"de\", \"en\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "2c3b8863",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">13</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">16</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       2021-12  2022-01  2022-02\n",
       "13 de        1        1        1\n",
       "   en        1        1        1\n",
       "14 de        1        1        1\n",
       "16 de        1        1        1\n",
       "   en        1        1        1"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([len, languages]).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b58d0cc",
   "metadata": {},
   "source": [
    "## Grouping by index levels\n",
    "\n",
    "A final practical feature for hierarchically indexed datasets is the ability to aggregate by one of the index levels of an axis. Let’s look at an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "66b260b3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Month</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2021-12</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2022-01</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Version</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>1825.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>2576.0</td>\n",
       "      <td>4009.0</td>\n",
       "      <td>3707.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>2573.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NaN</th>\n",
       "      <th>NaN</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>525.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>157.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Month                 2021-12          2022-01          2022-02        \n",
       "Version                latest  stable   latest  stable   latest  stable\n",
       "Jupyter Tutorial de   19651.0     0.0  30134.0     0.0  33295.0     0.0\n",
       "                 en    4722.0  1825.0   3497.0  2576.0   4009.0  3707.0\n",
       "PyViz Tutorial   de    2573.0     0.0   4873.0     0.0   3930.0     0.0\n",
       "NaN              NaN      NaN     NaN      NaN     NaN      NaN     NaN\n",
       "Python Basics    de     525.0     0.0    427.0     0.0    276.0     0.0\n",
       "                 en     157.0     0.0     85.0     0.0    226.0     0.0"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "version_hits = [\n",
    "    [19651, 0, 30134, 0, 33295, 0],\n",
    "    [4722, 1825, 3497, 2576, 4009, 3707],\n",
    "    [2573, 0, 4873, 0, 3930, 0],\n",
    "    [None, None, None, None, None, None],\n",
    "    [525, 0, 427, 0, 276, 0],\n",
    "    [157, 0, 85, 0, 226, 0],\n",
    "]\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    version_hits,\n",
    "    index=[\n",
    "        [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            None,\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        [\"de\", \"en\", \"de\", None, \"de\", \"en\"],\n",
    "    ],\n",
    "    columns=[\n",
    "        [\"2021-12\", \"2021-12\", \"2022-01\", \"2022-01\", \"2022-02\", \"2022-02\"],\n",
    "        [\"latest\", \"stable\", \"latest\", \"stable\", \"latest\", \"stable\"],\n",
    "    ],\n",
    ")\n",
    "\n",
    "df.columns.names = [\"Month\", \"Version\"]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "311f84d8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_6958/3343480277.py:1: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
      "  df.groupby(level=\"Month\", axis=1).sum()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Month</th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6547.0</td>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NaN</th>\n",
       "      <th>NaN</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>525.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>157.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Month                 2021-12  2022-01  2022-02\n",
       "Jupyter Tutorial de   19651.0  30134.0  33295.0\n",
       "                 en    6547.0   6073.0   7716.0\n",
       "PyViz Tutorial   de    2573.0   4873.0   3930.0\n",
       "NaN              NaN      0.0      0.0      0.0\n",
       "Python Basics    de     525.0    427.0    276.0\n",
       "                 en     157.0     85.0    226.0"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(level=\"Month\", axis=1).sum()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}