{ "cells": [ { "cell_type": "markdown", "id": "88b70951", "metadata": {}, "source": [ "# Pivot tables and crosstabs\n", "\n", "A [pivot table](https://en.wikipedia.org/wiki/Pivot_table) is a data summary tool often found in spreadsheet and other data analysis software. It summarises a table of data by one or more keys and arranges the data in a rectangle, with some of the group keys along the rows and some along the columns. Pivot tables in Python with pandas are made possible by the [groupby](group-operations.ipynb) function in combination with reshaping operations using [hierarchical indexing](indexing.ipynb#Hierarchical-Indexing). `DataFrame` has a [pivot_table](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html) method, and there is also a top-level function [pandas.pivot_table](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html). `pivot_table` not only provides a convenient interface to `groupby`, but can also add partial sums (`margins`)." ] }, { "cell_type": "markdown", "id": "05583e3d", "metadata": {}, "source": [ "Suppose we wanted to compute a table of group averages (the default aggregation type of `pivot_table`) ordered by title and language in the rows:" ] }, { "cell_type": "code", "execution_count": 1, "id": "8194eb56", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "a127be00-b808-44e0-8375-a162bd3dc192", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleLanguage2021-122022-012022-02
0Jupyter Tutorialde30134.033295.019651.0
1Jupyter Tutorialen6073.07716.06547.0
2PyViz Tutorialde4873.03930.02573.0
3PyViz TutorialenNaNNaNNaN
4Python Basicsde427.0276.0525.0
5Python Basicsen95.0226.0157.0
\n", "
" ], "text/plain": [ " Title Language 2021-12 2022-01 2022-02\n", "0 Jupyter Tutorial de 30134.0 33295.0 19651.0\n", "1 Jupyter Tutorial en 6073.0 7716.0 6547.0\n", "2 PyViz Tutorial de 4873.0 3930.0 2573.0\n", "3 PyViz Tutorial en NaN NaN NaN\n", "4 Python Basics de 427.0 276.0 525.0\n", "5 Python Basics en 95.0 226.0 157.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " {\n", " \"Title\": [\n", " \"Jupyter Tutorial\",\n", " \"Jupyter Tutorial\",\n", " \"PyViz Tutorial\",\n", " \"PyViz Tutorial\",\n", " \"Python Basics\",\n", " \"Python Basics\",\n", " ],\n", " \"Language\": [\"de\", \"en\", \"de\", \"en\", \"de\", \"en\"],\n", " \"2021-12\": [30134, 6073, 4873, np.nan, 427, 95],\n", " \"2022-01\": [33295, 7716, 3930, np.nan, 276, 226],\n", " \"2022-02\": [19651, 6547, 2573, np.nan, 525, 157],\n", " }\n", ")\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "id": "cabf0dcd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
TitleLanguage
Jupyter Tutorialde30134.033295.019651.0
en6073.07716.06547.0
PyViz Tutorialde4873.03930.02573.0
Python Basicsde427.0276.0525.0
en95.0226.0157.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 2022-02\n", "Title Language \n", "Jupyter Tutorial de 30134.0 33295.0 19651.0\n", " en 6073.0 7716.0 6547.0\n", "PyViz Tutorial de 4873.0 3930.0 2573.0\n", "Python Basics de 427.0 276.0 525.0\n", " en 95.0 226.0 157.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index=[\"Title\", \"Language\"])" ] }, { "cell_type": "markdown", "id": "f902f560", "metadata": {}, "source": [ "This could also have been done directly with `groupby`." ] }, { "cell_type": "markdown", "id": "ee3dc439", "metadata": {}, "source": [ "Now let’s say we want to get the mean of hits of all languages per title for each individual month. For this I will enter `Title` in the table columns and the months in the rows:" ] }, { "cell_type": "code", "execution_count": 4, "id": "9fc1c0fe-071a-4ee8-9452-dad0b8ad0cb1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleJupyter TutorialPyViz TutorialPython Basics
Languagedeendedeen
2021-1230134.06073.04873.0427.095.0
2022-0133295.07716.03930.0276.0226.0
2022-0219651.06547.02573.0525.0157.0
\n", "
" ], "text/plain": [ "Title Jupyter Tutorial PyViz Tutorial Python Basics \n", "Language de en de de en\n", "2021-12 30134.0 6073.0 4873.0 427.0 95.0\n", "2022-01 33295.0 7716.0 3930.0 276.0 226.0\n", "2022-02 19651.0 6547.0 2573.0 525.0 157.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(columns=[\"Title\", \"Language\"])" ] }, { "cell_type": "markdown", "id": "39097086", "metadata": {}, "source": [ "To use an aggregation function other than `mean`, pass it to the keyword argument `aggfunc`. With `sum`, for example, you get the sum:" ] }, { "cell_type": "code", "execution_count": 5, "id": "9a74aa92", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleJupyter TutorialPyViz TutorialPython Basics
LanguagedeenAlldeenAlldeenAll
2021-1230134.06073.036207.04873.00.04873.0427.095.0522.0
2022-0133295.07716.041011.03930.00.03930.0276.0226.0502.0
2022-0219651.06547.026198.02573.00.02573.0525.0157.0682.0
\n", "
" ], "text/plain": [ "Title Jupyter Tutorial PyViz Tutorial \\\n", "Language de en All de en All \n", "2021-12 30134.0 6073.0 36207.0 4873.0 0.0 4873.0 \n", "2022-01 33295.0 7716.0 41011.0 3930.0 0.0 3930.0 \n", "2022-02 19651.0 6547.0 26198.0 2573.0 0.0 2573.0 \n", "\n", "Title Python Basics \n", "Language de en All \n", "2021-12 427.0 95.0 522.0 \n", "2022-01 276.0 226.0 502.0 \n", "2022-02 525.0 157.0 682.0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(columns=[\"Title\", \"Language\"], aggfunc=\"sum\", margins=True)" ] }, { "cell_type": "markdown", "id": "0b25bb58", "metadata": {}, "source": [ "`pivot_table` options:\n", "\n", "Function name | Description\n", ":------------ | :-----------\n", "`values` | column name(s) to aggregate; by default, all numeric columns are aggregated\n", "`index` | column names or other group keys to be grouped in the rows of the resulting pivot table\n", "`columns` | column names or other group keys to be grouped in the columns of the resulting pivot table\n", "`aggfunc` | aggregation function or list of functions (by default `mean`); can be any function valid in a `groupby` context\n", "`fill_value` | replaces missing values in the result table\n", "`dropna` | if `True`, columns whose entries are all `NA` are ignored\n", "`margins` | inserts row/column subtotals and grand totals (default: `False`)\n", "`margins_name` | name used for row/column labels if `margins=True` is passed, default is `All`.\n", "`observed` | For categorical group keys, if `True`, only the observed category values are displayed in the keys and not all categories" ] }, { "cell_type": "markdown", "id": "4439584e", "metadata": {}, "source": [ "## Crosstabs\n", "\n", "A crosstab is a special case of a pivot table that calculates the frequency of groups. For example, in the context of an analysis of this data, we might want to determine which title was published in which language, so we could use `pivot_table` for this, but the function [pandas.crosstab](https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html) is more convenient." ] }, { "cell_type": "code", "execution_count": 6, "id": "7fd8b688", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Languagedeen
Title
Jupyter Tutorial11
PyViz Tutorial11
Python Basics11
\n", "
" ], "text/plain": [ "Language de en\n", "Title \n", "Jupyter Tutorial 1 1\n", "PyViz Tutorial 1 1\n", "Python Basics 1 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df.Title, df.Language)" ] }, { "cell_type": "markdown", "id": "8bfc96a5", "metadata": {}, "source": [ "The first two arguments for `crosstab` can each be either an array or a series or a list of arrays." ] }, { "cell_type": "markdown", "id": "540cc057", "metadata": {}, "source": [ "With `margins=True` we can also calculate the sums of the columns and rows as well as the total sum:" ] }, { "cell_type": "code", "execution_count": 7, "id": "f33cee62", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
LanguagedeenAll
Title
Jupyter Tutorial112
PyViz Tutorial112
Python Basics112
All336
\n", "
" ], "text/plain": [ "Language de en All\n", "Title \n", "Jupyter Tutorial 1 1 2\n", "PyViz Tutorial 1 1 2\n", "Python Basics 1 1 2\n", "All 3 3 6" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df.Title, df.Language, margins=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }