{ "cells": [ { "cell_type": "markdown", "id": "6e8c0eb9", "metadata": {}, "source": [ "# Converting Python data structures into pandas\n", "\n", "Python data structures such as lists and arrays can be converted into pandas [Series](#Series) or [DataFrames](#DataFrame)." ] }, { "cell_type": "code", "execution_count": 1, "id": "c6ff8627", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "320bb0e4", "metadata": {}, "source": [ "## Series\n", "\n", "Python [lists](https://docs.python.org/3/tutorial/introduction.html#lists) can easily be converted into pandas Series:" ] }, { "cell_type": "code", "execution_count": 2, "id": "5af7277b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.751442\n", "1 0.816935\n", "2 -0.272546\n", "3 -0.268295\n", "4 -0.296728\n", "5 0.176255\n", "6 -0.322612\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list1 = [-0.751442, 0.816935, -0.272546, -0.268295, -0.296728, 0.176255, -0.322612]\n", "\n", "pd.Series(list1)" ] }, { "cell_type": "markdown", "id": "16defc29", "metadata": {}, "source": [ "Multiple lists can also be easily converted into one pandas Series:" ] }, { "cell_type": "code", "execution_count": 3, "id": "af27f530", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.751442\n", "1 0.816935\n", "2 -0.272546\n", "3 -0.268295\n", "4 -0.296728\n", "5 0.176255\n", "6 -0.322612\n", "7 -0.029608\n", "8 -0.277982\n", "9 2.693057\n", "10 -0.850817\n", "11 0.783868\n", "12 -1.137835\n", "13 -0.617132\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list2 = [-0.029608, -0.277982, 2.693057, -0.850817, 0.783868, -1.137835, -0.617132]\n", "\n", "pd.Series(list1 + list2)" ] }, { "cell_type": "markdown", "id": "625ed785", "metadata": {}, "source": [ "A list can also be passed as an index:" ] }, { "cell_type": "code", "execution_count": 4, "id": "04e9c84b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "date = [\n", " \"2022-01-31\",\n", " \"2022-02-01\",\n", " \"2022-02-02\",\n", " \"2022-02-03\",\n", " \"2022-02-04\",\n", " \"2022-02-05\",\n", " \"2022-02-06\",\n", "]\n", "\n", "pd.Series(list1, index=date)" ] }, { "cell_type": "markdown", "id": "3a79462a", "metadata": {}, "source": [ "With Python [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) you can pass not only values but also the corresponding keys to a pandas series:" ] }, { "cell_type": "code", "execution_count": 5, "id": "c52a0388", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict1 = {\n", " \"2022-01-31\": -0.751442,\n", " \"2022-02-01\": 0.816935,\n", " \"2022-02-02\": -0.272546,\n", " \"2022-02-03\": -0.268295,\n", " \"2022-02-04\": -0.296728,\n", " \"2022-02-05\": 0.176255,\n", " \"2022-02-06\": -0.322612,\n", "}\n", "\n", "pd.Series(dict1)" ] }, { "cell_type": "markdown", "id": "be680a77", "metadata": {}, "source": [ "When you pass a `dict`, the index in the resulting pandas series takes into account the order of the keys in the dict." ] }, { "cell_type": "markdown", "id": "844514b3", "metadata": {}, "source": [ "With [collections.ChainMap](https://docs.python.org/3/library/collections.html#collections.ChainMap) you can also turn several dicts into one pandas.Series.\n", "\n", "First we define a second dict:" ] }, { "cell_type": "code", "execution_count": 6, "id": "141373f1", "metadata": {}, "outputs": [], "source": [ "dict2 = {\n", " \"2022-02-07\": -0.029608,\n", " \"2022-02-08\": -0.277982,\n", " \"2022-02-09\": 2.693057,\n", " \"2022-02-10\": -0.850817,\n", " \"2022-02-11\": 0.783868,\n", " \"2022-02-12\": -1.137835,\n", " \"2022-02-13\": -0.617132,\n", "}" ] }, { "cell_type": "code", "execution_count": 7, "id": "063febca", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-02-07 -0.029608\n", "2022-02-08 -0.277982\n", "2022-02-09 2.693057\n", "2022-02-10 -0.850817\n", "2022-02-11 0.783868\n", "2022-02-12 -1.137835\n", "2022-02-13 -0.617132\n", "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import ChainMap\n", "\n", "\n", "pd.Series(ChainMap(dict1, dict2))" ] }, { "cell_type": "markdown", "id": "b04635b7", "metadata": {}, "source": [ "## DataFrame\n", "\n", "Lists of lists can be loaded into a pandas DataFrame with:" ] }, { "cell_type": "code", "execution_count": 8, "id": "802eca23", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456
0-0.7514420.816935-0.272546-0.268295-0.2967280.176255-0.322612
1-0.029608-0.2779822.693057-0.8508170.783868-1.137835-0.617132
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 -0.322612\n", "1 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 -0.617132" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame([list1, list2])\n", "df" ] }, { "cell_type": "markdown", "id": "1d59c21a", "metadata": {}, "source": [ "You can also transfer a list into a DataFrame index:" ] }, { "cell_type": "code", "execution_count": 9, "id": "ff4cb961", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456
2022-01-31-0.7514420.816935-0.272546-0.268295-0.2967280.176255-0.322612
2022-02-01-0.029608-0.2779822.693057-0.8508170.783868-1.137835-0.617132
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "2022-01-31 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 \n", "2022-02-01 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 \n", "\n", " 6 \n", "2022-01-31 -0.322612 \n", "2022-02-01 -0.617132 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([list1, list2], index=[\"2022-01-31\", \"2022-02-01\"])" ] }, { "cell_type": "markdown", "id": "4095c761", "metadata": {}, "source": [ "A pandas DataFrame can be created from a dict with values in lists:" ] }, { "cell_type": "code", "execution_count": 10, "id": "848508f8", "metadata": {}, "outputs": [], "source": [ "data = {\n", " \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n", " \"Decimal\": [0, 1, 2, 3, 4, 5],\n", " \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n", " \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n", "}" ] }, { "cell_type": "code", "execution_count": 11, "id": "74115982", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKey
0U+00000001NUL
1U+00011002Ctrl-A
2U+00022003Ctrl-B
3U+00033004Ctrl-C
4U+00044004Ctrl-D
5U+00055005Ctrl-E
\n", "
" ], "text/plain": [ " Code Decimal Octal Key\n", "0 U+0000 0 001 NUL\n", "1 U+0001 1 002 Ctrl-A\n", "2 U+0002 2 003 Ctrl-B\n", "3 U+0003 3 004 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data)" ] }, { "cell_type": "markdown", "id": "59ad9f0f", "metadata": {}, "source": [ "Another common form of data is nested dict of dicts:" ] }, { "cell_type": "code", "execution_count": 12, "id": "676ff717", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
U+0006U+0007
Decimal67
Octal006007
KeyCtrl-FCtrl-G
\n", "
" ], "text/plain": [ " U+0006 U+0007\n", "Decimal 6 7\n", "Octal 006 007\n", "Key Ctrl-F Ctrl-G" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = {\n", " \"U+0006\": {\"Decimal\": \"6\", \"Octal\": \"006\", \"Key\": \"Ctrl-F\"},\n", " \"U+0007\": {\"Decimal\": \"7\", \"Octal\": \"007\", \"Key\": \"Ctrl-G\"},\n", "}\n", "\n", "df2 = pd.DataFrame(data2)\n", "\n", "df2" ] }, { "cell_type": "markdown", "id": "7796a09f", "metadata": {}, "source": [ "Dicts of Series are treated in a similar way:" ] }, { "cell_type": "code", "execution_count": 13, "id": "fc39c26b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
U+0006U+0007
KeyCtrl-FCtrl-G
\n", "
" ], "text/plain": [ " U+0006 U+0007\n", "Key Ctrl-F Ctrl-G" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data3 = {\"U+0006\": df2[\"U+0006\"][2:], \"U+0007\": df2[\"U+0007\"][2:]}\n", "\n", "pd.DataFrame(data3)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }