{
"cells": [
{
"cell_type": "markdown",
"id": "6e8c0eb9",
"metadata": {},
"source": [
"# Converting Python data structures into pandas\n",
"\n",
"Python data structures such as lists and arrays can be converted into pandas [Series](#Series) or [DataFrames](#DataFrame)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c6ff8627",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "320bb0e4",
"metadata": {},
"source": [
"## Series\n",
"\n",
"Python [lists](https://docs.python.org/3/tutorial/introduction.html#lists) can easily be converted into pandas Series:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5af7277b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 -0.751442\n",
"1 0.816935\n",
"2 -0.272546\n",
"3 -0.268295\n",
"4 -0.296728\n",
"5 0.176255\n",
"6 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list1 = [-0.751442, 0.816935, -0.272546, -0.268295, -0.296728, 0.176255, -0.322612]\n",
"\n",
"pd.Series(list1)"
]
},
{
"cell_type": "markdown",
"id": "16defc29",
"metadata": {},
"source": [
"Multiple lists can also be easily converted into one pandas Series:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "af27f530",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 -0.751442\n",
"1 0.816935\n",
"2 -0.272546\n",
"3 -0.268295\n",
"4 -0.296728\n",
"5 0.176255\n",
"6 -0.322612\n",
"7 -0.029608\n",
"8 -0.277982\n",
"9 2.693057\n",
"10 -0.850817\n",
"11 0.783868\n",
"12 -1.137835\n",
"13 -0.617132\n",
"dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list2 = [-0.029608, -0.277982, 2.693057, -0.850817, 0.783868, -1.137835, -0.617132]\n",
"\n",
"pd.Series(list1 + list2)"
]
},
{
"cell_type": "markdown",
"id": "625ed785",
"metadata": {},
"source": [
"A list can also be passed as an index:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "04e9c84b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"date = [\n",
" \"2022-01-31\",\n",
" \"2022-02-01\",\n",
" \"2022-02-02\",\n",
" \"2022-02-03\",\n",
" \"2022-02-04\",\n",
" \"2022-02-05\",\n",
" \"2022-02-06\",\n",
"]\n",
"\n",
"pd.Series(list1, index=date)"
]
},
{
"cell_type": "markdown",
"id": "3a79462a",
"metadata": {},
"source": [
"With Python [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) you can pass not only values but also the corresponding keys to a pandas series:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c52a0388",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict1 = {\n",
" \"2022-01-31\": -0.751442,\n",
" \"2022-02-01\": 0.816935,\n",
" \"2022-02-02\": -0.272546,\n",
" \"2022-02-03\": -0.268295,\n",
" \"2022-02-04\": -0.296728,\n",
" \"2022-02-05\": 0.176255,\n",
" \"2022-02-06\": -0.322612,\n",
"}\n",
"\n",
"pd.Series(dict1)"
]
},
{
"cell_type": "markdown",
"id": "be680a77",
"metadata": {},
"source": [
"When you pass a `dict`, the index in the resulting pandas series takes into account the order of the keys in the dict."
]
},
{
"cell_type": "markdown",
"id": "844514b3",
"metadata": {},
"source": [
"With [collections.ChainMap](https://docs.python.org/3/library/collections.html#collections.ChainMap) you can also turn several dicts into one pandas.Series.\n",
"\n",
"First we define a second dict:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "141373f1",
"metadata": {},
"outputs": [],
"source": [
"dict2 = {\n",
" \"2022-02-07\": -0.029608,\n",
" \"2022-02-08\": -0.277982,\n",
" \"2022-02-09\": 2.693057,\n",
" \"2022-02-10\": -0.850817,\n",
" \"2022-02-11\": 0.783868,\n",
" \"2022-02-12\": -1.137835,\n",
" \"2022-02-13\": -0.617132,\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "063febca",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-07 -0.029608\n",
"2022-02-08 -0.277982\n",
"2022-02-09 2.693057\n",
"2022-02-10 -0.850817\n",
"2022-02-11 0.783868\n",
"2022-02-12 -1.137835\n",
"2022-02-13 -0.617132\n",
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import ChainMap\n",
"\n",
"\n",
"pd.Series(ChainMap(dict1, dict2))"
]
},
{
"cell_type": "markdown",
"id": "b04635b7",
"metadata": {},
"source": [
"## DataFrame\n",
"\n",
"Lists of lists can be loaded into a pandas DataFrame with:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "802eca23",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" -0.751442 | \n",
" 0.816935 | \n",
" -0.272546 | \n",
" -0.268295 | \n",
" -0.296728 | \n",
" 0.176255 | \n",
" -0.322612 | \n",
"
\n",
" \n",
" | 1 | \n",
" -0.029608 | \n",
" -0.277982 | \n",
" 2.693057 | \n",
" -0.850817 | \n",
" 0.783868 | \n",
" -1.137835 | \n",
" -0.617132 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4 5 6\n",
"0 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 -0.322612\n",
"1 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 -0.617132"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame([list1, list2])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "1d59c21a",
"metadata": {},
"source": [
"You can also transfer a list into a DataFrame index:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ff4cb961",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 2022-01-31 | \n",
" -0.751442 | \n",
" 0.816935 | \n",
" -0.272546 | \n",
" -0.268295 | \n",
" -0.296728 | \n",
" 0.176255 | \n",
" -0.322612 | \n",
"
\n",
" \n",
" | 2022-02-01 | \n",
" -0.029608 | \n",
" -0.277982 | \n",
" 2.693057 | \n",
" -0.850817 | \n",
" 0.783868 | \n",
" -1.137835 | \n",
" -0.617132 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4 5 \\\n",
"2022-01-31 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 \n",
"2022-02-01 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 \n",
"\n",
" 6 \n",
"2022-01-31 -0.322612 \n",
"2022-02-01 -0.617132 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame([list1, list2], index=[\"2022-01-31\", \"2022-02-01\"])"
]
},
{
"cell_type": "markdown",
"id": "4095c761",
"metadata": {},
"source": [
"A pandas DataFrame can be created from a dict with values in lists:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "848508f8",
"metadata": {},
"outputs": [],
"source": [
"data = {\n",
" \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n",
" \"Decimal\": [0, 1, 2, 3, 4, 5],\n",
" \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n",
" \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "74115982",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Code | \n",
" Decimal | \n",
" Octal | \n",
" Key | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" U+0000 | \n",
" 0 | \n",
" 001 | \n",
" NUL | \n",
"
\n",
" \n",
" | 1 | \n",
" U+0001 | \n",
" 1 | \n",
" 002 | \n",
" Ctrl-A | \n",
"
\n",
" \n",
" | 2 | \n",
" U+0002 | \n",
" 2 | \n",
" 003 | \n",
" Ctrl-B | \n",
"
\n",
" \n",
" | 3 | \n",
" U+0003 | \n",
" 3 | \n",
" 004 | \n",
" Ctrl-C | \n",
"
\n",
" \n",
" | 4 | \n",
" U+0004 | \n",
" 4 | \n",
" 004 | \n",
" Ctrl-D | \n",
"
\n",
" \n",
" | 5 | \n",
" U+0005 | \n",
" 5 | \n",
" 005 | \n",
" Ctrl-E | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Code Decimal Octal Key\n",
"0 U+0000 0 001 NUL\n",
"1 U+0001 1 002 Ctrl-A\n",
"2 U+0002 2 003 Ctrl-B\n",
"3 U+0003 3 004 Ctrl-C\n",
"4 U+0004 4 004 Ctrl-D\n",
"5 U+0005 5 005 Ctrl-E"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data)"
]
},
{
"cell_type": "markdown",
"id": "59ad9f0f",
"metadata": {},
"source": [
"Another common form of data is nested dict of dicts:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "676ff717",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" U+0006 | \n",
" U+0007 | \n",
"
\n",
" \n",
" \n",
" \n",
" | Decimal | \n",
" 6 | \n",
" 7 | \n",
"
\n",
" \n",
" | Octal | \n",
" 006 | \n",
" 007 | \n",
"
\n",
" \n",
" | Key | \n",
" Ctrl-F | \n",
" Ctrl-G | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" U+0006 U+0007\n",
"Decimal 6 7\n",
"Octal 006 007\n",
"Key Ctrl-F Ctrl-G"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2 = {\n",
" \"U+0006\": {\"Decimal\": \"6\", \"Octal\": \"006\", \"Key\": \"Ctrl-F\"},\n",
" \"U+0007\": {\"Decimal\": \"7\", \"Octal\": \"007\", \"Key\": \"Ctrl-G\"},\n",
"}\n",
"\n",
"df2 = pd.DataFrame(data2)\n",
"\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "7796a09f",
"metadata": {},
"source": [
"Dicts of Series are treated in a similar way:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fc39c26b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" U+0006 | \n",
" U+0007 | \n",
"
\n",
" \n",
" \n",
" \n",
" | Key | \n",
" Ctrl-F | \n",
" Ctrl-G | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" U+0006 U+0007\n",
"Key Ctrl-F Ctrl-G"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data3 = {\"U+0006\": df2[\"U+0006\"][2:], \"U+0007\": df2[\"U+0007\"][2:]}\n",
"\n",
"pd.DataFrame(data3)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.13 Kernel",
"language": "python",
"name": "python313"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}