{
"cells": [
{
"cell_type": "markdown",
"id": "5553e7a5",
"metadata": {},
"source": [
"# CSV example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "06a5d3d1",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "2b682a0d",
"metadata": {},
"source": [
"After importing pandas, we first read a csv file with `read_csv`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "de19c7bd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | 2 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Python basics en Veit Schiele BSD-3-Clause 2021-10-28\n",
"0 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\n",
"1 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26\n",
"2 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "6c87274f",
"metadata": {},
"source": [
"As you can see, this file has no header. To give the DataFrame a header, you have several options. You can allow pandas to assign default column names, or you can define the names yourself:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a46a20c2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | 2 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | 3 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4\n",
"0 Python basics en Veit Schiele BSD-3-Clause 2021-10-28\n",
"1 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\n",
"2 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26\n",
"3 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" header=None,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "46b04f42",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | 2 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | 3 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date\n",
"0 Python basics en Veit Schiele BSD-3-Clause 2021-10-28\n",
"1 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\n",
"2 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26\n",
"3 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" names=[\"Title\", \"Language\", \"Authors\", \"License\", \"Publication date\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d50206d4",
"metadata": {},
"source": [
"Suppose you want the `Authors` column to be the index of the returned DataFrame. You can either specify that you want the column at index 3 or with the name `Authors` by using the argument `index_col`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "15179ece",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" License | \n",
" Publication date | \n",
"
\n",
" \n",
" | Authors | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | Veit Schiele | \n",
" Python basics | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | Veit Schiele | \n",
" Jupyter Tutorial | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | Veit Schiele | \n",
" Jupyter Tutorial | \n",
" de | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | Veit Schiele | \n",
" PyViz Tutorial | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language License Publication date\n",
"Authors \n",
"Veit Schiele Python basics en BSD-3-Clause 2021-10-28\n",
"Veit Schiele Jupyter Tutorial en BSD-3-Clause 2019-06-27\n",
"Veit Schiele Jupyter Tutorial de BSD-3-Clause 2020-10-26\n",
"Veit Schiele PyViz Tutorial en BSD-3-Clause 2020-04-13"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" index_col=[\"Authors\"],\n",
" names=[\"Title\", \"Language\", \"Authors\", \"License\", \"Publication date\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "696f87c9",
"metadata": {},
"source": [
"In case you want to build a hierarchical index from several columns, pass a list of column numbers or names:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fd3e9130",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" Language | \n",
" License | \n",
" Publication date | \n",
"
\n",
" \n",
" | Authors | \n",
" Title | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | Veit Schiele | \n",
" Python basics | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | Jupyter Tutorial | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | Jupyter Tutorial | \n",
" de | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | PyViz Tutorial | \n",
" en | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Language License Publication date\n",
"Authors Title \n",
"Veit Schiele Python basics en BSD-3-Clause 2021-10-28\n",
" Jupyter Tutorial en BSD-3-Clause 2019-06-27\n",
" Jupyter Tutorial de BSD-3-Clause 2020-10-26\n",
" PyViz Tutorial en BSD-3-Clause 2020-04-13"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" index_col=[2, 0],\n",
" names=[\"Title\", \"Language\", \"Authors\", \"License\", \"Publication date\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3d5b438d",
"metadata": {},
"source": [
"In some cases, a table does not have a fixed separator, but uses several spaces or some other pattern to separate fields. Suppose a file looks like this:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "47d8f436",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[' Title Language Authors License Publication date\\n',\n",
" '1 Python basics en Veit Schiele BSD-3-Clause 2021-10-28\\n',\n",
" '2 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\\n',\n",
" '3 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26\\n',\n",
" '4 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13\\n']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(open(\"books.txt\"))"
]
},
{
"cell_type": "markdown",
"id": "1b46eb37",
"metadata": {},
"source": [
"In such cases, you can pass a regular expression as a separator for `read_csv`. This can be expressed by the regular expression `\\s\\s+`, so then we have:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2fa3eb87",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
"
\n",
" \n",
" \n",
" \n",
" | 1 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | 2 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | 3 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
"
\n",
" \n",
" | 4 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date\n",
"1 Python basics en Veit Schiele BSD-3-Clause 2021-10-28\n",
"2 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\n",
"3 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26\n",
"4 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"books.txt\", sep=r\"\\s\\s+\", engine=\"python\")"
]
},
{
"cell_type": "markdown",
"id": "41021e4d",
"metadata": {},
"source": [
"Since there was one column name less than the number of data rows, `read_csv` infers that in this case the first column should be the index of the DataFrame."
]
},
{
"cell_type": "markdown",
"id": "8cef6718",
"metadata": {},
"source": [
"The parser functions have many additional arguments that help you handle the wide variety of exception file formats that occur. For example, you can skip individual lines of a file with `skiprows`:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1f849c65",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
"
\n",
" \n",
" | 2 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date\n",
"0 Python basics en Veit Schiele BSD-3-Clause 2021-10-28\n",
"1 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27\n",
"2 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" skiprows=[2],\n",
" names=[\"Title\", \"Language\", \"Authors\", \"License\", \"Publication date\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8163765c",
"metadata": {},
"source": [
"Dealing with missing values is an important and often complicated part of parsing data. Missing data is usually either not present (empty string) or indicated by a placeholder. By default, pandas uses a number of common placeholders, such as `NA` and `NULL`:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b7060c22",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
" doi | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2021-10-28 | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2019-06-27 | \n",
" NaN | \n",
"
\n",
" \n",
" | 2 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-10-26 | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" BSD-3-Clause | \n",
" 2020-04-13 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date doi\n",
"0 Python basics en Veit Schiele BSD-3-Clause 2021-10-28 NaN\n",
"1 Jupyter Tutorial en Veit Schiele BSD-3-Clause 2019-06-27 NaN\n",
"2 Jupyter Tutorial de Veit Schiele BSD-3-Clause 2020-10-26 NaN\n",
"3 PyViz Tutorial en Veit Schiele BSD-3-Clause 2020-04-13 NaN"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" names=[\n",
" \"Title\",\n",
" \"Language\",\n",
" \"Authors\",\n",
" \"License\",\n",
" \"Publication date\",\n",
" \"doi\",\n",
" ],\n",
")\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0b3dc1bc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
" doi | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
" | 1 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
" | 2 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
" | 3 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date doi\n",
"0 False False False False False True\n",
"1 False False False False False True\n",
"2 False False False False False True\n",
"3 False False False False False True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isna()"
]
},
{
"cell_type": "markdown",
"id": "ec5602d1",
"metadata": {},
"source": [
"The `na_values` option can take either a list or a series of strings to account for missing values:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "eb355d44",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Title | \n",
" Language | \n",
" Authors | \n",
" License | \n",
" Publication date | \n",
" doi | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python basics | \n",
" en | \n",
" Veit Schiele | \n",
" NaN | \n",
" 2021-10-28 | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" Jupyter Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" NaN | \n",
" 2019-06-27 | \n",
" NaN | \n",
"
\n",
" \n",
" | 2 | \n",
" Jupyter Tutorial | \n",
" de | \n",
" Veit Schiele | \n",
" NaN | \n",
" 2020-10-26 | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" PyViz Tutorial | \n",
" en | \n",
" Veit Schiele | \n",
" NaN | \n",
" 2020-04-13 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Title Language Authors License Publication date doi\n",
"0 Python basics en Veit Schiele NaN 2021-10-28 NaN\n",
"1 Jupyter Tutorial en Veit Schiele NaN 2019-06-27 NaN\n",
"2 Jupyter Tutorial de Veit Schiele NaN 2020-10-26 NaN\n",
"3 PyViz Tutorial en Veit Schiele NaN 2020-04-13 NaN"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\n",
" \"https://raw.githubusercontent.com/veit/python-basics-tutorial-de/main/docs/save-data/books.csv\",\n",
" na_values=[\"BSD-3-Clause\"],\n",
" names=[\n",
" \"Title\",\n",
" \"Language\",\n",
" \"Authors\",\n",
" \"License\",\n",
" \"Publication date\",\n",
" \"doi\",\n",
" ],\n",
")"
]
},
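{
"cell_type": "markdown",
"id": "7d2f4c1a",
"metadata": {},
"source": [
"`na_values` also accepts a dict that assigns separate placeholders to individual columns. The following is only a minimal sketch: the in-memory data and the placeholders `?` and `unknown` are made up for illustration and do not come from `books.csv`:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical in-memory data standing in for a CSV file\n",
"data = io.StringIO(\n",
"    \"Title,Language,doi\\n\"\n",
"    \"Python basics,en,unknown\\n\"\n",
"    \"unknown,?,unknown\\n\"\n",
")\n",
"\n",
"# Per-column placeholders: \"?\" only counts as missing in Language,\n",
"# \"unknown\" only counts as missing in doi\n",
"books = pd.read_csv(data, na_values={\"Language\": [\"?\"], \"doi\": [\"unknown\"]})\n",
"\n",
"print(books.isna())\n",
"```\n",
"\n",
"Because the placeholders are bound to columns, the string `unknown` in the `Title` column is kept as a regular value."
]
},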
{
"cell_type": "markdown",
"id": "47c9eecb",
"metadata": {},
"source": [
"The most frequent arguments of the function `read_csv`:\n",
"\n",
"Argument | Description\n",
":------- | :----------\n",
"`path` | String specifying the location in the file system, a URL or a file-like object\n",
"`sep` or `delimiter` | String or regular expression to separate the fields in each row\n",
"`header` | Row number to be used as column name; default is `0`, i.e. the first row, but should be `None` if there is no header row\n",
"`index_col` | Row numbers or names to be used as row index in the result; can be a single name/number or a list of them for a hierarchical index\n",
"`names` | List of column names\n",
"`skiprows` | Number of rows to be ignored at the beginning of the file or list of row numbers starting at `0` to be skipped\n",
"`na_values` | sequence of values to be replaced by NA\n",
"`comment` | character to separate comments from the end of the line\n",
"`parse_dates` | Attempt to parse data with datetime; defaults to `False`. If `True`, attempts to parse all columns. Otherwise, a list of column numbers or names to parse can be specified. If the list element is a tuple or a list, multiple columns are combined and converted to a date, for example if the date and time are split between two columns\n",
"`keep_date_col` | if columns are combined to parse the date, the combined columns are kept; default: `False`\n",
"`converters` | Dict containing the column number of names mapped to functions, for example `{'Titel': f}` would apply the function f to all values in the column `Title`\n",
"`dayfirst` | treat as an international format when parsing potentially ambiguous dates, for example `28/6/2021` → `28. Juni 2021`; `False` by default\n",
"`date_parser` | function to use for parsing dates\n",
"`nrows` | Number of lines to read from the beginning of the file.\n",
"`iterator` | Return a `TextFileReader` object to read the file piece by piece; this object can also be used with the `with` statement\n",
"`chunksize` | For the iteration, the size of the data blocks.\n",
"`skip_footer` | number of lines to be ignored at the end of the file\n",
"`verbose` | outputs various information about the parser output, for example the number of missing values in non-numeric columns\n",
"`encoding` | Text encoding for Unicode, for example `utf-8` for UTF-8 encoded text\n",
"`squeeze` | if the parsed data contains only one column, a Series is returned\n",
"`thousands` | Separator for thousands, for example `,` or `.`"
]
},
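{
"cell_type": "markdown",
"id": "9b3e6f20",
"metadata": {},
"source": [
"Several of the arguments from the table can be combined in one call. The following is a minimal sketch, assuming made-up in-memory data; the column names `Date` and `Visitors` and the variable `visits` are hypothetical:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data with a comment line and a thousands separator\n",
"data = io.StringIO(\n",
"    \"# visitors per day\\n\"\n",
"    \"Date;Visitors\\n\"\n",
"    \"2021-06-28;1,024\\n\"\n",
"    \"2021-06-29;2,048\\n\"\n",
")\n",
"\n",
"visits = pd.read_csv(\n",
"    data,\n",
"    sep=\";\",\n",
"    comment=\"#\",  # ignore the rest of a line after '#'\n",
"    thousands=\",\",  # read 1,024 as the integer 1024\n",
"    parse_dates=[\"Date\"],  # convert the Date column to datetime64\n",
")\n",
"\n",
"print(visits.dtypes)\n",
"```\n",
"\n",
"Using `;` as the field separator keeps the comma free to serve as the thousands separator."
]
},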
{
"cell_type": "markdown",
"id": "951388e8",
"metadata": {},
"source": [
"## Reading in text files piece by piece\n",
"\n",
"If you want to process very large files, you can also read in only a small part of a file or iterate through smaller parts of a file.\n",
"\n",
"Before we look at a large file, we reduce the number of lines displayed with `options.display.max_rows`:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e61b2021",
"metadata": {},
"outputs": [],
"source": [
"pd.options.display.max_rows = 10"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "29ff1efa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Mon. | \n",
" Tues. | \n",
" Wed. | \n",
" Thurs. | \n",
" Fri. | \n",
" Sat. | \n",
" Sun. | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1996-01-01 | \n",
" 0.129453 | \n",
" -0.023836 | \n",
" 1.121460 | \n",
" 1.698286 | \n",
" -0.598506 | \n",
" 1.042221 | \n",
" -0.726412 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1996-01-02 | \n",
" -0.094021 | \n",
" -0.727942 | \n",
" 0.698641 | \n",
" -1.198040 | \n",
" 1.927505 | \n",
" 1.147445 | \n",
" -1.134103 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1996-01-03 | \n",
" -0.560857 | \n",
" 0.145222 | \n",
" -0.990202 | \n",
" 1.200214 | \n",
" 0.717339 | \n",
" 1.117095 | \n",
" -1.793565 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1996-01-04 | \n",
" -0.169755 | \n",
" -0.677391 | \n",
" -1.533519 | \n",
" -0.343477 | \n",
" -0.109705 | \n",
" 1.038236 | \n",
" -0.799088 | \n",
"
\n",
" \n",
" | 4 | \n",
" 1996-01-05 | \n",
" 1.344705 | \n",
" -1.817261 | \n",
" 0.460991 | \n",
" -0.839633 | \n",
" 0.265814 | \n",
" 0.477659 | \n",
" 0.636383 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 9127 | \n",
" 2020-12-27 | \n",
" -0.881800 | \n",
" -0.074270 | \n",
" -0.351769 | \n",
" 1.381641 | \n",
" -0.049548 | \n",
" 1.664180 | \n",
" -1.032204 | \n",
"
\n",
" \n",
" | 9128 | \n",
" 2020-12-28 | \n",
" -0.143386 | \n",
" 0.198217 | \n",
" -1.243861 | \n",
" 1.196576 | \n",
" 1.338166 | \n",
" -0.212333 | \n",
" -0.023131 | \n",
"
\n",
" \n",
" | 9129 | \n",
" 2020-12-29 | \n",
" 0.398787 | \n",
" -0.848786 | \n",
" 1.791707 | \n",
" -1.167592 | \n",
" -0.033881 | \n",
" -0.285559 | \n",
" -0.323477 | \n",
"
\n",
" \n",
" | 9130 | \n",
" 2020-12-30 | \n",
" 0.587846 | \n",
" 0.411580 | \n",
" 1.150380 | \n",
" 0.444638 | \n",
" -1.093577 | \n",
" 0.605456 | \n",
" 1.463345 | \n",
"
\n",
" \n",
" | 9131 | \n",
" 2020-12-31 | \n",
" 0.736350 | \n",
" 0.436292 | \n",
" -0.260171 | \n",
" -0.066066 | \n",
" -0.328324 | \n",
" -0.586792 | \n",
" -1.204582 | \n",
"
\n",
" \n",
"
\n",
"
9132 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Date Mon. Tues. Wed. Thurs. Fri. Sat. \\\n",
"0 1996-01-01 0.129453 -0.023836 1.121460 1.698286 -0.598506 1.042221 \n",
"1 1996-01-02 -0.094021 -0.727942 0.698641 -1.198040 1.927505 1.147445 \n",
"2 1996-01-03 -0.560857 0.145222 -0.990202 1.200214 0.717339 1.117095 \n",
"3 1996-01-04 -0.169755 -0.677391 -1.533519 -0.343477 -0.109705 1.038236 \n",
"4 1996-01-05 1.344705 -1.817261 0.460991 -0.839633 0.265814 0.477659 \n",
"... ... ... ... ... ... ... ... \n",
"9127 2020-12-27 -0.881800 -0.074270 -0.351769 1.381641 -0.049548 1.664180 \n",
"9128 2020-12-28 -0.143386 0.198217 -1.243861 1.196576 1.338166 -0.212333 \n",
"9129 2020-12-29 0.398787 -0.848786 1.791707 -1.167592 -0.033881 -0.285559 \n",
"9130 2020-12-30 0.587846 0.411580 1.150380 0.444638 -1.093577 0.605456 \n",
"9131 2020-12-31 0.736350 0.436292 -0.260171 -0.066066 -0.328324 -0.586792 \n",
"\n",
" Sun. \n",
"0 -0.726412 \n",
"1 -1.134103 \n",
"2 -1.793565 \n",
"3 -0.799088 \n",
"4 0.636383 \n",
"... ... \n",
"9127 -1.032204 \n",
"9128 -0.023131 \n",
"9129 -0.323477 \n",
"9130 1.463345 \n",
"9131 -1.204582 \n",
"\n",
"[9132 rows x 8 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"example.csv\")"
]
},
{
"cell_type": "markdown",
"id": "64e596d1",
"metadata": {},
"source": [
"If you only want to read a small number of lines (without reading the whole file), you can specify this with `nrows`:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "01856e17",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Mon. | \n",
" Tues. | \n",
" Wed. | \n",
" Thurs. | \n",
" Fri. | \n",
" Sat. | \n",
" Sun. | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1996-01-01 | \n",
" 0.129453 | \n",
" -0.023836 | \n",
" 1.121460 | \n",
" 1.698286 | \n",
" -0.598506 | \n",
" 1.042221 | \n",
" -0.726412 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1996-01-02 | \n",
" -0.094021 | \n",
" -0.727942 | \n",
" 0.698641 | \n",
" -1.198040 | \n",
" 1.927505 | \n",
" 1.147445 | \n",
" -1.134103 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1996-01-03 | \n",
" -0.560857 | \n",
" 0.145222 | \n",
" -0.990202 | \n",
" 1.200214 | \n",
" 0.717339 | \n",
" 1.117095 | \n",
" -1.793565 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1996-01-04 | \n",
" -0.169755 | \n",
" -0.677391 | \n",
" -1.533519 | \n",
" -0.343477 | \n",
" -0.109705 | \n",
" 1.038236 | \n",
" -0.799088 | \n",
"
\n",
" \n",
" | 4 | \n",
" 1996-01-05 | \n",
" 1.344705 | \n",
" -1.817261 | \n",
" 0.460991 | \n",
" -0.839633 | \n",
" 0.265814 | \n",
" 0.477659 | \n",
" 0.636383 | \n",
"
\n",
" \n",
" | 5 | \n",
" 1996-01-06 | \n",
" -0.354445 | \n",
" -0.065182 | \n",
" -1.244963 | \n",
" -0.559732 | \n",
" 0.042362 | \n",
" -0.303712 | \n",
" 0.067632 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1996-01-07 | \n",
" 1.460922 | \n",
" 0.164412 | \n",
" 0.883960 | \n",
" -0.833642 | \n",
" 0.001582 | \n",
" 1.138469 | \n",
" 0.561618 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Mon. Tues. Wed. Thurs. Fri. Sat. \\\n",
"0 1996-01-01 0.129453 -0.023836 1.121460 1.698286 -0.598506 1.042221 \n",
"1 1996-01-02 -0.094021 -0.727942 0.698641 -1.198040 1.927505 1.147445 \n",
"2 1996-01-03 -0.560857 0.145222 -0.990202 1.200214 0.717339 1.117095 \n",
"3 1996-01-04 -0.169755 -0.677391 -1.533519 -0.343477 -0.109705 1.038236 \n",
"4 1996-01-05 1.344705 -1.817261 0.460991 -0.839633 0.265814 0.477659 \n",
"5 1996-01-06 -0.354445 -0.065182 -1.244963 -0.559732 0.042362 -0.303712 \n",
"6 1996-01-07 1.460922 0.164412 0.883960 -0.833642 0.001582 1.138469 \n",
"\n",
" Sun. \n",
"0 -0.726412 \n",
"1 -1.134103 \n",
"2 -1.793565 \n",
"3 -0.799088 \n",
"4 0.636383 \n",
"5 0.067632 \n",
"6 0.561618 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"example.csv\", nrows=7)"
]
},
{
"cell_type": "markdown",
"id": "124f2a6a",
"metadata": {},
"source": [
"To read a file piece by piece, you can specify the number of lines with `chunksize`:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "ce309f8c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"example.csv\", chunksize=1000)"
]
},
{
"cell_type": "markdown",
"id": "2c682a02",
"metadata": {},
"source": [
"The `TextFileReader` object returned by `read_csv` allows iteration over parts of the file according to the `chunksize`. For example, we can iterate over the `example.csv` file and aggregate the number of values in the `Date` column as follows:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "c11aa475",
"metadata": {},
"outputs": [],
"source": [
"chunks = pd.read_csv(\"example.csv\", chunksize=1000)\n",
"\n",
"serie = pd.Series([], dtype=\"float64\")\n",
"for chunk in chunks:\n",
" values = serie.add(chunk[\"Date\"].value_counts(), fill_value=0)\n",
"\n",
"sorted_values = values.sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "59657986",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date\n",
"2020-08-22 1.0\n",
"2020-11-13 1.0\n",
"2020-11-27 1.0\n",
"2020-11-26 1.0\n",
"2020-11-25 1.0\n",
"2020-11-24 1.0\n",
"2020-11-23 1.0\n",
"2020-11-22 1.0\n",
"2020-11-21 1.0\n",
"2020-11-20 1.0\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted_values[:10]"
]
},
{
"cell_type": "markdown",
"id": "915c3c62",
"metadata": {},
"source": [
"`TextFileReader` also has a `get_chunk` method that allows you to read pieces of any size."
]
},
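{
"cell_type": "markdown",
"id": "4e8a1d57",
"metadata": {},
"source": [
"A minimal sketch of `get_chunk`, assuming made-up in-memory data in place of a large file; the column name `n` and the variables `first` and `second` are hypothetical:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: ten numbered rows standing in for a large file\n",
"data = io.StringIO(\"n\\n\" + \"\\n\".join(str(i) for i in range(10)))\n",
"\n",
"# iterator=True returns a TextFileReader instead of a DataFrame\n",
"with pd.read_csv(data, iterator=True) as reader:\n",
"    first = reader.get_chunk(3)  # the first three rows\n",
"    second = reader.get_chunk(5)  # the next five rows\n",
"\n",
"print(len(first), len(second))\n",
"```\n",
"\n",
"Each call continues where the previous one stopped, so the pieces can have different sizes."
]
},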
{
"cell_type": "markdown",
"id": "d48072ce",
"metadata": {},
"source": [
"## Write DataFrame and Series as a CSV file\n",
"\n",
"Data can also be exported in a comma-separated format. With the method `pandas.DataFrame.to_csv` we can write the data into a comma-separated file:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "d88a8271",
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"out.csv\")"
]
},
{
"cell_type": "markdown",
"id": "b41a72b7",
"metadata": {},
"source": [
"Of course, other delimiters can also be used, for example to write to `sys.stdout`, so that the text result is output on the console and not in a file:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ec24d29c",
"metadata": {},
"outputs": [],
"source": [
"import sys"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "f1271ae2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|Title|Language|Authors|License|Publication date|doi\n",
"0|Python basics|en|Veit Schiele|BSD-3-Clause|2021-10-28|\n",
"1|Jupyter Tutorial|en|Veit Schiele|BSD-3-Clause|2019-06-27|\n",
"2|Jupyter Tutorial|de|Veit Schiele|BSD-3-Clause|2020-10-26|\n",
"3|PyViz Tutorial|en|Veit Schiele|BSD-3-Clause|2020-04-13|\n"
]
}
],
"source": [
"df.to_csv(sys.stdout, sep=\"|\")"
]
},
{
"cell_type": "markdown",
"id": "2edd8f82",
"metadata": {},
"source": [
"Missing values appear in the output as empty strings. You may want to mark them with a different placeholder:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "455145fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
",Title,Language,Authors,License,Publication date,doi\n",
"0,Python basics,en,Veit Schiele,BSD-3-Clause,2021-10-28,NaN\n",
"1,Jupyter Tutorial,en,Veit Schiele,BSD-3-Clause,2019-06-27,NaN\n",
"2,Jupyter Tutorial,de,Veit Schiele,BSD-3-Clause,2020-10-26,NaN\n",
"3,PyViz Tutorial,en,Veit Schiele,BSD-3-Clause,2020-04-13,NaN\n"
]
}
],
"source": [
"df.to_csv(sys.stdout, na_rep=\"NaN\")"
]
},
{
"cell_type": "markdown",
"id": "57531679",
"metadata": {},
"source": [
"If no other options are given, both the row and column labels are written. Both can be deactivated:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "b599ee64",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python basics,en,Veit Schiele,BSD-3-Clause,2021-10-28,\n",
"Jupyter Tutorial,en,Veit Schiele,BSD-3-Clause,2019-06-27,\n",
"Jupyter Tutorial,de,Veit Schiele,BSD-3-Clause,2020-10-26,\n",
"PyViz Tutorial,en,Veit Schiele,BSD-3-Clause,2020-04-13,\n"
]
}
],
"source": [
"df.to_csv(sys.stdout, index=False, header=False)"
]
},
{
"cell_type": "markdown",
"id": "436ca565",
"metadata": {},
"source": [
"You can also write only a subset of the columns, in an order of your choosing:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "3625a0fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title,Language,Authors,Publication date\n",
"Python basics,en,Veit Schiele,2021-10-28\n",
"Jupyter Tutorial,en,Veit Schiele,2019-06-27\n",
"Jupyter Tutorial,de,Veit Schiele,2020-10-26\n",
"PyViz Tutorial,en,Veit Schiele,2020-04-13\n"
]
}
],
"source": [
"df.to_csv(\n",
" sys.stdout,\n",
" index=False,\n",
" columns=[\"Title\", \"Language\", \"Authors\", \"Publication date\"],\n",
")"
]
},
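{
"cell_type": "markdown",
"id": "c5f07b3e",
"metadata": {},
"source": [
"To check what `to_csv` has written, the text can be read back with `read_csv`. The following is a minimal sketch, assuming a small made-up DataFrame (`frame`) and an in-memory buffer instead of a file:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical DataFrame with one missing value\n",
"frame = pd.DataFrame(\n",
"    {\"Title\": [\"Python basics\", \"PyViz Tutorial\"], \"Language\": [\"en\", None]}\n",
")\n",
"\n",
"# Write to an in-memory buffer instead of a file\n",
"buffer = io.StringIO()\n",
"frame.to_csv(buffer, na_rep=\"NULL\")\n",
"text = buffer.getvalue()\n",
"\n",
"# Read the text back; the first column holds the row labels\n",
"restored = pd.read_csv(io.StringIO(text), index_col=0)\n",
"\n",
"print(text)\n",
"print(restored)\n",
"```\n",
"\n",
"`index_col=0` restores the row labels that `to_csv` wrote as the first, unnamed column, and `NULL` is one of the placeholders that `read_csv` treats as missing by default."
]
},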
{
"cell_type": "markdown",
"id": "3791334b",
"metadata": {},
"source": [
"## Working with the csv module of Python\n",
"\n",
"Most forms of table data can be loaded using functions such as `pandas.read_csv`. However, in some cases manual processing may be required. It is not uncommon to receive a file with one or more incorrect rows that cause `read_csv` to fail. For any file with a single-digit delimiter, you can use Python's built-in [csv](https://docs.python.org/3/library/csv.html) module. To use it, pass an open file or file-like object to `csv.reader`:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "bd1ba571",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['', 'Title', 'Language', 'Authors', 'License', 'Publication date', 'doi']\n",
"['0', 'Python basics', 'en', 'Veit Schiele', 'BSD-3-Clause', '2021-10-28', '']\n",
"['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2019-06-27', '']\n",
"['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', 'BSD-3-Clause', '2020-10-26', '']\n",
"['3', 'PyViz Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2020-04-13', '']\n"
]
}
],
"source": [
"import csv\n",
"\n",
"\n",
"# Open the file in a with block so that it is closed again after reading.\n",
"with open(\"out.csv\", newline=\"\") as f:\n",
"    reader = csv.reader(f)\n",
"    for line in reader:\n",
"        print(line)"
]
},
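{
"cell_type": "markdown",
"id": "7d2f4c1a",
"metadata": {},
"source": [
"As a minimal sketch of such manual processing, the following cell separates well-formed rows from rows with an unexpected number of fields. The sample data in `raw` and the names `good_rows` and `bad_rows` are made up for illustration and are not part of the original example; `reader.line_num` is the number of lines read so far from the source:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b3e6a58",
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import io\n",
"\n",
"\n",
"# Hypothetical sample data: the third line has one field too many.\n",
"raw = (\n",
"    \"Title,Language\\n\"\n",
"    \"Python basics,en\\n\"\n",
"    \"Jupyter Tutorial,en,extra\\n\"\n",
"    \"PyViz Tutorial,en\\n\"\n",
")\n",
"\n",
"reader = csv.reader(io.StringIO(raw))\n",
"header = next(reader)\n",
"\n",
"good_rows, bad_rows = [], []\n",
"for row in reader:\n",
"    # Keep only rows whose field count matches the header.\n",
"    if len(row) == len(header):\n",
"        good_rows.append(row)\n",
"    else:\n",
"        # Remember the line number so the row can be inspected later.\n",
"        bad_rows.append((reader.line_num, row))\n",
"\n",
"print(good_rows)\n",
"print(bad_rows)"
]
},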
{
"cell_type": "markdown",
"id": "3bc7ee20",
"metadata": {},
"source": [
"### Dialects\n",
"\n",
"CSV files come in many different variants. Python's csv module already ships with three dialects:\n",
"\n",
"Parameter | excel | excel-tab | unix\n",
":--- | :--- | :--- | :---\n",
"`delimiter` | `','` | `'\\t'` | `','`\n",
"`quotechar` | `'\"'` | `'\"'` | `'\"'`\n",
"`doublequote` | `True` | `True` | `True`\n",
"`skipinitialspace` | `False` | `False` | `False`\n",
"`lineterminator` | `'\\r\\n'` | `'\\r\\n'` | `'\\n'`\n",
"`quoting` | `csv.QUOTE_MINIMAL` | `csv.QUOTE_MINIMAL` | `csv.QUOTE_ALL`\n",
"`escapechar` | `None` | `None` | `None`\n",
"\n",
"You can also define your own format with a different delimiter, a different quoting convention or a different line terminator. Registering your own dialect with `csv.register_dialect` is recommended for this. It accepts the following format options:\n",
"\n",
"Argument | Description\n",
":------- | :----------\n",
"`delimiter` | One-character string to separate fields; default is `,`.\n",
"`lineterminator` | Line terminator for writing; default is `\\r\\n`. The reader ignores this setting and recognises cross-platform line endings.\n",
"`quotechar` | Quote character for fields containing special characters (such as the delimiter); default is `\"`.\n",
"`quoting` | Quoting convention. Options include `csv.QUOTE_ALL` (quote all fields), `csv.QUOTE_MINIMAL` (quote only fields containing special characters such as the delimiter), `csv.QUOTE_NONNUMERIC` (quote all non-numeric fields) and `csv.QUOTE_NONE` (no quoting). The default is `csv.QUOTE_MINIMAL`.\n",
"`skipinitialspace` | Ignore spaces after each delimiter; default is `False`.\n",
"`doublequote` | How to handle the quote character inside a field: if `True`, it is doubled; default is `True`.\n",
"`escapechar` | One-character string used to escape the delimiter when `quoting` is set to `csv.QUOTE_NONE`; default is `None` (disabled)."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "c6d73a1e",
"metadata": {},
"outputs": [],
"source": [
"csv.register_dialect(\n",
" \"my_csv_dialect\",\n",
" lineterminator=\"\\n\",\n",
" delimiter=\",\",\n",
" quotechar=\"'\",\n",
" quoting=csv.QUOTE_MINIMAL,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2633ad3b",
"metadata": {},
"source": [
"Now the CSV file can be read with the registered dialect:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "85ac6d66",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['', 'Title', 'Language', 'Authors', 'License', 'Publication date', 'doi']\n",
"['0', 'Python basics', 'en', 'Veit Schiele', 'BSD-3-Clause', '2021-10-28', '']\n",
"['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2019-06-27', '']\n",
"['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', 'BSD-3-Clause', '2020-10-26', '']\n",
"['3', 'PyViz Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2020-04-13', '']\n"
]
}
],
"source": [
"with open(\"out.csv\") as f:\n",
"    reader = csv.reader(f, \"my_csv_dialect\")\n",
"    for line in reader:\n",
"        print(line)"
]
},
{
"cell_type": "markdown",
"id": "610e6ddf",
"metadata": {},
"source": [
"Then we can create a dict of data columns with a [dict comprehension](https://peps.python.org/pep-0274/), transposing the rows in `values` into columns with [zip](https://docs.python.org/3/library/functions.html#zip). Note that this requires a lot of memory for large files, as all rows are first read into a list and then converted into columns:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "341af079",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'': ('0', '1', '2', '3'),\n",
" 'Title': ('Python basics',\n",
" 'Jupyter Tutorial',\n",
" 'Jupyter Tutorial',\n",
" 'PyViz Tutorial'),\n",
" 'Language': ('en', 'en', 'de', 'en'),\n",
" 'Authors': ('Veit Schiele', 'Veit Schiele', 'Veit Schiele', 'Veit Schiele'),\n",
" 'License': ('BSD-3-Clause', 'BSD-3-Clause', 'BSD-3-Clause', 'BSD-3-Clause'),\n",
" 'Publication date': ('2021-10-28', '2019-06-27', '2020-10-26', '2020-04-13'),\n",
" 'doi': ('', '', '', '')}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open(\"out.csv\") as f:\n",
"    reader = csv.reader(f, \"my_csv_dialect\")\n",
"    lines = list(reader)\n",
"    header, values = lines[0], lines[1:]\n",
"    data_dict = {h: v for h, v in zip(header, zip(*values))}\n",
"\n",
"data_dict"
]
},
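{
"cell_type": "markdown",
"id": "c4a8e2d6",
"metadata": {},
"source": [
"If a file is too large to hold all columns in memory at once, one alternative is to process it row by row. The following is a minimal sketch using `csv.DictReader` from the standard library, which takes the field names from the first row and yields one dict per row. The in-memory sample data in `raw` and the name `languages` are assumptions for illustration only:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5f17b39",
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import io\n",
"\n",
"\n",
"# Hypothetical sample data standing in for a large file.\n",
"raw = \"Title,Language\\nPython basics,en\\nJupyter Tutorial,de\\n\"\n",
"\n",
"languages = set()\n",
"# Only one row is held in memory at a time.\n",
"for row in csv.DictReader(io.StringIO(raw)):\n",
"    languages.add(row[\"Language\"])\n",
"\n",
"print(sorted(languages))"
]
},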
{
"cell_type": "markdown",
"id": "52db491d",
"metadata": {},
"source": [
"To write delimited files manually, you can use `csv.writer`. It accepts an open, writable file object and the same dialect and format options as `csv.reader`:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "69f3c21a",
"metadata": {},
"outputs": [],
"source": [
"with open(\"new.csv\", \"w\", newline=\"\") as f:\n",
"    writer = csv.writer(f, \"my_csv_dialect\")\n",
"    writer.writerow((\"\", \"Title\", \"Language\", \"Authors\"))\n",
"    writer.writerow((\"1\", \"Python basics\", \"en\", \"Veit Schiele\"))\n",
"    writer.writerow((\"2\", \"Jupyter Tutorial\", \"en\", \"Veit Schiele\"))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "ff5b4f67",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[',Title,Language,Authors\\n',\n",
" '1,Python basics,en,Veit Schiele\\n',\n",
" '2,Jupyter Tutorial,en,Veit Schiele\\n']"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open(\"new.csv\") as f:\n",
"    lines = list(f)\n",
"\n",
"lines"
]
}
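,
{
"cell_type": "markdown",
"id": "f0b2d7c4",
"metadata": {},
"source": [
"To see the effect of the `quoting` option on the written output, the following sketch writes the same row with two quoting conventions into an in-memory buffer. The helper `write_row` and the sample row are hypothetical and only serve as illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6c9e031",
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import io\n",
"\n",
"\n",
"def write_row(row, **fmtparams):\n",
"    \"\"\"Write a single row with csv.writer and return the resulting text.\"\"\"\n",
"    buffer = io.StringIO()\n",
"    csv.writer(buffer, lineterminator=\"\\n\", **fmtparams).writerow(row)\n",
"    return buffer.getvalue()\n",
"\n",
"\n",
"# The second field contains the delimiter, the third one is numeric.\n",
"row = [\"Jupyter Tutorial\", \"en, de\", 2019]\n",
"minimal = write_row(row, quoting=csv.QUOTE_MINIMAL)\n",
"quote_all = write_row(row, quoting=csv.QUOTE_ALL)\n",
"\n",
"print(minimal, quote_all, sep=\"\")"
]
}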
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.13 Kernel",
"language": "python",
"name": "python313"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}