{ "cells": [ { "cell_type": "markdown", "id": "765a0354", "metadata": {}, "source": [ "# Hinzufügen, Ändern und Löschen von Daten\n", "\n", "Bei vielen Datensätzen möchtet ihr vielleicht eine Transformation basierend auf den Werten in einem Array, einer Serie oder einer Spalte in einem DataFrame durchführen. Hierfür betrachten wir die ersten Unicode-Zeichen:" ] }, { "cell_type": "code", "execution_count": 1, "id": "75059c0e", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "0aa26fb0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKey
0U+00000001NUL
1U+00011002Ctrl-A
2U+00022003Ctrl-B
3U+00033004Ctrl-C
4U+00044004Ctrl-D
5U+00055005Ctrl-E
\n", "
" ], "text/plain": [ " Code Decimal Octal Key\n", "0 U+0000 0 001 NUL\n", "1 U+0001 1 002 Ctrl-A\n", "2 U+0002 2 003 Ctrl-B\n", "3 U+0003 3 004 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " {\n", " \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n", " \"Decimal\": [0, 1, 2, 3, 4, 5],\n", " \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n", " \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n", " }\n", ")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "15da5652", "metadata": {}, "source": [ "## Daten hinzufügen\n", "\n", "Angenommen, ihr möchtet eine Spalte hinzufügen, in der die Zeichen dem `C0`- oder `C1`-Steuercode zugewiesen werden:" ] }, { "cell_type": "code", "execution_count": 3, "id": "b96f46e7", "metadata": {}, "outputs": [], "source": [ "control_code = {\n", " \"u+0000\": \"C0\",\n", " \"u+0001\": \"C0\",\n", " \"u+0002\": \"C0\",\n", " \"u+0003\": \"C0\",\n", " \"u+0004\": \"C0\",\n", " \"u+0005\": \"C0\",\n", "}" ] }, { "cell_type": "markdown", "id": "14593018", "metadata": {}, "source": [ "Die `map`-Methode für eine Serie akzeptiert eine Funktion oder ein diktatähnliches Objekt, das eine Zuordnung enthält, aber hier haben wir ein kleines Problem, da einige die Codes in `control_code` kleingeschrieben sind, nicht jedoch in unserem DataFrame. Daher müssen wir jeden Wert mit der Methode `str.lower` in Kleinbuchstaben umwandeln:" ] }, { "cell_type": "code", "execution_count": 4, "id": "036a2cb7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 u+0000\n", "1 u+0001\n", "2 u+0002\n", "3 u+0003\n", "4 u+0004\n", "5 u+0005\n", "Name: Code, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lowercased = df[\"Code\"].str.lower()\n", "\n", "lowercased" ] }, { "cell_type": "code", "execution_count": 5, "id": "a21615a2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKeyControl code
0U+00000001NULC0
1U+00011002Ctrl-AC0
2U+00022003Ctrl-BC0
3U+00033004Ctrl-CC0
4U+00044004Ctrl-DC0
5U+00055005Ctrl-EC0
\n", "
" ], "text/plain": [ " Code Decimal Octal Key Control code\n", "0 U+0000 0 001 NUL C0\n", "1 U+0001 1 002 Ctrl-A C0\n", "2 U+0002 2 003 Ctrl-B C0\n", "3 U+0003 3 004 Ctrl-C C0\n", "4 U+0004 4 004 Ctrl-D C0\n", "5 U+0005 5 005 Ctrl-E C0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"Control code\"] = lowercased.map(control_code)\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "7a1a01ca", "metadata": {}, "source": [ "Wir hätten auch eine Funktion übergeben können, die die ganze Arbeit erledigt:" ] }, { "cell_type": "code", "execution_count": 6, "id": "6cedb9de", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 C0\n", "1 C0\n", "2 C0\n", "3 C0\n", "4 C0\n", "5 C0\n", "Name: Code, dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"Code\"].map(lambda x: control_code[x.lower()])" ] }, { "cell_type": "markdown", "id": "e7f5d504", "metadata": {}, "source": [ "Die Verwendung von `map` ist ein bequemer Weg, um elementweise Transformationen und andere Datenbereinigungsoperationen durchzuführen." ] }, { "cell_type": "markdown", "id": "eb9bfab9", "metadata": {}, "source": [ "## Daten ändern" ] }, { "cell_type": "markdown", "id": "bb60fda3", "metadata": {}, "source": [ "
\n", "\n", "**Hinweis:**\n", "\n", "Das Ersetzen fehlender Werte wird in [Verwalten fehlender Daten mit pandas](../../clean-prep/nulls.ipynb) beschrieben.\n", "
" ] }, { "cell_type": "code", "execution_count": 7, "id": "f0b36c7d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Personalpower\n", "1 man-made\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series([\"Manpower\", \"man-made\"]).str.replace(\"Man\", \"Personal\", regex=False)" ] }, { "cell_type": "code", "execution_count": 8, "id": "017677e6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Personal-Power\n", "1 Personal-made\n", "dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series([\"Man-Power\", \"man-made\"]).str.replace(\n", " \"[Mm]an\", \"Personal\", regex=True\n", ")" ] }, { "cell_type": "markdown", "id": "d001c237", "metadata": {}, "source": [ "
\n", "\n", "**Hinweis:**\n", "\n", "Die Methode [replace](https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html) unterscheidet sich von [str.replace](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html), dadurch, dass diese elementweise Zeichenketten ersetzt.\n", "
" ] }, { "cell_type": "markdown", "id": "0f2b0a9f", "metadata": {}, "source": [ "## Daten löschen" ] }, { "cell_type": "markdown", "id": "5297dada", "metadata": {}, "source": [ "Einen oder mehrere Einträge aus einer Achse zu löschen ist einfach, wenn ihr bereits ein Index-Array oder eine Liste ohne diese Einträge habt.\n", "\n", "Zum Löschen von Duplikaten siehe [Daten deduplizieren](../../clean-prep/deduplicate.ipynb)." ] }, { "cell_type": "markdown", "id": "16cea6c7", "metadata": {}, "source": [ "Da dies ein wenig Mengenlehre erfordern kann, geben wir die Drop-Methode als neues Objekt ohne den oder die gelöschten Werten zurück:" ] }, { "cell_type": "code", "execution_count": 9, "id": "07f0e4e0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.975612\n", "1 0.375596\n", "2 0.582852\n", "3 -1.908045\n", "4 -0.254497\n", "5 0.499824\n", "6 2.969355\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(np.random.randn(7))\n", "\n", "s" ] }, { "cell_type": "code", "execution_count": 10, "id": "c33dd6b2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.975612\n", "1 0.375596\n", "3 -1.908045\n", "4 -0.254497\n", "5 0.499824\n", "6 2.969355\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new = s.drop(2)\n", "\n", "new" ] }, { "cell_type": "code", "execution_count": 11, "id": "9a41f1fa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.975612\n", "1 0.375596\n", "4 -0.254497\n", "5 0.499824\n", "6 2.969355\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new = s.drop([2, 3])\n", "\n", "new" ] }, { "cell_type": "markdown", "id": "362891d3", "metadata": {}, "source": [ "Bei DataFrames können Indexwerte auf beiden Achsen gelöscht werden. Um dies zu veranschaulichen, erstellen wir zunächst einen Beispiel-DataFrame:" ] }, { "cell_type": "code", "execution_count": 12, "id": "88180f1d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKey
0U+00000001NUL
1U+00011002Ctrl-A
2U+00022003Ctrl-B
3U+00033004Ctrl-C
4U+00044004Ctrl-D
5U+00055005Ctrl-E
\n", "
" ], "text/plain": [ " Code Decimal Octal Key\n", "0 U+0000 0 001 NUL\n", "1 U+0001 1 002 Ctrl-A\n", "2 U+0002 2 003 Ctrl-B\n", "3 U+0003 3 004 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n", " \"Decimal\": [0, 1, 2, 3, 4, 5],\n", " \"Octal\": [\"001\", \"002\", \"003\", \"004\", \"004\", \"005\"],\n", " \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n", "}\n", "\n", "df = pd.DataFrame(data)\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 13, "id": "49667d00", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKey
2U+00022003Ctrl-B
3U+00033004Ctrl-C
4U+00044004Ctrl-D
5U+00055005Ctrl-E
\n", "
" ], "text/plain": [ " Code Decimal Octal Key\n", "2 U+0002 2 003 Ctrl-B\n", "3 U+0003 3 004 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop([0, 1])" ] }, { "cell_type": "markdown", "id": "1f5543d0", "metadata": {}, "source": [ "Ihr könnt auch Werte aus den Spalten entfernen, indem ihr `axis=1` oder `axis='columns'` übergebt:" ] }, { "cell_type": "code", "execution_count": 14, "id": "09a66ce8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeOctalKey
0U+0000001NUL
1U+0001002Ctrl-A
2U+0002003Ctrl-B
3U+0003004Ctrl-C
4U+0004004Ctrl-D
5U+0005005Ctrl-E
\n", "
" ], "text/plain": [ " Code Octal Key\n", "0 U+0000 001 NUL\n", "1 U+0001 002 Ctrl-A\n", "2 U+0002 003 Ctrl-B\n", "3 U+0003 004 Ctrl-C\n", "4 U+0004 004 Ctrl-D\n", "5 U+0005 005 Ctrl-E" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop(\"Decimal\", axis=1)" ] }, { "cell_type": "markdown", "id": "54dbbf32", "metadata": {}, "source": [ "Viele Funktionen wie `drop`, die die Größe oder Form einer Reihe oder eines DataFrame ändern, können ein Objekt an Ort und Stelle manipulieren, ohne ein neues Objekt zurückzugeben:" ] }, { "cell_type": "code", "execution_count": 15, "id": "24503f8c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CodeDecimalOctalKey
1U+00011002Ctrl-A
2U+00022003Ctrl-B
3U+00033004Ctrl-C
4U+00044004Ctrl-D
5U+00055005Ctrl-E
\n", "
" ], "text/plain": [ " Code Decimal Octal Key\n", "1 U+0001 1 002 Ctrl-A\n", "2 U+0002 2 003 Ctrl-B\n", "3 U+0003 3 004 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop(0, inplace=True)\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "69ede8c6", "metadata": {}, "source": [ "
\n", "\n", "**Warnung:**\n", "\n", "Seid vorsichtig mit der `inplace`-Funktion, da die Daten unwiderbringlich gelöscht werden." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }