Adding, changing and deleting data

With many data sets, you may want to perform a transformation based on the values in an array, a Series, or a DataFrame column. To illustrate this, we use the first few Unicode characters:

[2]:

import numpy as np
import pandas as pd

[3]:

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
        "Decimal": [0, 1, 2, 3, 4, 5],
        "Octal": ["001", "002", "003", "004", "004", "005"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
    },
)

df

[3]:

	Code	Decimal	Octal	Key
0	U+0000	0	001	NUL
1	U+0001	1	002	Ctrl-A
2	U+0002	2	003	Ctrl-B
3	U+0003	3	004	Ctrl-C
4	U+0004	4	004	Ctrl-D
5	U+0005	5	005	Ctrl-E

Adding data

Suppose you want to add a column that assigns each character to the C0 or C1 control code set:

[4]:

control_code = {
    "u+0000": "C0",
    "u+0001": "C0",
    "u+0002": "C0",
    "u+0003": "C0",
    "u+0004": "C0",
    "u+0005": "C0",
}

The map method of a Series accepts a function or a dict-like object containing a mapping. Here, however, there is a small problem: the keys in control_code are lowercase, whereas the codes in our DataFrame are uppercase. We therefore need to convert each value to lowercase with the str.lower method:

[5]:

lowercased = df["Code"].str.lower()

lowercased

[5]:

0    u+0000
1    u+0001
2    u+0002
3    u+0003
4    u+0004
5    u+0005
Name: Code, dtype: object

[6]:

df["Control code"] = lowercased.map(control_code)

df

[6]:

	Code	Decimal	Octal	Key	Control code
0	U+0000	0	001	NUL	C0
1	U+0001	1	002	Ctrl-A	C0
2	U+0002	2	003	Ctrl-B	C0
3	U+0003	3	004	Ctrl-C	C0
4	U+0004	4	004	Ctrl-D	C0
5	U+0005	5	005	Ctrl-E	C0

We could also have passed a function that does all the work:

[7]:

df["Code"].map(lambda x: control_code[x.lower()])

[7]:

0    C0
1    C0
2    C0
3    C0
4    C0
5    C0
Name: Code, dtype: object

Using map is a convenient way to perform element-by-element transformations and other data cleansing operations.
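To see what happens when a value has no entry in the mapping, here is a minimal sketch with a deliberately incomplete, hypothetical version of control_code and an extra code, U+0041 (the letter A), which is not a control character. With a dict, map returns NaN for the missing key. The lambda shown above would raise a KeyError instead, so this sketch uses dict.get with a made-up fallback label "none".

```python
import pandas as pd

codes = pd.Series(["U+0000", "U+0001", "U+0041"])

# Hypothetical mapping: U+0041 ("A") is not a control character, so it has no entry.
control_code = {"u+0000": "C0", "u+0001": "C0"}

# With a dict, values that have no matching key become NaN instead of raising.
by_dict = codes.str.lower().map(control_code)
print(by_dict)

# A plain control_code[...] lookup would raise KeyError for "u+0041";
# dict.get returns a fallback value instead (the label "none" is an arbitrary choice).
by_func = codes.map(lambda x: control_code.get(x.lower(), "none"))
print(by_func)
```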

Modifying data

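The replace method replaces specific values with others. By default, it only matches entire values, so replacing "Man" leaves "Manpower" unchanged. To replace parts of strings, pass regex=True: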

[8]:

s = pd.Series(["Manpower", "man-made", np.nan])

[9]:

s.replace("Man", "Personal")

[9]:

0    Manpower
1    man-made
2         NaN
dtype: object

[10]:

s.replace("[Mm]an", "Personal", regex=True)

[10]:

0    Personalpower
1    Personal-made
2              NaN
dtype: object
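For substring replacement on string data, the vectorized Series.str.replace method is an alternative to replace(..., regex=True). The following short sketch uses the same series. NaN entries pass through unchanged, and with regex=False the pattern is matched literally, so case matters.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Manpower", "man-made", np.nan])

# regex=True treats "[Mm]an" as a regular expression matching "Man" or "man".
result = s.str.replace("[Mm]an", "Personal", regex=True)
print(result)

# regex=False matches the pattern literally, so only "Man" (capital M) is replaced.
literal = s.str.replace("Man", "Personal", regex=False)
print(literal)
```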

[11]:

s.replace(["[Mm]an", np.nan], ["Personal", 0], regex=True)

[11]:

0    Personalpower
1    Personal-made
2                0
dtype: object

[12]:

s.replace(["[Mm]an", np.nan], ["Personal", len(s)], regex=True)

[12]:

0    Personalpower
1    Personal-made
2                3
dtype: object
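replace also accepts a dict that maps old values to new ones. This is often easier to read than two parallel lists. For a DataFrame, a nested dict of the form {column: {old: new}} limits the replacement to particular columns. The following is a minimal sketch. The replacement values "Workforce" and "unknown" and the small DataFrame are made up for illustration. "Ctrl-@" is the caret notation for NUL.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Manpower", "man-made", np.nan])

# Without regex=True, dict keys are matched against whole values only.
series_result = s.replace({"Manpower": "Workforce", np.nan: "unknown"})
print(series_result)

df = pd.DataFrame({"Key": ["NUL", "Ctrl-A"], "Decimal": [0, 1]})

# Nested dict: the replacement is applied only to the "Key" column.
frame_result = df.replace({"Key": {"NUL": "Ctrl-@"}})
print(frame_result)
```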

See also:

Managing missing data with pandas

Deleting data

Deleting one or more entries from an axis is easy if you already have an index array or list without those entries.

Building such an index may require a little set logic, so the drop method does this for you and returns a new object with the specified value(s) removed:

[15]:

rng = np.random.default_rng()

s = pd.Series(rng.random(7))
s

[15]:

0    0.957904
1    0.239710
2    0.666261
3    0.303127
4    0.000573
5    0.508528
6    0.329187
dtype: float64

[21]:

s.drop(2)

[21]:

0    0.957904
1    0.239710
3    0.303127
4    0.000573
5    0.508528
6    0.329187
dtype: float64

[22]:

s.drop([2, 3])

[22]:

0    0.957904
1    0.239710
4    0.000573
5    0.508528
6    0.329187
dtype: float64
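You can spell out the set logic that drop handles for you with Index.difference: build an index without the unwanted labels, then select it. The following minimal sketch uses deterministic values instead of random ones so that the result is reproducible.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(7) / 10)

# Manual route: compute the labels to keep, then select them.
keep = s.index.difference([2, 3])
via_index = s.loc[keep]

# drop does the same bookkeeping in one call and returns a new Series.
via_drop = s.drop([2, 3])

print(via_index.equals(via_drop))  # True
print(len(s))  # 7: the original Series is unchanged
```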

With DataFrames, labels can be deleted from either axis. To illustrate this, we first create a sample DataFrame:

[15]:

data = {
    "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
    "Decimal": [0, 1, 2, 3, 4, 5],
    "Octal": ["001", "002", "003", "004", "004", "005"],
    "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
}

df = pd.DataFrame(data)

df

[15]:

	Code	Decimal	Octal	Key
0	U+0000	0	001	NUL
1	U+0001	1	002	Ctrl-A
2	U+0002	2	003	Ctrl-B
3	U+0003	3	004	Ctrl-C
4	U+0004	4	004	Ctrl-D
5	U+0005	5	005	Ctrl-E

[16]:

df.drop([0, 1])

[16]:

	Code	Decimal	Octal	Key
2	U+0002	2	003	Ctrl-B
3	U+0003	3	004	Ctrl-C
4	U+0004	4	004	Ctrl-D
5	U+0005	5	005	Ctrl-E
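By default, drop raises a KeyError if any of the given labels does not exist. When some labels may be missing, for example because the same cleanup code runs on different inputs, errors="ignore" drops only the labels that are present. The following short sketch uses a reduced version of the example DataFrame. Label 99 is deliberately absent.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B"],
    }
)

# Label 99 does not exist, so a plain drop raises KeyError.
try:
    df.drop([0, 99])
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors="ignore" skips missing labels and drops the ones that exist.
trimmed = df.drop([0, 99], errors="ignore")
print(trimmed)
```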

To remove columns instead of rows, pass axis=1 or axis="columns":

[17]:

df.drop("Decimal", axis=1)

[17]:

	Code	Octal	Key
0	U+0000	001	NUL
1	U+0001	002	Ctrl-A
2	U+0002	003	Ctrl-B
3	U+0003	004	Ctrl-C
4	U+0004	004	Ctrl-D
5	U+0005	005	Ctrl-E
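As an alternative to the axis argument, drop also accepts the index= and columns= keywords. These name the target axis explicitly and can be combined in a single call. Here is a minimal sketch on a reduced version of the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002"],
        "Decimal": [0, 1, 2],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B"],
    }
)

# Equivalent to df.drop("Decimal", axis="columns").
no_decimal = df.drop(columns="Decimal")

# Rows and columns can be removed in one call.
smaller = df.drop(index=[0], columns=["Decimal"])
print(smaller)
```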

Many methods that change the size or shape of a Series or DataFrame, such as drop, can modify the object in place instead of returning a new one:

[18]:

df.drop(0, inplace=True)

df

[18]:

	Code	Decimal	Octal	Key
1	U+0001	1	002	Ctrl-A
2	U+0002	2	003	Ctrl-B
3	U+0003	3	004	Ctrl-C
4	U+0004	4	004	Ctrl-D
5	U+0005	5	005	Ctrl-E

Warning:

Be careful with inplace=True: the dropped data is removed from the original object itself, so it cannot be recovered unless you kept a copy.
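If you might still need the original rows, you can keep an explicit copy before modifying in place, or skip inplace=True and bind the returned object to a new name. Here is a minimal sketch, assuming a small example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Code": ["U+0000", "U+0001"], "Key": ["NUL", "Ctrl-A"]})

# Keep an independent copy before modifying df in place.
backup = df.copy()
df.drop(0, inplace=True)
print(df)      # row 0 is gone from df
print(backup)  # the copy still contains row 0

# Alternatively, leave the original untouched and keep the returned object.
trimmed = backup.drop(0)
```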

See also:

Deduplicate data