Adding, changing and deleting data

With many data sets, you may want to perform a transformation based on the values in an array, a Series or a column of a DataFrame. To illustrate this, we use the first few Unicode characters as an example:

[2]:
import numpy as np
import pandas as pd
[3]:
df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
        "Decimal": [0, 1, 2, 3, 4, 5],
        "Octal": ["001", "002", "003", "004", "004", "005"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
    },
)

df
[3]:
Code Decimal Octal Key
0 U+0000 0 001 NUL
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

Adding data

Suppose you want to add a column that assigns each character to the C0 or C1 set of control codes:

[4]:
control_code = {
    "u+0000": "C0",
    "u+0001": "C0",
    "u+0002": "C0",
    "u+0003": "C0",
    "u+0004": "C0",
    "u+0005": "C0",
}

The map method of a Series accepts a function or a dict-like object containing a mapping. Here we have a small problem: the keys in control_code are lowercase, whereas the codes in our DataFrame are uppercase. Therefore, we first convert each value to lowercase using the str.lower method:

[5]:
lowercased = df["Code"].str.lower()

lowercased
[5]:
0    u+0000
1    u+0001
2    u+0002
3    u+0003
4    u+0004
5    u+0005
Name: Code, dtype: object
[6]:
df["Control code"] = lowercased.map(control_code)

df
[6]:
Code Decimal Octal Key Control code
0 U+0000 0 001 NUL C0
1 U+0001 1 002 Ctrl-A C0
2 U+0002 2 003 Ctrl-B C0
3 U+0003 3 004 Ctrl-C C0
4 U+0004 4 004 Ctrl-D C0
5 U+0005 5 005 Ctrl-E C0

We could also have passed a function that does all the work:

[7]:
df["Code"].map(lambda x: control_code[x.lower()])
[7]:
0    C0
1    C0
2    C0
3    C0
4    C0
5    C0
Name: Code, dtype: object

Using map is a convenient way to perform element-by-element transformations and other data cleansing operations.
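
One difference between the two forms is worth knowing: with a dict, values that have no key in the mapping become NaN, whereas the lambda shown above would raise a KeyError. The following is a minimal sketch of this behaviour; the code U+0080 and the deliberately incomplete mapping are made-up example data, not part of the table above:

```python
import pandas as pd

codes = pd.Series(["U+0000", "U+0001", "U+0080"])

# Deliberately incomplete mapping (hypothetical example data):
# "u+0080" has no entry.
partial_mapping = {"u+0000": "C0", "u+0001": "C0"}

# Passing the dict: values without a matching key become NaN
mapped = codes.str.lower().map(partial_mapping)

# Passing a function: dict.get supplies an explicit fallback instead
mapped_with_default = codes.map(
    lambda x: partial_mapping.get(x.lower(), "unknown")
)
```

Which variant is preferable depends on whether a missing value should be visible as NaN or replaced by a placeholder.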

Modifying data

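The replace method can be used to replace certain values with others. By default, only whole values are matched, so "Man" does not match "Manpower" in the first example below; with regex=True, the pattern is also replaced inside strings.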

[8]:
s = pd.Series(["Manpower", "man-made", np.nan])
[9]:
s.replace("Man", "Personal")
[9]:
0    Manpower
1    man-made
2         NaN
dtype: object
[10]:
s.replace("[Mm]an", "Personal", regex=True)
[10]:
0    Personalpower
1    Personal-made
2              NaN
dtype: object
[11]:
s.replace(["[Mm]an", np.nan], ["Personal", 0], regex=True)
[11]:
0    Personalpower
1    Personal-made
2                0
dtype: object
[12]:
s.replace(["[Mm]an", np.nan], ["Personal", len(s)], regex=True)
[12]:
0    Personalpower
1    Personal-made
2                3
dtype: object
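
Instead of two parallel lists, replace can also be given a dict that maps old values to new ones, which may be easier to read. A minimal sketch, assuming the same Series as above; the replacement value "Workforce" is chosen for illustration only:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Manpower", "man-made", np.nan])

# With regex=True, the dict keys are treated as patterns
replaced = s.replace({"[Mm]an": "Personal"}, regex=True)

# Without regex=True, the dict keys must match whole values
whole_value = s.replace({"Manpower": "Workforce"})
```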

Deleting data

Deleting one or more entries from an axis is easy if you already have an index array or list without those entries.

Since building such an index may require a little set theory, the drop method does the work for you and returns a new object without the deleted value(s):

[15]:
rng = np.random.default_rng()

s = pd.Series(rng.random(7))
s
[15]:
0    0.957904
1    0.239710
2    0.666261
3    0.303127
4    0.000573
5    0.508528
6    0.329187
dtype: float64
[21]:
s.drop(2)
[21]:
0    0.957904
1    0.239710
3    0.303127
4    0.000573
5    0.508528
6    0.329187
dtype: float64
[22]:
s.drop([2, 3])
[22]:
0    0.957904
1    0.239710
4    0.000573
5    0.508528
6    0.329187
dtype: float64
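
The set theory mentioned above refers to working out the labels to keep by hand. The following sketch compares that manual route with drop; it uses fixed example values instead of random ones so that the result is reproducible:

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5])

# Manual approach: compute the labels to keep, then select them
keep = s.index.difference([2, 3])
manual = s.loc[keep]

# drop gives the same result in one step
dropped = s.drop([2, 3])

# A label that is not in the index raises a KeyError
# unless errors="ignore" is passed
unchanged = s.drop(99, errors="ignore")
```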

With DataFrames, index values can be deleted on both axes. To illustrate this, we will first create a sample DataFrame:

[15]:
data = {
    "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
    "Decimal": [0, 1, 2, 3, 4, 5],
    "Octal": ["001", "002", "003", "004", "004", "005"],
    "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
}

df = pd.DataFrame(data)

df
[15]:
Code Decimal Octal Key
0 U+0000 0 001 NUL
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E
[16]:
df.drop([0, 1])
[16]:
Code Decimal Octal Key
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

You can also remove values from the columns by passing axis=1 or axis="columns":

[17]:
df.drop("Decimal", axis=1)
[17]:
Code Octal Key
0 U+0000 001 NUL
1 U+0001 002 Ctrl-A
2 U+0002 003 Ctrl-B
3 U+0003 004 Ctrl-C
4 U+0004 004 Ctrl-D
5 U+0005 005 Ctrl-E
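
As an alternative to axis, drop also accepts the index and columns keywords, which name the axis directly. A minimal sketch with a shortened version of the table above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002"],
        "Decimal": [0, 1, 2],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B"],
    }
)

# Equivalent to df.drop("Decimal", axis=1)
without_decimal = df.drop(columns="Decimal")

# Rows and columns can be dropped in a single call
subset = df.drop(index=[0], columns=["Decimal", "Key"])
```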

Many methods that change the size or shape of a Series or DataFrame, such as drop, can modify the object in place instead of returning a new object:

[18]:
df.drop(0, inplace=True)

df
[18]:
Code Decimal Octal Key
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

Warning:

Be careful with the inplace argument: the dropped data is removed from the original object and cannot be recovered from it.
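
If the original data may still be needed, one option is to assign the result of drop to a new name, or to work on an explicit copy. A minimal sketch; the variable names are chosen for illustration only:

```python
import pandas as pd

df = pd.DataFrame(
    {"Code": ["U+0000", "U+0001", "U+0002"], "Decimal": [0, 1, 2]}
)

# drop returns a new object; df itself is left untouched
trimmed = df.drop(0)

# inplace=True modifies the object it is called on,
# so use an explicit copy if the original must be kept
working_copy = df.copy()
working_copy.drop(0, inplace=True)
```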

See also: