Add, change and delete data#

For many data sets, you may want to perform a transformation based on the values in an array, a Series or a column of a DataFrame. As an example, we look at the first six Unicode characters:

[1]:
import numpy as np
import pandas as pd
[2]:
df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
        "Decimal": [0, 1, 2, 3, 4, 5],
        "Octal": ["001", "002", "003", "004", "004", "005"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
    }
)

df
[2]:
Code Decimal Octal Key
0 U+0000 0 001 NUL
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

Add data#

Suppose you want to add a column that assigns each character to the C0 or C1 set of control codes:

[3]:
control_code = {
    "u+0000": "C0",
    "u+0001": "C0",
    "u+0002": "C0",
    "u+0003": "C0",
    "u+0004": "C0",
    "u+0005": "C0",
}

The map method of a Series accepts a function or a dict-like object that contains a mapping. Here we have a small problem: the keys in control_code are lower case, but the codes in our DataFrame are upper case. We therefore first convert each value to lower case with the str.lower method:

[4]:
lowercased = df["Code"].str.lower()

lowercased
[4]:
0    u+0000
1    u+0001
2    u+0002
3    u+0003
4    u+0004
5    u+0005
Name: Code, dtype: object
[5]:
df["Control code"] = lowercased.map(control_code)

df
[5]:
Code Decimal Octal Key Control code
0 U+0000 0 001 NUL C0
1 U+0001 1 002 Ctrl-A C0
2 U+0002 2 003 Ctrl-B C0
3 U+0003 3 004 Ctrl-C C0
4 U+0004 4 004 Ctrl-D C0
5 U+0005 5 005 Ctrl-E C0

We could also have passed a function that does all the work:

[6]:
df["Code"].map(lambda x: control_code[x.lower()])
[6]:
0    C0
1    C0
2    C0
3    C0
4    C0
5    C0
Name: Code, dtype: object

Using map is a convenient way to perform element-wise transformations and other data cleaning operations.
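
The dict and the function variants of map behave differently when a value has no entry in the mapping. The following is a minimal sketch of that difference; the extra code "U+0080" and the fallback label "unknown" are assumptions made up for illustration and are not part of the example data above:

```python
import pandas as pd

# Hypothetical data: "U+0080" has no entry in the mapping below.
codes = pd.Series(["U+0000", "U+0005", "U+0080"])
control_code = {"u+0000": "C0", "u+0005": "C0"}

# With a dict, values without a matching key become missing values.
mapped = codes.str.lower().map(control_code)

# With a function, dict.get can supply an explicit fallback instead;
# a plain control_code[...] lookup would raise a KeyError for "u+0080".
with_default = codes.map(lambda x: control_code.get(x.lower(), "unknown"))

print(mapped.isna().tolist())
print(with_default.tolist())
```

Which variant fits better depends on whether unmapped values should surface as missing data or receive a default.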

Change data#

Note:

Replacing missing values is described in Managing missing data with pandas.

Substrings within the string values of a Series can be replaced with str.replace, either literally (regex=False) or with a regular expression (regex=True):

[7]:
pd.Series(["Manpower", "man-made"]).str.replace("Man", "Personal", regex=False)
[7]:
0    Personalpower
1         man-made
dtype: object
[8]:
pd.Series(["Man-Power", "man-made"]).str.replace("[Mm]an", "Personal", regex=True)
[8]:
0    Personal-Power
1     Personal-made
dtype: object

Note:

The replace method differs from str.replace: replace substitutes whole values that match, whereas str.replace substitutes substrings within each string element.
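
To make the difference between the two methods concrete, here is a small sketch on a made-up Series; the values are chosen only for illustration:

```python
import pandas as pd

s = pd.Series(["Manpower", "Man", "man-made"])

# Series.replace only touches elements that are equal to "Man" as a whole.
whole_values = s.replace("Man", "Personal")

# Series.str.replace substitutes the substring "Man" inside every element.
substrings = s.str.replace("Man", "Personal", regex=False)

print(whole_values.tolist())
print(substrings.tolist())
```

In the first result only the second element changes; in the second result the first two elements change.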

Delete data#

Deleting one or more entries from an axis is easy if you already have an index array or a list without these entries.

To delete duplicates, see Deduplicating data.

Since building such an index or list by hand may require a bit of set logic, the drop method returns a new object without the specified values:

[9]:
rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s
[9]:
0   -0.800629
1   -1.018902
2   -0.183417
3   -0.789888
4   -1.898217
5   -0.774574
6   -0.370043
dtype: float64
[10]:
new = s.drop(2)

new
[10]:
0   -0.800629
1   -1.018902
3   -0.789888
4   -1.898217
5   -0.774574
6   -0.370043
dtype: float64
[11]:
new = s.drop([2, 3])

new
[11]:
0   -0.800629
1   -1.018902
4   -1.898217
5   -0.774574
6   -0.370043
dtype: float64
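
For comparison, the manual route that drop saves you from can be sketched as follows. This is only an illustration with fixed values instead of random numbers; it uses Index.difference to build the index without the unwanted labels:

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Manual approach: compute the labels to keep, then select them.
keep = s.index.difference([2, 3])
manual = s.loc[keep]

# drop does the same in one step and also returns a new object.
dropped = s.drop([2, 3])

print(manual.index.tolist(), dropped.index.tolist())
```

Both routes leave the original Series s unchanged.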

With DataFrames, index values can be deleted on both axes. To illustrate this, we first create an example DataFrame:

[12]:
data = {
    "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
    "Decimal": [0, 1, 2, 3, 4, 5],
    "Octal": ["001", "002", "003", "004", "004", "005"],
    "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
}

df = pd.DataFrame(data)

df
[12]:
Code Decimal Octal Key
0 U+0000 0 001 NUL
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E
[13]:
df.drop([0, 1])
[13]:
Code Decimal Octal Key
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

You can also remove values from the columns by passing axis=1 or axis='columns':

[14]:
df.drop("Decimal", axis=1)
[14]:
Code Octal Key
0 U+0000 001 NUL
1 U+0001 002 Ctrl-A
2 U+0002 003 Ctrl-B
3 U+0003 004 Ctrl-C
4 U+0004 004 Ctrl-D
5 U+0005 005 Ctrl-E
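
As an alternative to the axis argument, drop also accepts the keywords index and columns, which some readers may find more explicit. The sketch below uses a shortened, made-up version of the example DataFrame; the column name "Missing" is hypothetical and only serves to show errors="ignore":

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001"],
        "Decimal": [0, 1],
        "Key": ["NUL", "Ctrl-A"],
    }
)

# Equivalent to df.drop("Decimal", axis=1)
without_decimal = df.drop(columns=["Decimal"])

# Equivalent to df.drop([0])
without_first_row = df.drop(index=[0])

# By default a label that does not exist raises a KeyError;
# errors="ignore" skips such labels instead.
unchanged = df.drop(columns=["Missing"], errors="ignore")

print(without_decimal.columns.tolist())
```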

Many functions such as drop that change the size or shape of a Series or DataFrame can manipulate an object in place without returning a new object:

[15]:
df.drop(0, inplace=True)

df
[15]:
Code Decimal Octal Key
1 U+0001 1 002 Ctrl-A
2 U+0002 2 003 Ctrl-B
3 U+0003 3 004 Ctrl-C
4 U+0004 4 004 Ctrl-D
5 U+0005 5 005 Ctrl-E

Warning:

Be careful with the inplace parameter: the dropped data is removed from the object itself and cannot be recovered from it.
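
One way to guard against accidental data loss is to keep a copy before modifying an object in place, or to avoid inplace altogether and assign the result of drop to a new name. A minimal sketch, with a made-up DataFrame and the hypothetical names backup and trimmed:

```python
import pandas as pd

df = pd.DataFrame({"Code": ["U+0000", "U+0001", "U+0002"], "Decimal": [0, 1, 2]})

# Keep an independent copy before the in-place operation.
backup = df.copy()
df.drop(0, inplace=True)

# Alternative without inplace: the source object stays untouched.
trimmed = backup.drop(0)

print(len(df), len(backup), len(trimmed))
```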