Arithmetic¶

A key feature of pandas is how arithmetic behaves for objects with different indices. When you add two objects whose indexes do not contain the same labels, the index of the result is the union of both indexes. For users with database experience, this is comparable to an automatic outer join on the index labels. Let’s look at an example:

[1]:

import numpy as np
import pandas as pd


rng = np.random.default_rng()
s1 = pd.Series(rng.normal(size=5))
s2 = pd.Series(rng.normal(size=7))

If you add these two Series, you get:

[2]:

s1 + s2

[2]:

0    1.664505
1   -0.366291
2    0.020457
3    1.151139
4   -0.674874
5         NaN
6         NaN
dtype: float64

The internal data alignment introduces missing values at the labels that do not overlap. Missing values then propagate through subsequent arithmetic operations.
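The outer-join behaviour is easier to see with string labels. The following is a minimal sketch with made-up data; the names a, b and total are illustrative and not part of the example above:

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "w"])

# The result index is the union of both indexes.
total = a + b

# Only "y" exists in both operands; every other label becomes NaN.
print(total)

# NaN propagates through further arithmetic.
doubled = total * 2
print(doubled)
```

Only the label y yields a number (12.0); the three labels that occur in just one of the operands are NaN, and they stay NaN after the multiplication.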

For DataFrames, alignment is performed for both rows and columns:

[3]:

df1 = pd.DataFrame(rng.normal(size=(5, 3)))
df2 = pd.DataFrame(rng.normal(size=(7, 2)))

When the two DataFrames are added together, the result is a DataFrame whose index and columns are the unions of those in each of the DataFrames above:

[4]:

df1 + df2

[4]:

	0	1	2
0	-0.451974	-0.041735	NaN
1	-0.670561	-3.538462	NaN
2	-0.731903	-0.933451	NaN
3	0.136283	-0.973772	NaN
4	-0.911367	-0.702978	NaN
5	NaN	NaN	NaN
6	NaN	NaN	NaN

Since column 2 does not appear in both DataFrame objects, its values appear as missing in the result. The same applies to the rows whose labels do not appear in both objects.
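The alignment on both axes can be reproduced with a small labelled example. This is a sketch with invented data; left, right and result are hypothetical names:

```python
import pandas as pd

# Illustrative frames that share only row "r2" and column "b".
left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["r1", "r2"])
right = pd.DataFrame({"b": [10.0, 20.0], "c": [30.0, 40.0]}, index=["r2", "r3"])

# Rows and columns are both aligned; the result covers the union of each.
result = left + right
print(result)
```

The result has three rows and three columns, but only the cell at row r2 and column b holds a number (4.0 + 10.0), because it is the only position present in both frames.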

Arithmetic methods with fill values¶

In arithmetic operations between differently indexed objects, it can be useful to substitute a special value (e.g. 0) when an axis label is found in one object but not in the other. The add method accepts a fill_value argument for this purpose:

[5]:

df12 = df1.add(df2, fill_value=0)

df12

[5]:

	0	1	2
0	-0.451974	-0.041735	0.503665
1	-0.670561	-3.538462	-1.881109
2	-0.731903	-0.933451	0.539550
3	0.136283	-0.973772	-1.207889
4	-0.911367	-0.702978	-0.552577
5	-1.019179	0.678391	NaN
6	-2.275196	2.685174	NaN

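The two values in column 2 of rows 5 and 6 remain NaN because these positions are missing from both DataFrames, and fill_value only substitutes a value that is missing from one of the two operands. In the following example, we set these two remaining NaN values to 0: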

[6]:

df12.iloc[[5, 6], [2]] = 0

[7]:

df12

[7]:

	0	1	2
0	-0.451974	-0.041735	0.503665
1	-0.670561	-3.538462	-1.881109
2	-0.731903	-0.933451	0.539550
3	0.136283	-0.973772	-1.207889
4	-0.911367	-0.702978	-0.552577
5	-1.019179	0.678391	0.000000
6	-2.275196	2.685174	0.000000
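The rule that fill_value only replaces a value missing from one operand can be checked on a small example. The sketch below uses invented data, and fillna is shown as one possible alternative to assigning by position; it is not the approach used above:

```python
import numpy as np
import pandas as pd

# Illustrative frames: position (r2, "a") is NaN in both,
# and column "b" exists only in `left`.
left = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]}, index=["r1", "r2"])
right = pd.DataFrame({"a": [10.0, np.nan]}, index=["r1", "r2"])

filled = left.add(right, fill_value=0)
print(filled)

# Replace whatever NaN values are left in one step.
cleaned = filled.fillna(0)
print(cleaned)
```

Column b is taken from left unchanged (the missing side counts as 0), while the cell at r2 in column a stays NaN until fillna replaces it.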

Arithmetic methods¶

Method	Description
`add`, `radd`	methods for addition (`+`)
`sub`, `rsub`	methods for subtraction (`-`)
`div`, `rdiv`	methods for division (`/`)
`floordiv`, `rfloordiv`	methods for floor division (`//`)
`mul`, `rmul`	methods for multiplication (`*`)
`pow`, `rpow`	methods for exponentiation (`**`)

The methods prefixed with r (for reverse) swap the operands: for example, df.rdiv(1) computes 1 / df, whereas df.div(1) computes df / 1.
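A short sketch of the reversed variants, using an invented frame named df:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 8.0]})

# rdiv puts the argument on the left: 1 / df
reciprocal = df.rdiv(1)

# rsub does the same for subtraction: 10 - df
from_ten = df.rsub(10)

print(reciprocal)
print(from_ten)
```

The reversed methods are mainly useful when you need the method form, for instance to pass fill_value or axis, but want the DataFrame on the right-hand side of the operator.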

Operations between DataFrame and Series¶

As with NumPy arrays of different dimensions, the arithmetic between DataFrame and Series is also defined.

[8]:

s1 + df12

[8]:

	0	1	2	3	4
0	0.544941	-0.633291	1.554287	NaN	NaN
1	0.326354	-4.130018	-0.830487	NaN	NaN
2	0.265012	-1.525008	1.590171	NaN	NaN
3	1.133198	-1.565328	-0.157267	NaN	NaN
4	0.085547	-1.294534	0.498044	NaN	NaN
5	-0.022264	0.086835	1.050622	NaN	NaN
6	-1.278281	2.093618	1.050622	NaN	NaN

When we add s1 to df12, the addition is performed once for each row. This is called broadcasting. By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows.

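If an index value is not found in both the columns of the DataFrame and the index of the Series, the objects are reindexed to form the union. In the result above, the labels 3 and 4 exist only in the index of s1, so the corresponding columns contain only NaN.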
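The reindexing to the union can be illustrated with labelled columns. This is a minimal sketch with invented data; frame, row and result are hypothetical names:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# The Series shares label "b" with the columns; "c" exists only in the Series.
row = pd.Series({"b": 10.0, "c": 20.0})

result = frame + row
print(result)
```

Column b contains 13.0 and 14.0, while columns a and c are entirely NaN because each of them is missing from one of the two operands.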

If you instead want to broadcast over the columns and match on the rows, you must use one of the arithmetic methods and specify the axis, for example:

[9]:

df12.add(s2, axis="index")

[9]:

	0	1	2
0	0.215616	0.625855	1.171255
1	-0.445295	-3.313196	-1.655843
2	-1.762067	-1.963616	-0.490615
3	0.526549	-0.583506	-0.817623
4	-0.497226	-0.288836	-0.138436
5	0.283680	1.981250	1.302859
6	-2.832790	2.127580	-0.557594

The axis you pass is the axis to match on. In this case, the Series is matched against the row index of the DataFrame (axis='index' or axis=0) and broadcast across the columns.
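A typical use of the two matching directions is centring data. The sketch below uses invented data and hypothetical names; it subtracts the column means with the default matching and the row means with axis="index":

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Default: the Series index ("a", "b") is matched against the column labels.
col_centered = frame - frame.mean()

# axis="index": the Series index (0, 1, 2) is matched against the row labels.
row_means = frame.mean(axis="columns")
row_centered = frame.sub(row_means, axis="index")

print(col_centered)
print(row_centered)
```

After centring, each column of col_centered and each row of row_centered sums to zero. Writing frame - row_means without the axis argument would instead try to match the row labels 0, 1, 2 against the column labels a and b.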

Function application and mapping¶

NumPy ufuncs (element-wise array functions) also work with pandas objects:

[10]:

np.abs(df12)

[10]:

	0	1	2
0	0.451974	0.041735	0.503665
1	0.670561	3.538462	1.881109
2	0.731903	0.933451	0.539550
3	0.136283	0.973772	1.207889
4	0.911367	0.702978	0.552577
5	1.019179	0.678391	0.000000
6	2.275196	2.685174	0.000000

Another common operation is to apply a function that works on one-dimensional arrays to each column or row. The pandas.DataFrame.apply method does just that:

[11]:

df12

[11]:

	0	1	2
0	-0.451974	-0.041735	0.503665
1	-0.670561	-3.538462	-1.881109
2	-0.731903	-0.933451	0.539550
3	0.136283	-0.973772	-1.207889
4	-0.911367	-0.702978	-0.552577
5	-1.019179	0.678391	0.000000
6	-2.275196	2.685174	0.000000

[12]:

def minmaxrange(x):
    return x.max() - x.min()


df12.apply(minmaxrange)

[12]:

0    2.411479
1    6.223636
2    2.420658
dtype: float64

Here the function minmaxrange(), which calculates the difference between the maximum and minimum of a Series, is called once for each column of the frame. The result is a Series with the columns of the frame as its index.

If you pass axis="columns" to apply, the function will be called once per row instead:

[13]:

df12.apply(minmaxrange, axis="columns")

[13]:

0    0.955639
1    2.867901
2    1.473001
3    1.344172
4    0.358790
5    1.697570
6    4.960370
dtype: float64

Many of the most common array statistics (such as sum and mean) are DataFrame methods, so the use of apply is not necessary.
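As a sketch of this point, the range computed with apply can also be expressed with the built-in reductions. The data and names below are invented for illustration:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [-2.0, 0.0, 4.0]})

# Column-wise range via apply ...
via_apply = frame.apply(lambda col: col.max() - col.min())

# ... and the same result from the DataFrame reductions directly.
via_methods = frame.max() - frame.min()

# Reductions accept an axis argument as well, e.g. the mean of each row.
row_mean = frame.mean(axis="columns")

print(via_apply)
print(via_methods)
print(row_mean)
```

Both approaches give 4.0 for column a and 6.0 for column b. The built-in reductions avoid calling a Python function once per column.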

The function passed to apply does not have to return a single value; it can also return a Series with multiple values:

[14]:

def minmax(x):
    return pd.Series([x.min(), x.max()], index=["min", "max"])


df12.apply(minmax)

[14]:

	0	1	2
min	-2.275196	-3.538462	-1.881109
max	0.136283	2.685174	0.539550

You can also use element-wise Python functions. Suppose you want to round each floating point value in df12 to two decimal places. You can do this with pandas.DataFrame.map (called applymap before pandas 2.1):

[15]:

def round_two(x):
    return round(x, 2)


df12.map(round_two)

[15]:

	0	1	2
0	-0.45	-0.04	0.50
1	-0.67	-3.54	-1.88
2	-0.73	-0.93	0.54
3	0.14	-0.97	-1.21
4	-0.91	-0.70	-0.55
5	-1.02	0.68	0.00
6	-2.28	2.69	0.00

The map method can also be applied to Series:

[16]:

df12[2].map(round_two)

[16]:

0    0.50
1   -1.88
2    0.54
3   -1.21
4   -0.55
5    0.00
6    0.00
Name: 2, dtype: float64
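For the specific case of rounding, pandas also offers a vectorised round method on Series and DataFrame, so an element-wise Python function is not strictly needed. The following sketch uses invented data and compares both routes for these particular values only; the two rounding implementations are not guaranteed to agree for every floating point input:

```python
import numpy as np
import pandas as pd


def round_two(x):
    return round(x, 2)


frame = pd.DataFrame({"a": [1.23456, -0.98765, 2.5], "b": [0.11111, 9.87654, -3.0]})

# Element-wise Python function on a single column.
mapped = frame["a"].map(round_two)

# Vectorised rounding of the whole frame.
rounded = frame.round(2)

print(mapped)
print(rounded)
```

For this data, mapped and rounded["a"] both contain 1.23, -0.99 and 2.5.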