Arithmetic

An important feature of pandas is how arithmetic behaves for objects with different indices. When you add two objects whose indices differ, the index of the result is the union of the two indices. For users with database experience, this is comparable to an automatic outer join on the index labels. Let’s look at an example:

[1]:
import numpy as np
import pandas as pd


rng = np.random.default_rng()
s1 = pd.Series(rng.normal(size=5))
s2 = pd.Series(rng.normal(size=7))

If you add these two Series, you get:

[2]:
s1 + s2
[2]:
0    2.596929
1   -2.795545
2   -0.119064
3    0.849508
4   -0.061194
5         NaN
6         NaN
dtype: float64

The internal data alignment introduces missing values at the labels that do not overlap. These missing values then propagate through further arithmetic calculations.
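The alignment is easier to follow with a few hand-picked labels than with random numbers. The following minimal sketch uses two made-up Series, a and b (names and values are illustrative and not part of the example above), to show the union of the labels and how NaN carries over into a second calculation:

```python
import pandas as pd

# Two illustrative Series that share only the label "y"
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "w"])

# The result is indexed by the union of both label sets;
# only "y" exists on both sides and gets a numeric value
total = a + b
print(total)

# Missing values are passed on in further arithmetic
doubled = total * 2
print(doubled)
```

Only the label "y" receives a number here; the three labels that occur on one side only end up as NaN.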

For DataFrames, alignment is performed for both rows and columns:

[3]:
df1 = pd.DataFrame(rng.normal(size=(5,3)))
df2 = pd.DataFrame(rng.normal(size=(7,2)))

When the two DataFrames are added together, the result is a DataFrame whose index and columns are the unions of those in each of the DataFrames above:

[4]:
df1 + df2
[4]:
          0         1   2
0 -0.078026  0.643059 NaN
1 -0.383531  2.018909 NaN
2 -2.770130 -0.751184 NaN
3 -0.679346  0.926763 NaN
4 -1.093289  1.424987 NaN
5       NaN       NaN NaN
6       NaN       NaN NaN

Since column 2 does not appear in both DataFrame objects, its values appear as missing in the result. The same applies to the rows whose labels do not appear in both objects.
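To make the two-axis alignment visible, the following sketch uses two small hand-made frames, left and right (hypothetical names and values), and calls DataFrame.align to show the reindexed operands that the addition works on. It is meant as an illustration of the behaviour, not as a description of the pandas internals:

```python
import pandas as pd

# Illustrative frames that overlap in row 1 and column "b" only
left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[0, 1])
right = pd.DataFrame({"b": [10.0, 20.0], "c": [30.0, 40.0]}, index=[1, 2])

# align() reindexes both frames to the union of their rows and columns
left_aligned, right_aligned = left.align(right, join="outer")
print(left_aligned)
print(right_aligned)

# The sum has a value only where both frames have one: row 1, column "b"
result = left + right
print(result)
```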

Arithmetic methods with fill values

In arithmetic operations between differently indexed objects, it can be useful to substitute a special value (e.g. 0) when an axis label is found in one object but not in the other. The add method accepts a fill_value argument for this purpose:

[5]:
df12 = df1.add(df2, fill_value=0)

df12
[5]:
          0         1         2
0 -0.078026  0.643059  0.136076
1 -0.383531  2.018909 -0.660599
2 -2.770130 -0.751184 -1.709924
3 -0.679346  0.926763 -1.403627
4 -1.093289  1.424987 -0.283248
5  0.030022 -1.465972       NaN
6 -0.508131  0.527970       NaN

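fill_value only replaces a value that is missing in one of the two objects. Rows 5 and 6 of column 2 exist in neither df1 nor df2, so they remain NaN. In the following example, we set these two remaining NaN values to 0: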

[6]:
df12.iloc[[5, 6], [2]] = 0
[7]:
df12
[7]:
          0         1         2
0 -0.078026  0.643059  0.136076
1 -0.383531  2.018909 -0.660599
2 -2.770130 -0.751184 -1.709924
3 -0.679346  0.926763 -1.403627
4 -1.093289  1.424987 -0.283248
5  0.030022 -1.465972  0.000000
6 -0.508131  0.527970  0.000000
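As a possible alternative to assigning by position, the remaining gaps could also be filled with fillna. The sketch below uses two small illustrative frames, left and right (hypothetical names and values), to show which positions fill_value covers and which ones it leaves as NaN:

```python
import numpy as np
import pandas as pd

# Illustrative frames: column "b" and row 1 exist only in left,
# row 2 exists only in right
left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[0, 1])
right = pd.DataFrame({"a": [10.0, 20.0]}, index=[0, 2])

# fill_value stands in for a value that is missing on one side only
summed = left.add(right, fill_value=0)
print(summed)

# Row 2 / column "b" is missing on both sides and therefore stays NaN;
# fillna replaces every remaining NaN without naming positions
filled = summed.fillna(0)
print(filled)
```

Whether fillna or an explicit iloc assignment is preferable depends on whether all remaining NaN values should really be treated as 0.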

Arithmetic methods

Method               Description
add, radd            methods for addition (+)
sub, rsub            methods for subtraction (-)
div, rdiv            methods for division (/)
floordiv, rfloordiv  methods for floor division (//)
mul, rmul            methods for multiplication (*)
pow, rpow            methods for exponentiation (**)

The r prefix (for reverse) swaps the operands: for example, df1.rsub(1) computes 1 - df1, whereas df1.sub(1) computes df1 - 1.
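The difference between a method and its reversed counterpart shows up most clearly with a non-commutative operation such as division. A minimal sketch with an illustrative frame df (name and values are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 4.0]})

# div keeps the frame on the left-hand side: df / 1
forward = df.div(1)

# rdiv puts the frame on the right-hand side: 1 / df
reverse = df.rdiv(1)

print(forward)
print(reverse)
```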

Operations between DataFrame and Series

As with NumPy arrays of different dimensions, arithmetic between a DataFrame and a Series is also defined.

[8]:
s1 + df12
[8]:
          0         1         2   3   4
0  0.583883 -1.140178  0.991236 NaN NaN
1  0.278378  0.235672  0.194562 NaN NaN
2 -2.108221 -2.534422 -0.854764 NaN NaN
3 -0.017437 -0.856475 -0.548466 NaN NaN
4 -0.431380 -0.358250  0.571912 NaN NaN
5  0.691931 -3.249210  0.855161 NaN NaN
6  0.153778 -1.255268  0.855161 NaN NaN

If we add s1 to df12, the addition is done once for each row. This is called broadcasting. By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows.

If an index value is not found in both the columns of the DataFrame and the index of the Series, the objects are reindexed to form the union. This is why columns 3 and 4 appear in the result above and contain only NaN.

If you instead want to broadcast across the columns and match on the rows, you must use one of the arithmetic methods, for example:

[9]:
df12.add(s2, axis="index")
[9]:
          0         1         2
0  1.856994  2.578079  2.071096
1 -1.395838  1.006602 -1.672906
2 -3.744354 -1.725408 -2.684148
3 -0.239294  1.366814 -0.963576
4 -1.067525  1.450751 -0.257484
5  0.005172 -1.490822 -0.024850
6 -0.612072  0.424029 -0.103941

The axis you pass is the axis to match on. In this case, the index of the Series is matched against the row index of the DataFrame (axis='index' or axis=0), and the values are broadcast across the columns.
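The two matching directions can be compared side by side with string labels, which rule out any confusion between row and column labels. The names df, by_column and by_row in this sketch are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]], index=["r0", "r1"], columns=["c0", "c1"]
)
by_column = pd.Series([10.0, 20.0], index=["c0", "c1"])
by_row = pd.Series([100.0, 200.0], index=["r0", "r1"])

# Default: the Series index is matched against the columns,
# and the same values are added to every row
col_result = df + by_column
print(col_result)

# axis="index": the Series index is matched against the row labels,
# and the same value is added to every column of a row
row_result = df.add(by_row, axis="index")
print(row_result)
```

Writing df + by_row without the axis argument would match the labels r0 and r1 against the columns c0 and c1, find no overlap, and return a frame that contains only NaN.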

Function application and mapping

NumPy ufuncs (element-wise array functions) also work with pandas objects:

[10]:
np.abs(df12)
[10]:
          0         1         2
0  0.078026  0.643059  0.136076
1  0.383531  2.018909  0.660599
2  2.770130  0.751184  1.709924
3  0.679346  0.926763  1.403627
4  1.093289  1.424987  0.283248
5  0.030022  1.465972  0.000000
6  0.508131  0.527970  0.000000
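A ufunc applied to a pandas object returns a pandas object again and keeps the labels. The following sketch checks this on a small illustrative frame df (name and values are assumptions for this example) and compares the result with the equivalent DataFrame.abs method:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [-1.5, 2.0], "b": [0.5, -4.0]}, index=["r0", "r1"])

# The ufunc works element by element and returns a DataFrame
absolute = np.abs(df)
print(absolute)

# Row and column labels are unchanged
print(absolute.index.tolist(), absolute.columns.tolist())

# For the absolute value, pandas also offers a method of its own
same = absolute.equals(df.abs())
print(same)
```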

Another common operation is to apply a function that works on one-dimensional arrays to each column or row. The pandas.DataFrame.apply method does just that:

[11]:
df12
[11]:
          0         1         2
0 -0.078026  0.643059  0.136076
1 -0.383531  2.018909 -0.660599
2 -2.770130 -0.751184 -1.709924
3 -0.679346  0.926763 -1.403627
4 -1.093289  1.424987 -0.283248
5  0.030022 -1.465972  0.000000
6 -0.508131  0.527970  0.000000
[12]:
f = lambda x: x.max() - x.min()

df12.apply(f)
[12]:
0    2.800152
1    3.484882
2    1.846000
dtype: float64

Here the function f, which calculates the difference between the maximum and minimum of a Series, is called once for each column of the frame. The result is a Series with the columns of the frame as its index.

If you pass axis='columns' to apply, the function will be called once per row instead:

[13]:
df12.apply(f, axis="columns")
[13]:
0    0.721086
1    2.679508
2    2.018946
3    2.330389
4    2.518277
5    1.495994
6    1.036101
dtype: float64

Many of the most common array statistics (such as sum and mean) are DataFrame methods, so the use of apply is not necessary.
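For the range calculation used above, this could look as follows. The sketch uses an illustrative frame df (name and values are made up) and compares apply with the built-in max and min methods:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 2.5, 9.0]})

# Range per column, computed with apply
via_apply = df.apply(lambda col: col.max() - col.min())

# The same result with DataFrame methods only
via_methods = df.max() - df.min()
print(via_methods)

# Range per row: reduce along the columns instead
row_ranges = df.max(axis="columns") - df.min(axis="columns")
print(row_ranges)
```

The method-based variant avoids calling a Python function once per column or row, which tends to matter for larger frames.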

The function passed to apply does not have to return a single value; it can also return a Series with multiple values:

[14]:
def f(x):
    return pd.Series([x.min(), x.max()], index=["min", "max"])

df12.apply(f)
[14]:
            0         1         2
min -2.770130 -1.465972 -1.709924
max  0.030022  2.018909  0.136076
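When the returned values are standard reductions such as min and max, DataFrame.agg with a list of function names is one possible shortcut that yields a frame of the same shape. A small sketch with an illustrative frame df and a helper min_max (both names are assumptions for this example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 2.5, 9.0]})


def min_max(col):
    # Return two values per column, labelled "min" and "max"
    return pd.Series([col.min(), col.max()], index=["min", "max"])


via_apply = df.apply(min_max)
print(via_apply)

# agg accepts a list of reduction names and labels the rows accordingly
via_agg = df.agg(["min", "max"])
print(via_agg)
```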

You can also use element-wise Python functions. Suppose you want to round each floating point value in df12 to two decimal places. You can do this with pandas.DataFrame.applymap (since pandas 2.1, this method is deprecated in favour of pandas.DataFrame.map):

[15]:
f = lambda x: round(x, 2)

df12.applymap(f)
[15]:
      0     1     2
0 -0.08  0.64  0.14
1 -0.38  2.02 -0.66
2 -2.77 -0.75 -1.71
3 -0.68  0.93 -1.40
4 -1.09  1.42 -0.28
5  0.03 -1.47  0.00
6 -0.51  0.53  0.00

The reason for the name applymap is that Series has a map method for applying an element-wise function:

[16]:
df12[2].map(f)
[16]:
0    0.14
1   -0.66
2   -1.71
3   -1.40
4   -0.28
5    0.00
6    0.00
Name: 2, dtype: float64
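Since pandas 2.1, DataFrame.applymap is deprecated and DataFrame.map is the element-wise method for frames. The following sketch, based on an illustrative frame df and a helper round2 (both hypothetical), picks whichever method the installed version offers and, for the specific case of rounding, compares the result with the vectorised DataFrame.round:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.12345, -1.98765], "b": [2.5, 3.14159]})


def round2(x):
    # Element-wise helper: round a single value to two decimal places
    return round(x, 2)


# DataFrame.map is available from pandas 2.1 on;
# older versions only provide applymap
if hasattr(pd.DataFrame, "map"):
    elementwise = df.map(round2)
else:
    elementwise = df.applymap(round2)
print(elementwise)

# For rounding, the vectorised method avoids one Python call per value
vectorised = df.round(2)
print(vectorised)
```

The hasattr check is only one way to stay compatible with several pandas versions; pinning a minimum version and calling map directly is another.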