Mathematical and statistical methods

Several mathematical functions that compute statistics over an entire array, or over the data along one axis, are available as methods of the array class. You can therefore compute aggregations such as the sum, mean and standard deviation either by calling the array instance method or by using the top-level NumPy function.

Below I generate some random data and calculate some aggregated statistics:

[1]:
import numpy as np


data = np.random.randn(7, 3)

data
[1]:
array([[ 0.52892401, -0.82705139, -0.13426779],
       [-0.43476595,  0.15431376, -0.15927356],
       [ 0.5437757 , -0.27273503, -0.74511308],
       [ 0.41921053,  0.78804831, -1.39898524],
       [-0.08745354,  0.24346498,  0.5995653 ],
       [ 2.18987033,  0.07709088,  0.81486999],
       [ 0.42570339,  1.23702332,  1.12807273]])
[2]:
data.mean()
[2]:
0.24239465071821545
[3]:
np.mean(data)
[3]:
0.24239465071821545
[4]:
data.sum()
[4]:
5.090287665082524

Functions like mean and sum accept an optional axis argument that computes the statistic over the specified axis, resulting in an array with one less dimension:

[5]:
data.mean(axis=0)
[5]:
array([0.51218064, 0.20002212, 0.01498119])
[6]:
data.sum(axis=0)
[6]:
array([3.58526448, 1.40015484, 0.10486835])

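data.mean(0) is the same as data.mean(axis=0): the mean is computed down the rows, giving one value per column. Likewise, data.sum(0) sums down the rows, giving one sum per column. This is why both results above have three elements, one for each column of the 7 × 3 array.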
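To make the effect of the axis argument easier to verify by hand, here is a minimal sketch with a small, hand-made array. The array `arr` and the names `col_means` and `row_means` are illustrative only and do not appear in the example above.

```python
import numpy as np

# Hypothetical 2 x 3 example array, chosen so the results are easy to check.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 collapses the rows: one mean per column.
col_means = arr.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = arr.mean(axis=1)

print(col_means)        # [2.5 3.5 4.5]
print(row_means)        # [2. 5.]
print(col_means.shape)  # (3,)
print(row_means.shape)  # (2,)
```

In both cases the axis that is passed is the one that disappears from the shape of the result.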

Other methods like cumsum and cumprod, however, do not aggregate but create a new array with the intermediate results.

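In multidimensional arrays, accumulation functions such as cumsum and cumprod return an array of the same shape when an axis is specified, with the partial aggregates computed along that axis. Called without an axis, as here, they operate on the flattened array and return a one-dimensional result: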

[7]:
data.cumsum()
[7]:
array([ 0.52892401, -0.29812737, -0.43239516, -0.86716111, -0.71284735,
       -0.87212091, -0.32834522, -0.60108025, -1.34619332, -0.92698279,
       -0.13893449, -1.53791972, -1.62537326, -1.38190829, -0.78234299,
        1.40752735,  1.48461823,  2.29948822,  2.72519162,  3.96221494,
        5.09028767])
[8]:
data.cumprod()
[8]:
array([ 5.28924012e-01, -4.37447338e-01,  5.87350864e-02, -2.55360156e-02,
       -3.94055863e-03,  6.27626816e-04,  3.41288209e-04, -9.30812494e-05,
        6.93560562e-05,  2.90747892e-05,  2.29123384e-05, -3.20540232e-05,
        2.80323775e-06,  6.82490215e-07,  4.09197451e-07,  8.96089358e-07,
        6.90803200e-08,  5.62914796e-08,  2.39634740e-08,  2.96433762e-08,
        3.34398842e-08])
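The two calls above omit axis, so the 7 × 3 array is flattened and the results contain 21 values. The following sketch contrasts this with accumulation along an axis, using a small integer array so that the partial results can be checked by hand. The array `arr` and the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 3 x 3 example array.
arr = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])

# No axis: the array is flattened first, so the result is one-dimensional.
flat = arr.cumsum()
print(flat)         # [ 0  1  3  6 10 15 21 28 36]

# axis=0: running sum down each column; the shape stays (3, 3).
down = arr.cumsum(axis=0)
print(down.tolist())    # [[0, 1, 2], [3, 5, 7], [9, 12, 15]]

# axis=1: running product along each row; the shape stays (3, 3).
across = arr.cumprod(axis=1)
print(across.tolist())  # [[0, 0, 0], [3, 12, 60], [6, 42, 336]]
```

Note that the last row of `arr.cumsum(axis=0)` equals `arr.sum(axis=0)`, since the final partial sum is the full sum.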

Basic statistical methods for arrays are:

| Method | Description |
| --- | --- |
| sum | Sum of all elements in the array or along an axis |
| mean | Arithmetic mean; for arrays with length zero, NaN is returned |
| std, var | Standard deviation and variance, respectively |
| min, max | Minimum and maximum |
| argmin, argmax | Indices of the minimum and maximum elements, respectively |
| cumsum | Cumulative sum of the elements, starting with 0 |
| cumprod | Cumulative product of the elements, starting with 1 |
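The methods in the table that were not demonstrated above can be sketched as follows. The array `arr` is a hypothetical example. The `ddof` argument and `np.unravel_index` are standard NumPy features that the text above does not cover, so treat their use here as an optional extension.

```python
import numpy as np

# Hypothetical 2 x 3 example array; its mean is exactly 5.0.
arr = np.array([[4.0, 9.0, 2.0],
                [3.0, 5.0, 7.0]])

print(arr.min(), arr.max())  # 2.0 9.0

# Without an axis, argmin/argmax return an index into the flattened array.
flat_argmin = arr.argmin()
flat_argmax = arr.argmax()
print(flat_argmin, flat_argmax)  # 2 1

# Convert the flat index back into a (row, column) position.
max_pos = np.unravel_index(flat_argmax, arr.shape)

# With an axis, one index is returned per row (axis=1) or per column (axis=0).
row_argmax = arr.argmax(axis=1)
print(row_argmax)  # [1 2]

# var and std divide by N by default (population statistics);
# ddof=1 divides by N - 1 instead (sample statistics).
pop_var = arr.var()
sample_var = arr.var(ddof=1)
pop_std = arr.std()
```

Here the squared deviations from the mean add up to 34, so `pop_var` is 34 / 6 and `sample_var` is 34 / 5, while `pop_std` is the square root of `pop_var`.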