{ "cells": [ { "cell_type": "markdown", "id": "42a41581", "metadata": {}, "source": [ "# Mathematical and statistical methods\n", "\n", "A number of mathematical functions that calculate statistics over an entire array or over the data along an axis are accessible as methods of the array class. So you can use aggregations such as sum, mean and standard deviation by either calling the array instance method or using the top-level NumPy function." ] }, { "cell_type": "markdown", "id": "24456b2b", "metadata": {}, "source": [ "Below I generate some random data and calculate some aggregated statistics:" ] }, { "cell_type": "code", "execution_count": 1, "id": "ac1f14e1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.16673777, 0.98120647, -0.27075733],\n", " [-1.59965964, -0.440924 , 0.63509529],\n", " [-0.52089534, -1.50217927, 0.66333717],\n", " [-0.57253028, 0.91016243, -0.62618228],\n", " [ 0.68603489, 0.30021146, -0.68244562],\n", " [ 1.64629059, 1.82493142, 0.50776089],\n", " [-0.41753316, 0.8941741 , 0.35598082]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "\n", "data = np.random.randn(7, 3)\n", "\n", "data" ] }, { "cell_type": "code", "execution_count": 2, "id": "c694d0ed", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.13994363775784308)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.mean()" ] }, { "cell_type": "code", "execution_count": 3, "id": "338392bf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.13994363775784308)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(data)" ] }, { "cell_type": "code", "execution_count": 4, "id": "fd18c5b6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(2.9388163929147044)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sum()" ] }, { "cell_type": "markdown", "id": "fc22e335", "metadata": {}, "source": [ "Functions like `mean` and `sum` require an optional axis argument that calculates the statistic over the specified axis, resulting in an array with one less dimension:" ] }, { "cell_type": "code", "execution_count": 5, "id": "4bdb3b7a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.08736502, 0.42394037, 0.08325556])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.mean(axis=0)" ] }, { "cell_type": "code", "execution_count": 6, "id": "1148aec6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.61155517, 2.96758262, 0.58278894])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sum(axis=0)" ] }, { "cell_type": "markdown", "id": "0aaeb897", "metadata": {}, "source": [ "With `data.mean(0)`, which is the same as `data.mean(axis=0)`, the mean is calculated over the rows, while `data.sum(0)` calculates the sum over the rows." ] }, { "cell_type": "markdown", "id": "0c44ca98", "metadata": {}, "source": [ "Other methods like `cumsum` and `cumprod`, however, do not aggregate but create a new array with the intermediate results.\n", "\n", "In multidimensional arrays, accumulation functions such as `cumsum` and `cumprod` return an array of the same size but with the partial aggregates calculated along the specified axis:" ] }, { "cell_type": "code", "execution_count": 7, "id": "ae7947c0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.16673777, 1.14794423, 0.8771869 , -0.72247274, -1.16339674,\n", " -0.52830145, -1.04919679, -2.55137605, -1.88803888, -2.46056916,\n", " -1.55040673, -2.17658901, -1.49055412, -1.19034265, -1.87278827,\n", " -0.22649768, 1.59843374, 2.10619463, 1.68866148, 2.58283558,\n", " 2.93881639])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.cumsum()" ] }, { "cell_type": "code", "execution_count": 8, "id": "b0e8af79", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1.66737767e-01, 1.63604175e-01, -4.42970305e-02, 7.08601720e-02,\n", " -3.12439504e-02, -1.98428858e-02, 1.03360667e-02, -1.55266251e-02,\n", " -1.02993876e-02, 5.89671129e-03, 5.36696509e-03, -3.36069844e-03,\n", " -2.30555639e-03, -6.92154459e-04, 4.72357778e-04, 7.77638167e-04,\n", " 1.41913632e-03, 7.20581930e-04, -3.00866850e-04, -2.69027345e-04,\n", " -9.57685740e-05])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.cumprod()" ] }, { "cell_type": "markdown", "id": "14560b69", "metadata": {}, "source": [ "Basic statistical methods for arrays are:\n", "\n", "Method | Description\n", ":----- | :----------\n", "`sum` | Sum of all elements in the array or along an axis.\n", "`mean` | Arithmetic mean; for arrays with length zero, `NaN` is returned.\n", "`std`, `var` | Standard deviation and variance respectively\n", "`min`, `max` | Minimum and maximum\n", "`argmin`, `argmax` | Indices of the minimum and maximum elements respectively\n", "`cumsum` | Cumulative sum of the elements, starting with `0`\n", "`cumprod` | Cumulative product of the elements, starting with `1`" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }