NumPy is short for Numerical Python. Many Python packages that provide scientific functions use NumPy’s array objects as one of the standard interfaces for data exchange. In the following, I give a brief overview of the main functionality of NumPy:

  • ndarray, an efficient multidimensional array that provides fast array-based operations, such as shuffling and cleaning data, subgrouping and filtering, transformation and all other kinds of computations. It also supports flexible broadcasting, i.e. element-wise operations between arrays of different but compatible shapes.

  • Mathematical functions for fast operations on whole arrays of data, such as sorting, uniqueness and set operations. Instead of loops with if-elif-else branches, conditional logic can be written as array expressions.

  • Tools for reading and writing array data to disk and working with memory mapped files.

  • Functions for linear algebra, random number generation and Fourier transform.

  • A C API for connecting NumPy to libraries written in C, C++ or Fortran.
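
To make the first two items more concrete, here is a minimal sketch of broadcasting, conditional logic as an array expression, and uniqueness and set operations. The array values and variable names (data, labels and so on) are made up for illustration and are not part of any dataset used in this tutorial.

```python
import numpy as np

# A 2-D ndarray with 3 rows and 4 columns (illustrative values only).
data = np.arange(12).reshape(3, 4)

# Broadcasting: the 1-D array of column means (shape (4,)) is applied
# to each of the 3 rows, so no explicit loop is needed.
col_means = data.mean(axis=0)
centered = data - col_means

# Conditional logic as an array expression instead of if/else in a loop:
# negative entries become 0, all other entries are kept.
clipped = np.where(centered < 0, 0, centered)

# Uniqueness and set operations on whole arrays.
labels = np.array([3, 1, 2, 3, 1])
unique_labels = np.unique(labels)            # sorted unique values
common = np.intersect1d(labels, [1, 3, 5])   # values present in both
```

Each statement operates on a whole array at once; the subtraction works because the shapes (3, 4) and (4,) are compatible under NumPy's broadcasting rules.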


This section introduces you to the basics of using NumPy arrays and should be sufficient to follow the rest of the tutorial. For many data analysis applications, a deep understanding of NumPy is not necessary, but mastering array-oriented programming and thinking is an important step on the way to becoming a data scientist.
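
As a small illustration of what array-oriented thinking can look like, the sketch below computes the same total twice: once with an element-by-element Python loop and once with a single array expression. The prices and quantities arrays are hypothetical values chosen only for this example.

```python
import numpy as np

# Hypothetical example data.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Element-by-element version with an explicit Python loop.
total_loop = 0.0
for p, q in zip(prices, quantities):
    total_loop += p * q

# Array-oriented version: one expression over the whole arrays.
total_vec = (prices * quantities).sum()
```

Both versions give the same result; the array expression is shorter and leaves the looping to NumPy's compiled code.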