File input and output with arrays
NumPy can store data on disk in several text and binary formats and load it again. However, in this section I only discuss NumPy’s own binary format, since pandas or other tools are usually used to load text or tabular data (see Read, persist and provide data).
np.save and np.load are the two most important functions for efficiently saving array data to disk and loading it again. By default, arrays are saved in an uncompressed raw binary format with the file extension .npy:
[1]:
import numpy as np
data = np.random.randn(7, 3)
np.save("my_data", data)
If the file path does not already end in .npy, the extension is appended. The array on the hard disk can then be loaded with np.load:
[2]:
np.load("my_data.npy")
[2]:
array([[ 1.71143962, 1.06249012, 0.40089528],
[-1.93836029, 0.60398033, -0.6708609 ],
[ 0.24042536, -0.86181626, 0.33594052],
[-1.41716277, 2.11203343, -0.09469748],
[-0.36027506, 0.53376748, 1.302226 ],
[ 0.24560584, 1.29705793, 0.49696571],
[ 0.04375581, 0.88412494, -2.22439157]])
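The round trip described above can be checked with a minimal sketch. It assumes a temporary directory, so that nothing is left on disk, and a seeded random generator. The names target, saved_path and restored are only illustrative and do not come from the original notebook.

```python
import tempfile
from pathlib import Path

import numpy as np

# Seeded generator so the example is reproducible (an assumption of this sketch)
rng = np.random.default_rng(seed=0)
data = rng.standard_normal((7, 3))

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_data"  # path given without an extension
    np.save(target, data)  # NumPy appends .npy to the file name

    saved_path = target.with_suffix(".npy")
    file_exists = saved_path.exists()
    restored = np.load(saved_path)  # read the array back into memory

# The loaded array should match the original in values, shape and dtype
roundtrip_ok = np.array_equal(data, restored)
same_dtype = restored.dtype == data.dtype
print(file_exists, roundtrip_ok, same_dtype)
```

Comparing with np.array_equal rather than inspecting the printed output makes it easy to see that the binary format preserves the values exactly.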
You can save multiple arrays in an uncompressed archive by using np.savez and passing the arrays as keyword arguments:
[3]:
np.savez("data_archive.npz", a=data, b=np.square(data))
[4]:
archive = np.load("data_archive.npz")
archive["b"]
[4]:
array([[2.92902558e+00, 1.12888526e+00, 1.60717029e-01],
[3.75724062e+00, 3.64792237e-01, 4.50054349e-01],
[5.78043555e-02, 7.42727271e-01, 1.12856032e-01],
[2.00835032e+00, 4.46068522e+00, 8.96761189e-03],
[1.29798116e-01, 2.84907727e-01, 1.69579255e+00],
[6.03222306e-02, 1.68235927e+00, 2.46974919e-01],
[1.91457098e-03, 7.81676918e-01, 4.94791787e+00]])
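As a small sketch of how such an archive might be inspected, the following example lists the stored names through the files attribute and opens the archive in a with block so that the underlying file is closed afterwards. The temporary directory and the small example array are assumptions made only for this illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

# Small deterministic array instead of random data (assumption of this sketch)
data = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "data_archive.npz"
    np.savez(archive_path, a=data, b=np.square(data))

    # np.load returns a dictionary-like object for .npz files;
    # the with block closes the file when we are done
    with np.load(archive_path) as archive:
        names = sorted(archive.files)  # keyword names used in np.savez
        a = archive["a"]  # each array is read when it is accessed
        b = archive["b"]

print(names)
print(np.array_equal(b, np.square(a)))
```

Because the arrays are only read from the archive when they are accessed, they have to be retrieved inside the with block, as is done here for a and b.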