Parallelise pandas
Enhancing performance describes some ways of speeding up pandas itself. In addition, there are specialised libraries that parallelise the processing of DataFrames.
cuDF
cuDF is a GPU DataFrame library that implements a pandas-like API.
Modin
Modin parallelises almost the entire pandas API. In most cases, existing pandas code only needs its pandas import replaced with the following:
import modin.pandas as pd
One restriction concerns pd.read_json, which is only implemented for lines=True.
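Because lines=True expects one JSON object per line (the JSON Lines layout), it can help to see what such a file looks like. The following is a minimal sketch using only the Python standard library; the helper names write_json_lines and read_json_lines and the sample records are hypothetical and are not part of pandas or Modin.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data, used only for illustration.
records = [
    {"id": 1, "city": "Berlin"},
    {"id": 2, "city": "Hamburg"},
]


def write_json_lines(rows, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_json_lines(path):
    """Read the file back, parsing each non-empty line separately."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "records.jsonl"
    write_json_lines(records, path)
    round_trip = read_json_lines(path)
    line_count = len(path.read_text(encoding="utf-8").splitlines())
```

A file written in this layout is the kind of input that pd.read_json(path, lines=True) is meant to read, whereas a single JSON array spanning the whole file is not.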
Dask
A Dask DataFrame is a large parallel DataFrame made up of multiple pandas DataFrames. The dask.dataframe API is a subset of the pandas API, although there are minor differences.
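The idea of one large frame built from several smaller ones can be sketched without Dask. The example below is only an illustration of partition-wise processing using the standard library; it is not Dask's API or implementation, and the function names (split_into_partitions, partial_sum_and_count, parallel_mean) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Split a non-empty list of rows into contiguous chunks of similar size."""
    if not rows:
        raise ValueError("rows must not be empty")
    size = -(-len(rows) // n_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum_and_count(partition):
    """Per-partition work: return (sum, count) of the 'value' field."""
    values = [row["value"] for row in partition]
    return sum(values), len(values)


def parallel_mean(rows, n_partitions=4):
    """Compute the mean by combining partial results from each partition."""
    partitions = split_into_partitions(rows, n_partitions)
    # Threads keep the sketch simple; they only show the structure of the
    # computation and do not speed up CPU-bound pure-Python work.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum_and_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


rows = [{"value": v} for v in range(1, 101)]
mean_value = parallel_mean(rows)
```

Each partition is processed independently and only the small partial results are combined at the end, which is the general pattern that lets a partitioned frame be handled in parallel.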