Parallelise pandas#
The section Enhancing performance describes several ways to speed up pandas itself. In addition, there are dedicated libraries that parallelise the processing of DataFrames.
cuDF#
cuDF is a GPU DataFrame library that implements a pandas-like API.
Modin#
Modin parallelises almost the entire pandas API. In most cases, existing pandas code only needs its pandas import replaced with the following:
import modin.pandas as pd
One restriction applies to pd.read_json, which is only implemented for lines=True.
Dask#
A Dask DataFrame is a large parallel DataFrame made up of many smaller pandas DataFrames. The dask.dataframe API is a subset of the pandas API, with minor differences.