Parallelise pandas

Enhancing performance describes several ways to speed up pandas itself. In addition, there are specialised libraries that parallelise the processing of data frames.

cuDF

cuDF is a GPU DataFrame library that implements a Pandas-like API.
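As a rough illustration of what "Pandas-like" means here, the following sketch builds a small frame and aggregates it with the same calls that pandas offers. The column names and values are made up for the example, and the fallback to pandas is an assumption added only so that the snippet also runs on machines without a supported GPU.

```python
import pandas as pd

try:
    import cudf as xdf  # GPU-backed; needs an NVIDIA GPU and the RAPIDS stack
except Exception:
    # Assumption for this sketch: fall back to pandas when cuDF is missing
    # or cannot be loaded, so the example still runs on the CPU.
    xdf = pd

# Hypothetical example data
df = xdf.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Same groupby/sum syntax as in pandas
totals = df.groupby("key")["value"].sum()

# cuDF objects provide to_pandas() to copy a result back to host memory
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

totals = totals.sort_index()
print(totals)
```

Copying results back with to_pandas() is usually only worthwhile at the end of a pipeline, since every transfer between GPU and host memory costs time.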

Modin

Modin parallelises almost the entire pandas API. In most cases, existing pandas code only needs its pandas import replaced by the following line:

import modin.pandas as pd

One restriction concerns pd.read_json, which is only implemented for lines=True, that is, for files with one JSON object per line.
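The following minimal sketch shows the import swap together with read_json and lines=True. The file name and the records are hypothetical, and the fallback to plain pandas is an assumption added so that the snippet also runs where Modin is not installed.

```python
import json
import os
import tempfile

try:
    import modin.pandas as pd  # drop-in replacement for the pandas import
except ImportError:
    import pandas as pd  # assumption: fall back to pandas without Modin

# Hypothetical example records
records = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "Paris"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.jsonl")

    # Write JSON Lines: one JSON object per line
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    # lines=True tells read_json to parse one record per line
    df = pd.read_json(path, lines=True)

    # Inspect the result while the temporary file still exists
    n_rows, n_cols = df.shape
    columns = list(df.columns)

print(n_rows, n_cols, columns)
```

The rest of the code is unchanged pandas code; only the import decides whether Modin or pandas executes it.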

Dask

A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index. The dask.dataframe API is a subset of the pandas API, with minor differences.