Parallelise pandas

The section Enhancing performance describes several ways to improve the performance of pandas. There are also specialised libraries that parallelise the processing of DataFrames.


cuDF is a GPU DataFrame library that implements a pandas-like API.


Modin parallelises almost the entire pandas API. In most cases, existing pandas code only needs its pandas import replaced by the following:

import modin.pandas as pd

One restriction concerns pd.read_json, which is only implemented for lines=True.
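The drop-in import and the lines=True restriction can be tried out with a small sketch. It assumes that Modin and one of its execution engines are installed and falls back to plain pandas otherwise; the file name sales.jsonl and the column names are made up for illustration.

```python
import os
import tempfile

try:
    # Parallel drop-in replacement (assumes Modin and an engine are installed)
    import modin.pandas as pd
except ImportError:
    # Fallback so that the sketch also runs without Modin
    import pandas as pd

# Three records in JSON Lines format: one JSON object per line
records = (
    '{"city": "Berlin", "sales": 10}\n'
    '{"city": "Paris", "sales": 20}\n'
    '{"city": "Berlin", "sales": 5}\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.jsonl")  # hypothetical file name
    with open(path, "w") as f:
        f.write(records)
    # lines=True tells read_json to parse one record per line
    df = pd.read_json(path, lines=True)

# The rest of the code is ordinary pandas syntax
totals = df.groupby("city")["sales"].sum()
berlin_total = int(totals["Berlin"])
paris_total = int(totals["Paris"])
```

Apart from the import, nothing in this sketch is specific to Modin, which is the point of the drop-in approach.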



A Dask DataFrame is a large parallel DataFrame made up of multiple smaller pandas DataFrames. The dask.dataframe API is a subset of the pandas API, with minor differences.