.. SPDX-FileCopyrightText: 2021 Veit Schiele
..
.. SPDX-License-Identifier: BSD-3-Clause

Parallelise pandas
==================

:doc:`pandas:user_guide/enhancingperf` describes several ways to improve the
performance of pandas. In addition, there are dedicated libraries that can
parallelise the processing of DataFrames.

cuDF
----

cuDF is a GPU DataFrame library that implements a `pandas-like API `_.

.. seealso::

   * `Docs `__
   * `GitHub `__
   * `PyPI `_
   * `Example notebooks `_

Modin
-----

Modin parallelises almost the entire pandas API. In most cases, existing pandas
code only needs to swap its pandas import for the following:

.. code-block:: python

   import modin.pandas as pd

One restriction concerns ``pd.read_json``, which is only implemented for
``lines=True``.

.. seealso::

   * `Docs `__
   * `GitHub `__

Dask
----

:ref:`/performance/dask.ipynb#dask-dataframe` is a large parallel DataFrame
made up of multiple pandas DataFrames. The ``dask.dataframe`` API is a subset
of the pandas API, with minor differences.

.. seealso::

   * `Home `_
   * `API docs `_
   * `Example notebook `_
   * `Tutorial `_
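
The idea of a Dask DataFrame as a collection of pandas DataFrames can be
illustrated with a minimal sketch. The data, the column names ``key`` and
``value`` and the choice of two partitions are assumptions made purely for
illustration. The sketch also assumes that both ``pandas`` and ``dask`` are
installed:

.. code-block:: python

   import dask.dataframe as dd
   import pandas as pd

   # Small, made-up data set used only for illustration.
   pdf = pd.DataFrame(
       {
           "key": ["a", "b", "a", "b", "a", "b"],
           "value": [1, 2, 3, 4, 5, 6],
       }
   )

   # Split the pandas DataFrame into two partitions; each partition is
   # itself a pandas DataFrame.
   ddf = dd.from_pandas(pdf, npartitions=2)

   # Operations on a Dask DataFrame are lazy: compute() runs them and
   # returns an ordinary pandas object.
   totals = ddf.groupby("key")["value"].sum().compute()

The ``groupby`` expression reads exactly as it would in pandas. Only the
conversion with ``from_pandas`` and the final ``compute()`` call are specific
to Dask.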