.. SPDX-FileCopyrightText: 2021 Veit Schiele
..
.. SPDX-License-Identifier: BSD-3-Clause
Parallelise pandas
==================
:doc:`pandas:user_guide/enhancingperf` describes several ways of improving the
performance of pandas itself. In addition, there are dedicated libraries that
parallelise the processing of DataFrames.
cuDF
----
cuDF is a GPU DataFrame library that implements a `pandas-like API
`_.
.. seealso::

   * `Docs `__
   * `GitHub `__
   * `PyPI `_
   * `Example notebooks
     `_
Modin
-----
Modin parallelises almost the entire pandas API. In most cases, existing pandas
code only needs its ``import pandas as pd`` line replaced with the following
import:

.. code-block:: python

   import modin.pandas as pd
One restriction concerns ``pd.read_json``, which is only implemented for
``lines=True``.
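
The drop-in import and the ``lines=True`` case can be tried out with a minimal
sketch. It assumes that Modin and one of its execution engines are installed
and falls back to plain pandas otherwise; the file name ``records.jsonl`` and
the sample records are hypothetical:

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path

   try:
       import modin.pandas as pd  # parallel drop-in replacement, if installed
   except ImportError:
       import pandas as pd  # fallback so that the sketch runs without Modin

   # Hypothetical sample data, stored as JSON Lines (one object per line)
   records = [
       {"city": "Berlin", "value": 3},
       {"city": "Hamburg", "value": 5},
       {"city": "Berlin", "value": 7},
   ]

   with tempfile.TemporaryDirectory() as tmp:
       path = Path(tmp) / "records.jsonl"
       path.write_text("\n".join(json.dumps(r) for r in records) + "\n")

       # lines=True tells read_json to parse one JSON object per line
       df = pd.read_json(str(path), lines=True)

       # Materialise the result while the temporary file still exists
       totals = df.groupby("city")["value"].sum().to_dict()

   print(totals)

Since only the import differs, the rest of the script is the same for both
libraries.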
.. seealso::

   * `Docs `__
   * `GitHub `__
Dask
----
:ref:`/performance/dask.ipynb#dask-dataframe` is a large parallel DataFrame made
up of many smaller pandas DataFrames. The ``dask.dataframe`` API implements a
subset of the pandas API, with minor differences.
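
One such difference is that Dask evaluates operations lazily. The following
minimal sketch, which assumes that ``dask[dataframe]`` is installed and uses
hypothetical column names and values, splits a small pandas DataFrame into two
partitions and only computes the result when ``compute()`` is called:

.. code-block:: python

   import dask.dataframe as dd
   import pandas as pd

   # Hypothetical sample data
   pdf = pd.DataFrame(
       {
           "key": ["a", "b", "a", "b", "a", "b"],
           "value": [1, 2, 3, 4, 5, 6],
       }
   )

   # Split the pandas DataFrame into two partitions,
   # each of which is itself a pandas DataFrame
   ddf = dd.from_pandas(pdf, npartitions=2)

   # This only builds a task graph; nothing is calculated yet
   lazy_sums = ddf.groupby("key")["value"].sum()

   # compute() runs the graph and returns an ordinary pandas Series
   sums = lazy_sums.compute()

   print(sums.to_dict())

For data that fits comfortably into memory, plain pandas is usually sufficient;
the partitioning mainly pays off for larger datasets.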
.. seealso::

   * `Home `_
   * `API docs `_
   * `Example notebook `_
   * `Tutorial `_