{ "cells": [ { "cell_type": "markdown", "id": "233327ae", "metadata": {}, "source": [ "# Sorting and ranking\n", "\n", "Sorting a record by a criterion is another important built-in function. Sorting lexicographically by row or column index is already described in the section [Reordering and sorting from levels](indexing.ipynb#Rearranging-and-Sorting-Levels). In the following we look at sorting the values with [DataFrame.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) and [Series.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html):" ] }, { "cell_type": "code", "execution_count": 1, "id": "0cedcd43", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6 -0.287551\n", "5 -0.073895\n", "4 0.077808\n", "3 0.647918\n", "2 1.370572\n", "1 -0.071934\n", "0 0.823556\n", "dtype: float64" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "\n", "rng = np.random.default_rng()\n", "s = pd.Series(rng.normal(size=7))\n", "\n", "s.sort_index(ascending=False)" ] }, { "cell_type": "markdown", "id": "18623a6d", "metadata": {}, "source": [ "All missing values are sorted to the end of the row by default:" ] }, { "cell_type": "code", "execution_count": 2, "id": "6a93706c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 0.502380\n", "3 1.347849\n", "4 1.488811\n", "0 NaN\n", "1 NaN\n", "2 NaN\n", "6 NaN\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(rng.normal(size=7))\n", "s[s < 0] = np.nan\n", "\n", "s.sort_values()" ] }, { "cell_type": "markdown", "id": "45cf8b25", "metadata": {}, "source": [ "With a DataFrame you can sort on both axes. With `by` you specify which column or row is to be sorted:" ] }, { "cell_type": "code", "execution_count": 3, "id": "a767ca78", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
1-0.122280-0.0135531.622476
2-0.3166630.8231170.678331
50.545206-1.6857770.533224
40.6616170.054888-0.228683
6-0.3686101.419950-0.467401
00.7018850.046049-1.685828
30.5372441.251408-2.482741
\n", "
" ], "text/plain": [ " 0 1 2\n", "1 -0.122280 -0.013553 1.622476\n", "2 -0.316663 0.823117 0.678331\n", "5 0.545206 -1.685777 0.533224\n", "4 0.661617 0.054888 -0.228683\n", "6 -0.368610 1.419950 -0.467401\n", "0 0.701885 0.046049 -1.685828\n", "3 0.537244 1.251408 -2.482741" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(rng.normal(size=(7, 3)))\n", "\n", "df.sort_values(by=2, ascending=False)" ] }, { "cell_type": "markdown", "id": "36902f51", "metadata": {}, "source": [ "You can also sort rows with `axis=1` and `by`:" ] }, { "cell_type": "code", "execution_count": 4, "id": "ff545cff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
00.7018850.046049-1.685828
1-0.122280-0.0135531.622476
2-0.3166630.8231170.678331
30.5372441.251408-2.482741
40.6616170.054888-0.228683
50.545206-1.6857770.533224
6-0.3686101.419950-0.467401
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 0.701885 0.046049 -1.685828\n", "1 -0.122280 -0.013553 1.622476\n", "2 -0.316663 0.823117 0.678331\n", "3 0.537244 1.251408 -2.482741\n", "4 0.661617 0.054888 -0.228683\n", "5 0.545206 -1.685777 0.533224\n", "6 -0.368610 1.419950 -0.467401" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(axis=1, by=[0, 1], ascending=False)" ] }, { "cell_type": "markdown", "id": "58281d1d", "metadata": {}, "source": [ "## Ranking\n", "\n", "[DataFrame.rank](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) and [Series.rank](https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html) assign ranks from one to the number of valid data points in an array:" ] }, { "cell_type": "code", "execution_count": 5, "id": "df8f2e42", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
07.03.02.0
13.02.07.0
22.05.06.0
34.06.01.0
46.04.04.0
55.01.05.0
61.07.03.0
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 7.0 3.0 2.0\n", "1 3.0 2.0 7.0\n", "2 2.0 5.0 6.0\n", "3 4.0 6.0 1.0\n", "4 6.0 4.0 4.0\n", "5 5.0 1.0 5.0\n", "6 1.0 7.0 3.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.rank()" ] }, { "cell_type": "markdown", "id": "72195d0f", "metadata": {}, "source": [ "If ties occur in the ranking, the middle rank is usually assigned in each group." ] }, { "cell_type": "code", "execution_count": 6, "id": "513bca91", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
09.04.02.0
14.03.09.0
23.06.08.0
35.07.01.0
48.05.05.0
56.51.56.5
61.58.53.5
56.51.56.5
61.58.53.5
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 9.0 4.0 2.0\n", "1 4.0 3.0 9.0\n", "2 3.0 6.0 8.0\n", "3 5.0 7.0 1.0\n", "4 8.0 5.0 5.0\n", "5 6.5 1.5 6.5\n", "6 1.5 8.5 3.5\n", "5 6.5 1.5 6.5\n", "6 1.5 8.5 3.5" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.concat([df, df[5:]])\n", "\n", "df2.rank()" ] }, { "cell_type": "markdown", "id": "fbbb9fb2", "metadata": {}, "source": [ "The parameter `min`, on the other hand, assigns the smallest rank in the group:" ] }, { "cell_type": "code", "execution_count": 7, "id": "808a1f0f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
09.04.02.0
14.03.09.0
23.06.08.0
35.07.01.0
48.05.05.0
56.01.06.0
61.08.03.0
56.01.06.0
61.08.03.0
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 9.0 4.0 2.0\n", "1 4.0 3.0 9.0\n", "2 3.0 6.0 8.0\n", "3 5.0 7.0 1.0\n", "4 8.0 5.0 5.0\n", "5 6.0 1.0 6.0\n", "6 1.0 8.0 3.0\n", "5 6.0 1.0 6.0\n", "6 1.0 8.0 3.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.rank(method=\"min\")" ] }, { "cell_type": "markdown", "id": "c285b40b", "metadata": {}, "source": [ "## Other methods with `rank`\n", "\n", "Method | Description\n", ":----- | :----------\n", "`average` | default: assign the average rank to each entry in the same group\n", "`min` | uses the minimum rank for the whole group\n", "`max` | uses the maximum rank for the whole group\n", "`first` | assigns the ranks in the order in which the values appear in the data\n", "`dense` | like `method='min'` but the ranks always increase by 1 between groups and not according to the number of same items in a group" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }