{
"cells": [
{
"cell_type": "markdown",
"id": "233327ae",
"metadata": {},
"source": [
"# Sorting and ranking\n",
"\n",
"Sorting a record by a criterion is another important built-in function. Sorting lexicographically by row or column index is already described in the section [Reordering and sorting from levels](indexing.ipynb#Rearranging-and-Sorting-Levels). In the following we look at sorting the values with [DataFrame.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) and [Series.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0cedcd43",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6 -0.287551\n",
"5 -0.073895\n",
"4 0.077808\n",
"3 0.647918\n",
"2 1.370572\n",
"1 -0.071934\n",
"0 0.823556\n",
"dtype: float64"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"rng = np.random.default_rng()\n",
"s = pd.Series(rng.normal(size=7))\n",
"\n",
"s.sort_index(ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "18623a6d",
"metadata": {},
"source": [
"All missing values are sorted to the end of the row by default:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6a93706c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5 0.502380\n",
"3 1.347849\n",
"4 1.488811\n",
"0 NaN\n",
"1 NaN\n",
"2 NaN\n",
"6 NaN\n",
"dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = pd.Series(rng.normal(size=7))\n",
"s[s < 0] = np.nan\n",
"\n",
"s.sort_values()"
]
},
{
"cell_type": "markdown",
"id": "45cf8b25",
"metadata": {},
"source": [
"With a DataFrame you can sort on both axes. With `by` you specify which column or row is to be sorted:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a767ca78",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 1 | \n",
" -0.122280 | \n",
" -0.013553 | \n",
" 1.622476 | \n",
"
\n",
" \n",
" | 2 | \n",
" -0.316663 | \n",
" 0.823117 | \n",
" 0.678331 | \n",
"
\n",
" \n",
" | 5 | \n",
" 0.545206 | \n",
" -1.685777 | \n",
" 0.533224 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.661617 | \n",
" 0.054888 | \n",
" -0.228683 | \n",
"
\n",
" \n",
" | 6 | \n",
" -0.368610 | \n",
" 1.419950 | \n",
" -0.467401 | \n",
"
\n",
" \n",
" | 0 | \n",
" 0.701885 | \n",
" 0.046049 | \n",
" -1.685828 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.537244 | \n",
" 1.251408 | \n",
" -2.482741 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"1 -0.122280 -0.013553 1.622476\n",
"2 -0.316663 0.823117 0.678331\n",
"5 0.545206 -1.685777 0.533224\n",
"4 0.661617 0.054888 -0.228683\n",
"6 -0.368610 1.419950 -0.467401\n",
"0 0.701885 0.046049 -1.685828\n",
"3 0.537244 1.251408 -2.482741"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(rng.normal(size=(7, 3)))\n",
"\n",
"df.sort_values(by=2, ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "36902f51",
"metadata": {},
"source": [
"You can also sort rows with `axis=1` and `by`:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ff545cff",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.701885 | \n",
" 0.046049 | \n",
" -1.685828 | \n",
"
\n",
" \n",
" | 1 | \n",
" -0.122280 | \n",
" -0.013553 | \n",
" 1.622476 | \n",
"
\n",
" \n",
" | 2 | \n",
" -0.316663 | \n",
" 0.823117 | \n",
" 0.678331 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.537244 | \n",
" 1.251408 | \n",
" -2.482741 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.661617 | \n",
" 0.054888 | \n",
" -0.228683 | \n",
"
\n",
" \n",
" | 5 | \n",
" 0.545206 | \n",
" -1.685777 | \n",
" 0.533224 | \n",
"
\n",
" \n",
" | 6 | \n",
" -0.368610 | \n",
" 1.419950 | \n",
" -0.467401 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 0.701885 0.046049 -1.685828\n",
"1 -0.122280 -0.013553 1.622476\n",
"2 -0.316663 0.823117 0.678331\n",
"3 0.537244 1.251408 -2.482741\n",
"4 0.661617 0.054888 -0.228683\n",
"5 0.545206 -1.685777 0.533224\n",
"6 -0.368610 1.419950 -0.467401"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(axis=1, by=[0, 1], ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "58281d1d",
"metadata": {},
"source": [
"## Ranking\n",
"\n",
"[DataFrame.rank](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) and [Series.rank](https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html) assign ranks from one to the number of valid data points in an array:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "df8f2e42",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 7.0 | \n",
" 3.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 3.0 | \n",
" 2.0 | \n",
" 7.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 2.0 | \n",
" 5.0 | \n",
" 6.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 4.0 | \n",
" 6.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 6.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
" | 5 | \n",
" 5.0 | \n",
" 1.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1.0 | \n",
" 7.0 | \n",
" 3.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 7.0 3.0 2.0\n",
"1 3.0 2.0 7.0\n",
"2 2.0 5.0 6.0\n",
"3 4.0 6.0 1.0\n",
"4 6.0 4.0 4.0\n",
"5 5.0 1.0 5.0\n",
"6 1.0 7.0 3.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.rank()"
]
},
{
"cell_type": "markdown",
"id": "72195d0f",
"metadata": {},
"source": [
"If ties occur in the ranking, the middle rank is usually assigned in each group."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "513bca91",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 9.0 | \n",
" 4.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 4.0 | \n",
" 3.0 | \n",
" 9.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 3.0 | \n",
" 6.0 | \n",
" 8.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 5.0 | \n",
" 7.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 8.0 | \n",
" 5.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
" | 5 | \n",
" 6.5 | \n",
" 1.5 | \n",
" 6.5 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1.5 | \n",
" 8.5 | \n",
" 3.5 | \n",
"
\n",
" \n",
" | 5 | \n",
" 6.5 | \n",
" 1.5 | \n",
" 6.5 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1.5 | \n",
" 8.5 | \n",
" 3.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 9.0 4.0 2.0\n",
"1 4.0 3.0 9.0\n",
"2 3.0 6.0 8.0\n",
"3 5.0 7.0 1.0\n",
"4 8.0 5.0 5.0\n",
"5 6.5 1.5 6.5\n",
"6 1.5 8.5 3.5\n",
"5 6.5 1.5 6.5\n",
"6 1.5 8.5 3.5"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = pd.concat([df, df[5:]])\n",
"\n",
"df2.rank()"
]
},
{
"cell_type": "markdown",
"id": "fbbb9fb2",
"metadata": {},
"source": [
"The parameter `min`, on the other hand, assigns the smallest rank in the group:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "808a1f0f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 9.0 | \n",
" 4.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 4.0 | \n",
" 3.0 | \n",
" 9.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 3.0 | \n",
" 6.0 | \n",
" 8.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 5.0 | \n",
" 7.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 8.0 | \n",
" 5.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
" | 5 | \n",
" 6.0 | \n",
" 1.0 | \n",
" 6.0 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1.0 | \n",
" 8.0 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 5 | \n",
" 6.0 | \n",
" 1.0 | \n",
" 6.0 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1.0 | \n",
" 8.0 | \n",
" 3.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 9.0 4.0 2.0\n",
"1 4.0 3.0 9.0\n",
"2 3.0 6.0 8.0\n",
"3 5.0 7.0 1.0\n",
"4 8.0 5.0 5.0\n",
"5 6.0 1.0 6.0\n",
"6 1.0 8.0 3.0\n",
"5 6.0 1.0 6.0\n",
"6 1.0 8.0 3.0"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.rank(method=\"min\")"
]
},
{
"cell_type": "markdown",
"id": "c285b40b",
"metadata": {},
"source": [
"## Other methods with `rank`\n",
"\n",
"Method | Description\n",
":----- | :----------\n",
"`average` | default: assign the average rank to each entry in the same group\n",
"`min` | uses the minimum rank for the whole group\n",
"`max` | uses the maximum rank for the whole group\n",
"`first` | assigns the ranks in the order in which the values appear in the data\n",
"`dense` | like `method='min'` but the ranks always increase by 1 between groups and not according to the number of same items in a group"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.13 Kernel",
"language": "python",
"name": "python313"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}