{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "233327ae",
   "metadata": {},
   "source": [
    "# Sorting and ranking\n",
    "\n",
    "Sorting a record by a criterion is another important built-in function. Sorting lexicographically by row or column index is already described in the section [Reordering and sorting from levels](indexing.ipynb#Rearranging-and-Sorting-Levels). In the following we look at sorting the values with [DataFrame.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) and [Series.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0cedcd43",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6   -0.287551\n",
       "5   -0.073895\n",
       "4    0.077808\n",
       "3    0.647918\n",
       "2    1.370572\n",
       "1   -0.071934\n",
       "0    0.823556\n",
       "dtype: float64"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "rng = np.random.default_rng()\n",
    "s = pd.Series(rng.normal(size=7))\n",
    "\n",
    "s.sort_index(ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18623a6d",
   "metadata": {},
   "source": [
    "All missing values are sorted to the end of the row by default:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6a93706c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5    0.502380\n",
       "3    1.347849\n",
       "4    1.488811\n",
       "0         NaN\n",
       "1         NaN\n",
       "2         NaN\n",
       "6         NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(rng.normal(size=7))\n",
    "s[s < 0] = np.nan\n",
    "\n",
    "s.sort_values()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45cf8b25",
   "metadata": {},
   "source": [
    "With a DataFrame you can sort on both axes. With `by` you specify which column or row is to be sorted:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a767ca78",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.122280</td>\n",
       "      <td>-0.013553</td>\n",
       "      <td>1.622476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.316663</td>\n",
       "      <td>0.823117</td>\n",
       "      <td>0.678331</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.545206</td>\n",
       "      <td>-1.685777</td>\n",
       "      <td>0.533224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.661617</td>\n",
       "      <td>0.054888</td>\n",
       "      <td>-0.228683</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.368610</td>\n",
       "      <td>1.419950</td>\n",
       "      <td>-0.467401</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.701885</td>\n",
       "      <td>0.046049</td>\n",
       "      <td>-1.685828</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.537244</td>\n",
       "      <td>1.251408</td>\n",
       "      <td>-2.482741</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "1 -0.122280 -0.013553  1.622476\n",
       "2 -0.316663  0.823117  0.678331\n",
       "5  0.545206 -1.685777  0.533224\n",
       "4  0.661617  0.054888 -0.228683\n",
       "6 -0.368610  1.419950 -0.467401\n",
       "0  0.701885  0.046049 -1.685828\n",
       "3  0.537244  1.251408 -2.482741"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(rng.normal(size=(7, 3)))\n",
    "\n",
    "df.sort_values(by=2, ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36902f51",
   "metadata": {},
   "source": [
    "You can also sort rows with `axis=1` and `by`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ff545cff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.701885</td>\n",
       "      <td>0.046049</td>\n",
       "      <td>-1.685828</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.122280</td>\n",
       "      <td>-0.013553</td>\n",
       "      <td>1.622476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.316663</td>\n",
       "      <td>0.823117</td>\n",
       "      <td>0.678331</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.537244</td>\n",
       "      <td>1.251408</td>\n",
       "      <td>-2.482741</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.661617</td>\n",
       "      <td>0.054888</td>\n",
       "      <td>-0.228683</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.545206</td>\n",
       "      <td>-1.685777</td>\n",
       "      <td>0.533224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.368610</td>\n",
       "      <td>1.419950</td>\n",
       "      <td>-0.467401</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0  0.701885  0.046049 -1.685828\n",
       "1 -0.122280 -0.013553  1.622476\n",
       "2 -0.316663  0.823117  0.678331\n",
       "3  0.537244  1.251408 -2.482741\n",
       "4  0.661617  0.054888 -0.228683\n",
       "5  0.545206 -1.685777  0.533224\n",
       "6 -0.368610  1.419950 -0.467401"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(axis=1, by=[0, 1], ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58281d1d",
   "metadata": {},
   "source": [
    "## Ranking\n",
    "\n",
    "[DataFrame.rank](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) and [Series.rank](https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html) assign ranks from one to the number of valid data points in an array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "df8f2e42",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2\n",
       "0  7.0  3.0  2.0\n",
       "1  3.0  2.0  7.0\n",
       "2  2.0  5.0  6.0\n",
       "3  4.0  6.0  1.0\n",
       "4  6.0  4.0  4.0\n",
       "5  5.0  1.0  5.0\n",
       "6  1.0  7.0  3.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rank()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72195d0f",
   "metadata": {},
   "source": [
    "If ties occur in the ranking, the middle rank is usually assigned in each group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "513bca91",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.5</td>\n",
       "      <td>8.5</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.5</td>\n",
       "      <td>8.5</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2\n",
       "0  9.0  4.0  2.0\n",
       "1  4.0  3.0  9.0\n",
       "2  3.0  6.0  8.0\n",
       "3  5.0  7.0  1.0\n",
       "4  8.0  5.0  5.0\n",
       "5  6.5  1.5  6.5\n",
       "6  1.5  8.5  3.5\n",
       "5  6.5  1.5  6.5\n",
       "6  1.5  8.5  3.5"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.concat([df, df[5:]])\n",
    "\n",
    "df2.rank()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbbb9fb2",
   "metadata": {},
   "source": [
    "The parameter `min`, on the other hand, assigns the smallest rank in the group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "808a1f0f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2\n",
       "0  9.0  4.0  2.0\n",
       "1  4.0  3.0  9.0\n",
       "2  3.0  6.0  8.0\n",
       "3  5.0  7.0  1.0\n",
       "4  8.0  5.0  5.0\n",
       "5  6.0  1.0  6.0\n",
       "6  1.0  8.0  3.0\n",
       "5  6.0  1.0  6.0\n",
       "6  1.0  8.0  3.0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.rank(method=\"min\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c285b40b",
   "metadata": {},
   "source": [
    "## Other methods with `rank`\n",
    "\n",
    "Method | Description\n",
    ":----- | :----------\n",
    "`average` | default: assign the average rank to each entry in the same group\n",
    "`min` | uses the minimum rank for the whole group\n",
    "`max` | uses the maximum rank for the whole group\n",
    "`first` | assigns the ranks in the order in which the values appear in the data\n",
    "`dense` | like `method='min'` but the ranks always increase by 1 between groups and not according to the number of same items in a group"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}