Sorting and ranking

Sorting a data set by some criterion is another important built-in operation. Sorting lexicographically by row or column index is already described in the section Reordering and sorting from levels; the first example below only repeats this briefly with Series.sort_index in descending order. After that we look at sorting by values with DataFrame.sort_values and Series.sort_values:

[1]:
import numpy as np
import pandas as pd


rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s.sort_index(ascending=False)
[1]:
6   -0.521271
5   -0.228255
4   -1.131139
3   -0.531495
2    0.783785
1   -0.311396
0    0.088381
dtype: float64
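A minimal sketch of Series.sort_values on a small Series with hypothetical values and labels, sorting in both directions. By default, sort_values returns a new sorted object and leaves the original unchanged:

```python
import pandas as pd

# Hypothetical example values with string labels
s = pd.Series([3.0, -1.5, 2.0, 0.5], index=["a", "b", "c", "d"])

asc = s.sort_values()                  # smallest value first (default)
desc = s.sort_values(ascending=False)  # largest value first

print(asc.index.tolist())   # ['b', 'd', 'c', 'a']
print(desc.index.tolist())  # ['a', 'c', 'd', 'b']
print(s.index.tolist())     # ['a', 'b', 'c', 'd'] (s itself is unchanged)
```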

By default, missing values are placed at the end of the sorted result, regardless of whether you sort in ascending or descending order:

[2]:
s = pd.Series(rng.normal(size=7))
s[s < 0] = np.nan

s.sort_values()
[2]:
6    0.303859
4    0.435222
5    0.936456
3    1.312848
2    1.840338
0         NaN
1         NaN
dtype: float64
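The position of missing values can be changed with the na_position parameter. The following sketch uses hypothetical values with two NaN entries and shows the default, na_position="first", and a descending sort:

```python
import numpy as np
import pandas as pd

# Hypothetical values with missing entries at positions 1 and 3
s = pd.Series([2.0, np.nan, -1.0, np.nan, 0.5])

last = s.sort_values()                      # NaN at the end (default)
first = s.sort_values(na_position="first")  # NaN at the front
desc = s.sort_values(ascending=False)       # NaN still at the end

print(last.index.tolist())   # [2, 4, 0, 1, 3]
print(first.index.tolist())  # [1, 3, 2, 4, 0]
print(desc.index.tolist())   # [0, 4, 2, 1, 3]
```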

With a DataFrame you can sort along either axis. The by parameter specifies the column (or, with axis=1, the row) whose values determine the order:

[3]:
df = pd.DataFrame(rng.normal(size=(7, 3)))

df.sort_values(by=2, ascending=False)
[3]:
          0         1         2
3  1.489694  0.104105  0.870251
6 -0.649611 -1.035134  0.515880
5 -0.176371  1.261471  0.242477
0  0.252096 -0.315417 -1.000917
2 -1.659567 -0.139293 -1.138415
4  1.533278  0.241760 -1.252604
1  1.929005  1.032325 -2.153640
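by also accepts a list of columns, and ascending can then be given per column. A short sketch with hypothetical column names and values: the rows are sorted by team in ascending order and, within each team, by score in descending order:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["b", "a", "b", "a", "c"],
    "score": [7, 9, 9, 4, 9],
})

# First key ascending, second key descending
result = df.sort_values(by=["team", "score"], ascending=[True, False])
print(result.index.tolist())  # [1, 3, 2, 0, 4]
```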

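With axis=1, sort_values reorders the columns instead, using the values in the rows given with by; here row 1 is only used to break ties in row 0: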

[4]:
df.sort_values(axis=1, by=[0, 1], ascending=False)
[4]:
          0         1         2
0  0.252096 -0.315417 -1.000917
1  1.929005  1.032325 -2.153640
2 -1.659567 -0.139293 -1.138415
3  1.489694  0.104105  0.870251
4  1.533278  0.241760 -1.252604
5 -0.176371  1.261471  0.242477
6 -0.649611 -1.035134  0.515880
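In the output above, the column order happens to stay the same because row 0 is already in descending order. A sketch with hypothetical row and column labels where the order does change: the columns are reordered by the values in row "r2", and row "r1" only decides between columns whose "r2" values are equal:

```python
import pandas as pd

# Hypothetical data with string column labels and string row labels
df = pd.DataFrame(
    {"x": [1, 5], "y": [3, 5], "z": [2, 0]},
    index=["r1", "r2"],
)

# Sort the columns by row "r2" (descending); ties are broken by row "r1"
result = df.sort_values(by=["r2", "r1"], axis=1, ascending=False)
print(result.columns.tolist())  # ['y', 'x', 'z']
```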

Ranking

DataFrame.rank and Series.rank assign ranks from 1 up to the number of valid (non-missing) values. For a DataFrame, each column is ranked separately by default:

[5]:
df.rank()
[5]:
     0    1    2
0  4.0  2.0  4.0
1  7.0  6.0  1.0
2  1.0  3.0  3.0
3  5.0  4.0  7.0
4  6.0  5.0  2.0
5  3.0  7.0  5.0
6  2.0  1.0  6.0
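A minimal Series sketch with hypothetical values: a missing value is left unranked by default (na_option="keep"), so the ranks only run up to the number of valid values. With ascending=False the largest value receives rank 1:

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry
s = pd.Series([0.7, np.nan, -0.2, 1.5])

ranks = s.rank()                       # smallest valid value gets rank 1
ranks_desc = s.rank(ascending=False)   # largest valid value gets rank 1

print(ranks.tolist())       # [2.0, nan, 1.0, 3.0]
print(ranks_desc.tolist())  # [2.0, nan, 3.0, 1.0]
```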

If there are ties, by default (method="average") each tied value receives the mean of the ranks that its group occupies. In the following example, rows 5 and 6 are duplicated:

[6]:
df2 = pd.concat([df, df[5:]])

df2.rank()
[6]:
     0    1    2
0  6.0  3.0  4.0
1  9.0  7.0  1.0
2  1.0  4.0  3.0
3  7.0  5.0  9.0
4  8.0  6.0  2.0
5  4.5  8.5  5.5
6  2.5  1.5  7.5
5  4.5  8.5  5.5
6  2.5  1.5  7.5
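The averaging can be checked by hand on a small Series with hypothetical values: the two 20s would occupy ranks 2 and 3, so each gets (2 + 3) / 2 = 2.5; the three 40s would occupy ranks 5, 6 and 7, so each gets 6.0:

```python
import pandas as pd

# Hypothetical values with a two-way and a three-way tie
s = pd.Series([10, 20, 20, 30, 40, 40, 40])

ranks = s.rank()  # method="average" is the default
print(ranks.tolist())  # [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0]
```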

With method="min", on the other hand, every entry in a tied group receives the lowest rank of that group:

[7]:
df2.rank(method="min")
[7]:
     0    1    2
0  6.0  3.0  4.0
1  9.0  7.0  1.0
2  1.0  4.0  3.0
3  7.0  5.0  9.0
4  8.0  6.0  2.0
5  4.0  8.0  5.0
6  2.0  1.0  7.0
5  4.0  8.0  5.0
6  2.0  1.0  7.0
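On a small Series with hypothetical values, method="min" makes the gap visible: both 20s get rank 2, rank 3 is skipped, and the next value continues with rank 4:

```python
import pandas as pd

# Hypothetical values with one tie
s = pd.Series([10, 20, 20, 30])

ranks = s.rank(method="min")
print(ranks.tolist())  # [1.0, 2.0, 2.0, 4.0]
```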

Other methods with rank

Method    Description
average   default: assigns the average rank of the group to every tied entry
min       assigns the lowest rank of the group to every tied entry
max       assigns the highest rank of the group to every tied entry
first     assigns ranks in the order in which the values appear in the data
dense     like method='min', but the rank always increases by 1 between groups, not by the number of tied entries
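To compare the methods side by side, the following sketch ranks one Series with hypothetical values using each method and collects the results as columns of a DataFrame:

```python
import pandas as pd

# Hypothetical values: 10 appears twice, 20 three times, 30 once
s = pd.Series([30, 10, 20, 10, 20, 20])

methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: s.rank(method=m) for m in methods})
ranks.insert(0, "value", s)  # show the original values as the first column
print(ranks)
```

With these values, "first" yields 6, 1, 3, 2, 4, 5 because ties are broken by position, while "dense" yields 3, 1, 2, 1, 2, 2 because there are only three distinct values.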