Pandas: Series - rank() function

Last update on September 15 2022 12:55:39 (UTC/GMT +8 hours)

Compute numerical data ranks along an axis

The rank() function computes numerical data ranks (1 through n) along an axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.
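As a quick check of this averaging rule, here is a minimal sketch, assuming pandas is installed. The Series `s` is hypothetical sample data, not taken from the original article.

```python
import pandas as pd

# Hypothetical sample data: three tied values (10) and one larger value (20).
# The tied values occupy positions 1, 2 and 3 in ascending order,
# so each receives the average rank (1 + 2 + 3) / 3 = 2.0.
s = pd.Series([10, 10, 10, 20])
ranks = s.rank()
print(ranks.tolist())  # [2.0, 2.0, 2.0, 4.0]
```

The value 20 is in 4th position and has no ties, so its rank is simply 4.0.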

Syntax:

Series.rank(self, axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Parameters:

Name	Description	Type/Default Value	Required / Optional
axis	Index to direct ranking. For a Series this parameter is unused, because a Series has only one axis.	{0 or ‘index’, 1 or ‘columns’} Default Value: 0	Optional
method	How to rank a group of records that have the same value (i.e. ties): average: average rank of the group; min: lowest rank in the group; max: highest rank in the group; first: ranks assigned in the order they appear in the array; dense: like ‘min’, but the rank always increases by 1 between groups.	{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’} Default Value: ‘average’	Optional
numeric_only	For DataFrame objects, rank only numeric columns if set to True.	bool	Optional
na_option	How to rank NaN values: keep: assign NaN rank to NaN values; top: assign the smallest rank to NaN values if ascending; bottom: assign the highest rank to NaN values if ascending.	{‘keep’, ‘top’, ‘bottom’} Default Value: ‘keep’	Optional
ascending	Whether or not the elements should be ranked in ascending order.	bool Default Value: True	Optional
pct	Whether or not to display the returned rankings in percentile form.	bool Default Value: False	Optional
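To compare the five tie-handling options of `method` side by side, here is a minimal sketch, assuming pandas is installed. The sample values are hypothetical; they include a three-way tie on 4 so that every method gives a visibly different result.

```python
import pandas as pd

# Hypothetical sample data with a three-way tie on the value 4.
s = pd.Series([4, 4, 4, 8, 2])

# Rank the same values once per tie-handling method.
methods = ['average', 'min', 'max', 'first', 'dense']
table = pd.DataFrame({m: s.rank(method=m) for m in methods})
table.insert(0, 'value', s)  # show the original values as the first column
print(table)

# Expected rank columns (the tied 4s occupy positions 2, 3 and 4):
# average: [3.0, 3.0, 3.0, 5.0, 1.0]
# min:     [2.0, 2.0, 2.0, 5.0, 1.0]
# max:     [4.0, 4.0, 4.0, 5.0, 1.0]
# first:   [2.0, 3.0, 4.0, 5.0, 1.0]
# dense:   [2.0, 2.0, 2.0, 3.0, 1.0]
```

Note how 'dense' gives the value 8 rank 3 rather than 5: it counts distinct groups instead of positions.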

Returns: same type as caller
Return a Series or DataFrame with data ranks as values.
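The sketch below illustrates the return type, assuming pandas is installed; `s` and `df` are hypothetical sample objects. It shows that Series.rank keeps the caller's index and returns float ranks, while DataFrame.rank returns a DataFrame.

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=['a', 'b', 'c'])
r = s.rank()

# Series.rank returns a new Series aligned to the caller's index.
print(type(r).__name__)   # Series
print(r.index.tolist())   # ['a', 'b', 'c']
print(r.tolist())         # [3.0, 1.0, 2.0]
print(r.dtype)            # float64: ranks are floats even for integer data

# DataFrame.rank returns a DataFrame, ranking each column separately (axis=0).
df = pd.DataFrame({'x': [3, 1, 2], 'y': [1, 2, 3]})
print(type(df.rank()).__name__)  # DataFrame
```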

Example:

Python-Pandas Code:

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'Animal': ['lion', 'fox', 'cow',
                                   'spider', 'snake'],
                        'Number_legs': [4, 4, 4, 8, np.nan]})
df

Output:

   Animal  Number_legs
0    lion          4.0
1     fox          4.0
2     cow          4.0
3  spider          8.0
4   snake          NaN

The following example shows how the method behaves with different parameter settings:

default_rank: this is the default behaviour obtained without passing any parameter (method = 'average'). The three 4-legged animals share positions 1 to 3, so each gets the average rank 2.0.
max_rank: setting method = 'max', records that have the same value are ranked using the highest rank in their group (e.g. since ‘lion’, ‘fox’ and ‘cow’ occupy the 1st, 2nd and 3rd positions, rank 3 is assigned to all of them).
NA_bottom: choosing na_option = 'bottom', records with NaN values are placed at the bottom of the ranking (here ‘snake’ gets rank 5).
pct_rank: setting pct = True expresses the ranking as a percentile rank, i.e. each rank divided by the number of non-NaN values (2/4 = 0.5 and 4/4 = 1.0).

Python-Pandas Code:

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'Animal': ['lion', 'fox', 'cow',
                                   'spider', 'snake'],
                        'Number_legs': [4, 4, 4, 8, np.nan]})
df['default_rank'] = df['Number_legs'].rank()
df['max_rank'] = df['Number_legs'].rank(method='max')
df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
df['pct_rank'] = df['Number_legs'].rank(pct=True)
df

Output:

   Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0    lion          4.0           2.0       3.0        2.0       0.5
1     fox          4.0           2.0       3.0        2.0       0.5
2     cow          4.0           2.0       3.0        2.0       0.5
3  spider          8.0           4.0       4.0        4.0       1.0
4   snake          NaN           NaN       NaN        5.0       NaN
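Two settings from the parameter table are not used in the example above: na_option = 'top' and ascending = False. The following minimal sketch, assuming pandas and NumPy are installed, applies them to the same leg counts (the `legs` Series indexed by animal name is a hypothetical restructuring of the DataFrame column). It also checks that pct = True matches the default ranks divided by the count of non-NaN values.

```python
import numpy as np
import pandas as pd

# Same leg counts as the example above, indexed by animal name.
legs = pd.Series([4, 4, 4, 8, np.nan],
                 index=['lion', 'fox', 'cow', 'spider', 'snake'])

# na_option='top': the NaN gets rank 1, so the tied 4s move to positions 2-4.
na_top = legs.rank(na_option='top')
print(na_top.tolist())   # [3.0, 3.0, 3.0, 5.0, 1.0]

# ascending=False: the largest value (8) gets rank 1; NaN stays NaN.
desc = legs.rank(ascending=False)
print(desc.tolist())     # [3.0, 3.0, 3.0, 1.0, nan]

# pct=True matches the default ranks divided by the count of non-NaN values.
manual_pct = legs.rank() / legs.count()
print(manual_pct.equals(legs.rank(pct=True)))  # True
```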
