Finding the Quantile and Decile Ranks of a Pandas DataFrame column


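Quantile and decile ranks are commonly used statistical measures that describe the position of an observation relative to the rest of the dataset. The quantile (percentile) rank expresses an observation's rank as a fraction between 0 and 1, while the decile rank assigns it to one of 10 groups numbered 1 to 10. In this technical blog, we will explore how to find the quantile and decile ranks of a Pandas DataFrame column in Python.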

Installation and Syntax

pip install pandas

The quantile rank comes from the rank() method of a column, and the decile rank can be obtained by binning the column with pd.cut(). The syntax is as follows:

# For finding quantile rank
df['column_name'].rank(pct=True)

# For finding decile rank (10 equal-width bins labelled 1 to 10)
pd.cut(df['column_name'], 10, labels=range(1, 11)).astype(int)
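To make the quantile rank call more concrete, the following minimal sketch shows that rank(pct=True) gives the ordinary rank divided by the number of non-missing values. The series values and variable names are illustrative assumptions, not part of the original article.

```python
import pandas as pd

# Illustrative values (not from the article), all distinct
s = pd.Series([30, 10, 40, 20])

# Ordinary ranks: 1 for the smallest value, 4 for the largest
plain_rank = s.rank()

# Percentile ranks computed by pandas
pct_rank = s.rank(pct=True)

# The same result computed by hand: rank divided by the count of non-missing values
manual_pct = plain_rank / s.count()

print(pct_rank.tolist())    # [0.75, 0.25, 1.0, 0.5]
print(manual_pct.tolist())  # [0.75, 0.25, 1.0, 0.5]
```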

Algorithm

  • Load the data into a Pandas DataFrame.

  • Select the column for which you want to find the quantile and decile ranks.

  • Use the rank() method with the pct parameter set to True to find the quantile rank of each observation in the column.

  • Use pd.cut() with 10 bins and the labels 1 to 10 to find the decile rank of each observation in the column. (The rank() method itself has no bins parameter.)
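The steps above can be sketched as a single helper. The function name add_rank_columns, its parameters, and the sample data are hypothetical and chosen only for illustration; the decile step uses pd.cut(), the function that the examples in this article call.

```python
import pandas as pd

def add_rank_columns(df, column, n_bins=10):
    """Hypothetical helper: return a copy of df with quantile and decile rank columns."""
    out = df.copy()
    # Quantile rank: rank expressed as a fraction between 0 and 1
    out[f"{column}_quantile_rank"] = out[column].rank(pct=True)
    # Decile rank: split the value range into n_bins equal-width bins labelled 1..n_bins
    out[f"{column}_decile_rank"] = pd.cut(
        out[column], n_bins, labels=range(1, n_bins + 1)
    ).astype(int)
    return out

# Illustrative data (not from the article): ten evenly spaced scores
scores = pd.DataFrame({"score": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]})
ranked = add_rank_columns(scores, "score")
print(ranked)
```

Returning a copy keeps the caller's DataFrame unchanged, which is one possible design choice; assigning the columns in place would work just as well.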

Example 1

import pandas as pd

# Create a DataFrame
data = {'A': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]}
df = pd.DataFrame(data)

# Find the quantile rank
df['A_quantile_rank'] = df['A'].rank(pct=True)

print(df)

Output

    A  A_quantile_rank
0   1              0.1
1   3              0.2
2   5              0.3
3   7              0.4
4   9              0.5
5  11              0.6
6  13              0.7
7  15              0.8
8  17              0.9
9  19              1.0

We create a Pandas DataFrame with one column A containing 10 integers and then find the quantile rank of each observation in the A column using the rank() method with the pct parameter set to True. Each quantile rank is the observation's rank divided by the number of observations, so the smallest value gets 0.1 and the largest gets 1.0. We create a new column A_quantile_rank to store the quantile ranks and print the resulting DataFrame.
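The values in this example are all distinct. When a column contains repeated values, the method parameter of rank() decides how tied observations share a rank. The following is a minimal sketch with illustrative values that are not part of the original example.

```python
import pandas as pd

# Illustrative values with one tie (the two 7s)
s = pd.Series([5, 7, 7, 9])

# Each method resolves the tie differently before the ranks are divided by the count
results = {
    method: s.rank(pct=True, method=method).tolist()
    for method in ["average", "min", "max", "first"]
}

for method, values in results.items():
    print(method, values)

# average [0.25, 0.625, 0.625, 1.0]  (tied values share the mean of ranks 2 and 3)
# min     [0.25, 0.5, 0.5, 1.0]      (tied values take the lowest rank)
# max     [0.25, 0.75, 0.75, 1.0]    (tied values take the highest rank)
# first   [0.25, 0.5, 0.75, 1.0]     (ties are broken by order of appearance)
```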

Example 2

import pandas as pd

# Create a DataFrame
data = {'A': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]}
df = pd.DataFrame(data)

# Find the decile rank
n = 10
df['A_decile_rank'] = pd.cut(df['A'], n, labels=range(1, n+1)).astype(int)

print(df)

Output

    A  A_decile_rank
0   1              1
1   3              2
2   5              3
3   7              4
4   9              5
5  11              6
6  13              7
7  15              8
8  17              9
9  19             10

We create a Pandas DataFrame with one column A containing 10 integers. We then find the decile rank of each observation in the A column using pd.cut(), which splits the range of A into n = 10 equal-width bins and labels them 1 to 10; astype(int) converts the resulting categorical labels to integers. Because the values in A are evenly spaced, each bin holds exactly one observation, so the bin labels match the decile ranks. We create a new column A_decile_rank to store the decile ranks and print the resulting DataFrame.
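pd.cut() divides the range of values into equal-width bins, which is not the same as dividing the observations into equal-sized groups when the data is skewed. As an alternative that this article does not use, pd.qcut() builds bins from the quantiles of the data. The sketch below compares the two on illustrative values that include one large number.

```python
import pandas as pd

# Illustrative skewed data (not from the article): nine small values and one large one
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Equal-width bins: the large value stretches the range, so 1..9 all land in bin 1
width_bins = pd.cut(s, 10, labels=range(1, 11)).astype(int)

# Quantile-based bins: labels=False returns 0-based bin numbers, so add 1
quantile_bins = pd.qcut(s, 10, labels=False) + 1

print(width_bins.tolist())     # [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
print(quantile_bins.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Which of the two fits depends on whether the bins should reflect the value range or the ordering of the observations.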

Example 3

import pandas as pd
import numpy as np

# Create a DataFrame
np.random.seed(42)
data = {'A': np.random.normal(0, 1, 1000), 'B': np.random.normal(5, 2, 1000)}
df = pd.DataFrame(data)

# Find the quantile rank of column A
df['A_quantile_rank'] = df['A'].rank(pct=True)

# Find the decile rank of column B
n = 10
df['B_decile_rank'] = pd.cut(df['B'], n, labels=range(1, n+1)).astype(int)

# Print the resulting DataFrame
print(df)

Output

            A         B  A_quantile_rank  B_decile_rank
0    0.496714  7.798711            0.693              8
1   -0.138264  6.849267            0.436              7
2    0.647689  5.119261            0.750              5
3    1.523030  3.706126            0.929              4
4   -0.234153  6.396447            0.405              6
..        ...       ...              ...            ...
995 -0.281100  7.140300            0.384              7
996  1.797687  4.946957            0.960              5
997  0.640843  3.236251            0.746              4
998 -0.571179  4.673866            0.276              5
999  0.572583  3.510195            0.718              4

[1000 rows x 4 columns]

We start with a Pandas DataFrame with two columns A and B, each containing 1000 values drawn from a normal distribution (the seed is fixed so that the output is reproducible). We find the quantile rank of the A column using the rank() method with the pct parameter set to True and store the resulting ranks in a new column A_quantile_rank. For the B column, we use pd.cut() to split the range of values into 10 equal-width bins labelled 1 to 10 and store the labels in a new column B_decile_rank. Note that equal-width bins do not hold equal numbers of observations here: for normally distributed data, most rows fall into the middle bins. Finally, we print the resulting DataFrame.
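To see how many rows each bin receives, the sketch below counts the rows per bin for pd.cut() and, for comparison, for deciles derived from ranks so that every group has the same size. The rank-based formula ceil(rank * n / count) is an assumption introduced here for illustration and is not used in the example above; the data is generated separately and will not match column B exactly.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
b = pd.Series(np.random.normal(5, 2, 1000), name="B")  # illustrative data

n = 10

# Equal-width bins: the number of rows per bin follows the shape of the data
cut_counts = pd.cut(b, n, labels=range(1, n + 1)).value_counts().sort_index()

# Equal-count deciles from ranks: ranks 1..100 map to 1, 101..200 map to 2, and so on
ranks = b.rank(method="first")
rank_deciles = np.ceil(ranks * n / b.count()).astype(int)
rank_counts = rank_deciles.value_counts().sort_index()

print(cut_counts.tolist())   # uneven counts, largest in the middle bins
print(rank_counts.tolist())  # 100 rows in every decile
```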

Applications

  • Identifying outliers in a dataset

  • Ranking observations in a dataset

  • Comparing observations in a dataset
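As a rough illustration of the first application, quantile ranks can be used to shortlist observations at either end of a column. The data, the column names, and the 10% thresholds below are assumptions made for this sketch only.

```python
import pandas as pd

# Illustrative data (not from the article): daily order counts with one extreme day
orders = pd.DataFrame({"orders": [52, 48, 50, 47, 53, 49, 51, 46, 54, 250]})
orders["pct_rank"] = orders["orders"].rank(pct=True)

# Assumed thresholds: shortlist the bottom 10% and the top 10% of ranks
low, high = 0.10, 0.90
orders["candidate"] = (orders["pct_rank"] <= low) | (orders["pct_rank"] > high)

candidates = orders.loc[orders["candidate"], "orders"].tolist()
print(candidates)  # [46, 250]
```

A rank-based rule always selects a fixed share of rows, so the value 46 is shortlisted even though it is close to its neighbours. The shortlist is a starting point for inspection rather than proof that a value is an outlier.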

Conclusion

This technical blog examined how to get the quantile rank of a Pandas DataFrame column using the rank() method with the pct parameter set to True, and how to get decile ranks by binning the column into 10 groups with pd.cut(). Knowing the quantile and decile ranks of a column is useful in data analysis and visualization because it makes it easier to understand a dataset's distribution and spot outliers.

Updated on: 21-Aug-2023
