Percentile Rank of a Column in a Pandas DataFrame


Percentile rank is a common way to compare values within a single dataset. The percentile rank of a value is the percentage of values in the dataset that are less than or equal to it. For instance, if a student's score is greater than or equal to 80% of all scores, that student's percentile rank is 80, which places the student at the 80th percentile.
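
The definition can be checked by hand with a few lines of plain Python. This is only a minimal sketch: the scores and the helper name 'percentile_rank' are made up for illustration, and it assumes the convention that a score counts toward its own rank.

```python
# Hypothetical scores, used only to illustrate the definition.
scores = [55, 68, 70, 88, 92]

def percentile_rank(value, values):
    """Return the percentage of entries in `values` that are <= `value`."""
    at_or_below = sum(1 for v in values if v <= value)
    return 100.0 * at_or_below / len(values)

rank_of_88 = percentile_rank(88, scores)
print(rank_of_88)  # 4 of the 5 scores are <= 88, so this prints 80.0
```

Other conventions exist (for example, counting only strictly smaller values), so results from different tools may differ slightly.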

To find the percentile rank of a column in a Pandas DataFrame, we can use the 'rank()' method provided by Pandas and the 'percentile()' function provided by NumPy.

Python Program to find Percentile Rank of a Column in Pandas

Before moving further, let's familiarize ourselves with Pandas. It is an open-source Python library used mainly for data analysis and manipulation. Its central data structure, the DataFrame, holds labeled, tabular data in rows and columns and supports operations such as cleaning, filtering, grouping, aggregating and merging.
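
As a quick illustration of a DataFrame and two of the operations just mentioned, the sketch below builds a small table, filters its rows and aggregates a column. The threshold of 80 is an arbitrary choice for the example.

```python
import pandas as pd

# A small labeled table: one row per student.
df = pd.DataFrame({'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
                   'Score': [75, 82, 68, 90, 88]})

# Filtering: keep only the rows whose score is at least 80 (arbitrary cutoff).
high = df[df['Score'] >= 80]

# Aggregating: the average of the whole 'Score' column.
mean_score = df['Score'].mean()

print(high)
print(mean_score)
```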

Now, it's time to dive into the example programs.

Example 1

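In the following example, we will use NumPy's built-in function 'percentile()'. Note that 'percentile()' maps a percentile to a value, so the 'Per_Rank' column in this example holds the score found at each requested percentile rather than a percentile rank in the strict sense.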

Approach

  • The first step is to import the pandas and numpy packages.

  • Create a DataFrame named 'df' consisting of two columns, 'Name' and 'Score'.

  • Next, call 'np.percentile()' with the 'Score' column as both the data array and the list of desired percentiles. Each score is therefore read as a percentile between 0 and 100, and the function returns the score located at that percentile. The optional 'method' argument (available in NumPy 1.22 and later) controls what happens when the desired percentile falls between two data points. Here it is set to 'nearest', so the nearest data point is returned instead of an interpolated value.

  • Finally, assign the resulting values to a new column called 'Per_Rank' and display the DataFrame using the 'print()' function.

# importing packages
import pandas as pd
import numpy as np
# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [75, 82, 68, 90, 88]}
df = pd.DataFrame(data)
# each score is read as a percentile (0-100); np.percentile returns the
# data point nearest that percentile (the 'method' keyword needs NumPy >= 1.22)
df['Per_Rank'] = np.percentile(df['Score'], df['Score'], method='nearest')
# to show the result
print(df)

Output

    Name  Score  Per_Rank
0    Ram     75        88
1  Shyam     82        88
2  Shrey     68        88
3  Mohan     90        90
4  Navya     88        90
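
Because 'np.percentile()' goes from a percentile to a value, computing a percentile rank with NumPy means going in the opposite direction. The following is a minimal sketch of one way to do that, not the only one: it counts, for each score, how many scores are less than or equal to it. The column name 'Pct_At_Or_Below' is hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
                   'Score': [75, 82, 68, 90, 88]})
scores = df['Score'].to_numpy()

# np.percentile maps a percentile to a value: here, the median score.
median_score = np.percentile(scores, 50)

# The other direction: for each score, count the scores that are <= it.
# Broadcasting compares every score (rows) against every score (columns).
counts = (scores[None, :] <= scores[:, None]).sum(axis=1)

# Convert the counts to percentages (hypothetical column name).
df['Pct_At_Or_Below'] = 100 * counts / len(scores)

print(median_score)
print(df)
```

With this data, the lowest score (68) gets 20.0 and the highest (90) gets 100.0, which matches the definition of percentile rank given at the start of the article.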

Example 2

The following example illustrates the use of the Pandas 'rank()' method to find percentile ranks.

Approach

  • First, import the pandas package under the alias 'pd'.

  • Create a Pandas DataFrame consisting of two columns, 'Name' and 'Score'.

  • Next, define a function 'percentile_rank()' that takes a parameter named 'column'. Inside this function, call the built-in 'rank()' method with the 'pct' parameter set to True so that it returns the percentile ranks of the column as fractions between 0 and 1.

  • Now, apply the 'percentile_rank()' function to the 'Score' column by passing df['Score'] as an argument, and store the result in a new column called 'Per_Rank'.

  • Finally, display the result using the 'print()' function.

# importing the required package
import pandas as pd
# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88]}
df = pd.DataFrame(data)
# user-defined function that returns percentile ranks as fractions (0 to 1)
def percentile_rank(column):
    return column.rank(pct=True)
# calling the user-defined function
df['Per_Rank'] = percentile_rank(df['Score'])
# to show the result
print(df)

Output

    Name  Score  Per_Rank
0    Ram     55       0.2
1  Shyam     92       1.0
2  Shrey     68       0.4
3  Mohan     70       0.6
4  Navya     88       0.8
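
The scores above are all distinct. When a column contains ties, the 'method' parameter of 'rank()' decides how tied values are ranked. The sketch below uses made-up scores with one tie to compare three of the available options.

```python
import pandas as pd

# Hypothetical scores with a tie: two students scored 70.
scores = pd.Series([55, 70, 70, 88], name='Score')

avg_rank = scores.rank(pct=True)                # default method='average'
min_rank = scores.rank(pct=True, method='min')  # tied values share the lowest rank
max_rank = scores.rank(pct=True, method='max')  # tied values share the highest rank

print(pd.DataFrame({'Score': scores, 'average': avg_rank,
                    'min': min_rank, 'max': max_rank}))
```

With method='max', each value's rank equals the fraction of values less than or equal to it, which matches the definition of percentile rank given at the start of the article. Here the two tied scores get 0.75 under 'max', 0.5 under 'min' and 0.625 under the default 'average'.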

Example 3

In this example, we reuse the code from the previous example, replacing the 'Score' column with a column named 'Balance' and applying the 'rank()' method to it.

# importing the required package
import pandas as pd
# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Balance': [5500, 9200, 6800, 7000, 8800]}
df = pd.DataFrame(data)
# user-defined function that returns percentile ranks as fractions (0 to 1)
def percentile_rank(column):
    return column.rank(pct=True)
# calling the user-defined function
df['Per_Rank'] = percentile_rank(df['Balance'])
# to show the result
print(df)

Output

    Name  Balance  Per_Rank
0    Ram     5500       0.2
1  Shyam     9200       1.0
2  Shrey     6800       0.4
3  Mohan     7000       0.6
4  Navya     8800       0.8
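
Since 'rank()' is also available on a whole DataFrame, several columns can be ranked in one call. The sketch below combines the 'Score' and 'Balance' data from the last two examples and rescales the result to the 0 to 100 range. The '_Pct' suffix and the variable names are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
                   'Score': [55, 92, 68, 70, 88],
                   'Balance': [5500, 9200, 6800, 7000, 8800]})

# Rank both numeric columns at once; multiply by 100 to get percentages.
numeric_cols = ['Score', 'Balance']
pct_ranks = df[numeric_cols].rank(pct=True) * 100

# Rename the ranked columns (hypothetical suffix) and attach them to df.
pct_ranks = pct_ranks.add_suffix('_Pct')
result = df.join(pct_ranks)

print(result)
```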

Conclusion

In this article, we discussed two tools for working with percentile ranks in Pandas: the 'rank()' method and NumPy's 'percentile()' function. We used 'rank()' with pct=True to obtain percentile ranks as fractions between 0 and 1, and 'percentile()' by passing the column as both the data and the desired percentiles.

Updated on: 25-Jul-2023
