Percentile Rank of a Column in a Pandas DataFrame
Finding the percentile rank is a common operation used to compare values within a single dataset. The percentile rank of a value is the percentage of values in the dataset that are less than or equal to it. For instance, if a student's score is greater than or equal to 80% of all scores, the percentile rank of that student is 80 (the student is in the 80th percentile).
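To make the definition concrete, here is a minimal sketch in plain Python of the "less than or equal to" convention described above. The helper name 'percentile_rank_of()' is hypothetical, and other conventions exist (SciPy's scipy.stats.percentileofscore(), for example, offers several through its 'kind' parameter), so treat this as one reasonable choice rather than the only definition.

```python
def percentile_rank_of(value, data):
    """Return the percentage of observations in data that are <= value.

    Hypothetical helper for illustration; uses the "less than or equal to"
    convention, which SciPy calls kind='weak'.
    """
    count_at_or_below = sum(1 for x in data if x <= value)
    return 100.0 * count_at_or_below / len(data)


# Sample scores taken from Example 1 below
scores = [75, 82, 68, 90, 88]
for s in scores:
    print(s, percentile_rank_of(s, scores))
```

With these scores, 68 gets 20.0, 75 gets 40.0, 82 gets 60.0, 88 gets 80.0 and 90 gets 100.0, because, for example, two of the five scores (68 and 75) are at or below 75.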
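To find the percentile rank of a column in a Pandas DataFrame, we can use the pandas method 'rank()'. We will also look at NumPy's 'percentile()' function, which is closely related but works in the opposite direction: it returns the data value found at a given percentile rather than the percentile rank of a value.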
Python Program to find Percentile Rank of a Column in Pandas
Before moving further, let's familiarize ourselves with Pandas and its DataFrame. Pandas is an open-source Python library that is mainly used for data analysis and manipulation, and a DataFrame is its two-dimensional, labeled table of rows and columns. Pandas can handle both relational and labeled data and supports operations such as cleaning, filtering, grouping, aggregating and merging.
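As a quick, illustrative sketch of two of the operations mentioned above, the snippet below builds a small DataFrame from a dictionary (the same sample data used in Example 1), filters it with a boolean condition and aggregates one column. The threshold of 80 is an arbitrary choice for illustration.

```python
import pandas as pd

# Sample data from Example 1
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [75, 82, 68, 90, 88]}
df = pd.DataFrame(data)

# Filtering: keep only the rows whose Score is at least 80 (arbitrary threshold)
high_scorers = df[df['Score'] >= 80]

# Aggregating: reduce the Score column to a single summary value
mean_score = df['Score'].mean()

print(high_scorers)
print(mean_score)
```

Here the filter keeps Shyam, Mohan and Navya, and the mean score is 80.6.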
Now, it's time to dive into the example programs.
Example 1
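In the following example, we will use NumPy's built-in function 'percentile()'. Note that np.percentile() does not compute percentile ranks directly: given a set of percentages, it returns the data values found at those percentiles.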
Approach
The first step is to import the pandas and numpy packages.
Create a DataFrame named 'df' consisting of two columns 'Name' and 'Score'.
Next, apply 'np.percentile()' to the 'Score' column, passing the column itself both as the data array and as the percentages to look up. Each score (75, 82, and so on) is therefore treated as a percentage between 0 and 100, and the function returns the score found at that percentile. It also takes an optional argument 'method' that specifies how to compute a percentile that falls between two data points. Here it is set to 'nearest', which returns the data point nearest to the computed position instead of interpolating.
In the end, assign the resulting values to a new column called 'Per_Rank' and display the result using the 'print()' function.
```python
# importing packages
import pandas as pd
import numpy as np

# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [75, 82, 68, 90, 88]}
df = pd.DataFrame(data)

# using each score as the percentage q, look up the score found at that
# percentile (the 'method' keyword requires NumPy 1.22 or later)
df['Per_Rank'] = np.percentile(df['Score'], df['Score'], method='nearest')

# to show the result
print(df)
```
Output
```
    Name  Score  Per_Rank
0    Ram     75        88
1  Shyam     82        88
2  Shrey     68        88
3  Mohan     90        90
4  Navya     88        90
```
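Why does the 'Per_Rank' column above contain scores rather than percentages? For the default 'linear' method and for 'nearest', np.percentile() places percentile q at position q/100 × (n - 1) of the sorted data, and 'nearest' rounds that position to an index. For q = 82, the position is 0.82 × 4 = 3.28, which rounds to index 3 of the sorted scores [68, 75, 82, 88, 90], giving 88. The minimal sketch below reproduces that lookup and, for contrast, computes actual percentile ranks with plain NumPy broadcasting under the "less than or equal to" convention. The variable names are illustrative, and the 'method' keyword assumes NumPy 1.22 or later (older versions called it 'interpolation').

```python
import numpy as np

# Sample scores from Example 1
scores = np.array([75, 82, 68, 90, 88])

# Inverse direction (what Example 1 computes): each score is used as a
# percentage q, and np.percentile returns the data value at that percentile.
values_at_q = np.percentile(scores, scores, method='nearest')

# Percentile rank: for each score, the percentage of scores that are less
# than or equal to it. Row i of the boolean matrix holds scores[j] <= scores[i].
at_or_below = scores[None, :] <= scores[:, None]
per_rank = at_or_below.mean(axis=1) * 100

print(values_at_q)
print(per_rank)
```

Here values_at_q holds 88, 88, 88, 90 and 90, while per_rank holds 40, 60, 20, 100 and 80 percent for Ram, Shyam, Shrey, Mohan and Navya. The comparison matrix is n × n, so this is only practical for modest column sizes; for larger data, df['Score'].rank(pct=True, method='max') gives the same ranks as fractions between 0 and 1.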
Example 2
The following example illustrates the use of the 'rank()' method to find percentile ranks.
Approach
First, import the pandas package under the alias 'pd'.
Create a Pandas DataFrame consisting of two columns 'Name' and 'Score'.
Next, define a function 'percentile_rank()' that takes an argument named 'column'. Inside this function, call the built-in 'rank()' method with the 'pct' parameter set to True so that it returns percentile ranks as fractions between 0 and 1 instead of ordinal ranks.
Now, call 'percentile_rank()' on the 'Score' column by passing df['Score'] as the argument, and store the result in a new column called 'Per_Rank'.
In the end, display the result using the 'print()' function.
```python
# importing the required package
import pandas as pd

# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88]}
df = pd.DataFrame(data)

# user-defined function that returns the percentile rank of each value
def percentile_rank(column):
    return column.rank(pct=True)

# calling the user-defined function
df['Per_Rank'] = percentile_rank(df['Score'])

# to show the result
print(df)
```
Output
```
    Name  Score  Per_Rank
0    Ram     55       0.2
1  Shyam     92       1.0
2  Shrey     68       0.4
3  Mohan     70       0.6
4  Navya     88       0.8
```
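The scores in Example 2 are all distinct, so the way 'rank()' breaks ties never matters. With ties it does: by default, rank() uses method='average', while method='max' matches the "less than or equal to" definition of percentile rank. The sketch below uses a hypothetical Series in which two students share a score of 70.

```python
import pandas as pd

# Hypothetical scores with a tie: two students both scored 70
scores = pd.Series([55, 70, 68, 70, 88])

# Default method='average': tied values share the mean of their ranks (3 and 4)
avg_pct = scores.rank(pct=True)

# method='max': tied values share the highest rank in their group, so each
# result equals the fraction of values less than or equal to that value
max_pct = scores.rank(pct=True, method='max')

# Direct check of the "less than or equal to" definition for comparison
direct = scores.apply(lambda v: (scores <= v).mean())

print(pd.DataFrame({'Score': scores, 'average': avg_pct,
                    'max': max_pct, 'direct': direct}))
```

Here the two 70s get 0.7 under 'average' and 0.8 under both 'max' and the direct check, while the untied values get the same result either way.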
Example 3
In this example, we modify the code from the previous example by replacing the 'Score' column with a new column named 'Balance' and applying the 'rank()' method to it instead.
```python
# importing the required package
import pandas as pd

# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Balance': [5500, 9200, 6800, 7000, 8800]}
df = pd.DataFrame(data)

# user-defined function that returns the percentile rank of each value
def percentile_rank(column):
    return column.rank(pct=True)

# calling the user-defined function
df['Per_Rank'] = percentile_rank(df['Balance'])

# to show the result
print(df)
```
Output
```
    Name  Balance  Per_Rank
0    Ram     5500       0.2
1  Shyam     9200       1.0
2  Shrey     6800       0.4
3  Mohan     7000       0.6
4  Navya     8800       0.8
```
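Examples 2 and 3 apply the same function to one column at a time. As a sketch of an alternative, DataFrame.rank() ranks every selected column independently in a single call. The combined DataFrame below (the Score and Balance values from the two previous examples) and the '_Per_Rank' column suffix are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical combination of the sample data from Examples 2 and 3
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88],
        'Balance': [5500, 9200, 6800, 7000, 8800]}
df = pd.DataFrame(data)

# DataFrame.rank ranks each column independently; selecting only the numeric
# columns avoids ranking the 'Name' strings
pct_ranks = df[['Score', 'Balance']].rank(pct=True).add_suffix('_Per_Rank')

# join aligns on the shared row index, appending the new columns
result = df.join(pct_ranks)
print(result)
```

Because both columns happen to have the same ordering in this data, 'Score_Per_Rank' and 'Balance_Per_Rank' come out identical (0.2, 1.0, 0.4, 0.6 and 0.8).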
Conclusion
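In this article, we discussed two approaches related to percentile rank: the 'rank()' method and the 'percentile()' function. Calling 'rank()' with pct=True returns the percentile rank of each value as a fraction between 0 and 1, while np.percentile() works in the opposite direction and returns the data values found at the given percentiles; in Example 1, we passed the 'Score' column to it as both the data and the percentiles.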