Absolute and Relative Frequency in Pandas
In statistics, frequency indicates how many times a value appears in a dataset. Absolute frequency is the raw count, while relative frequency is the proportion (count divided by total observations). Pandas provides built-in methods for calculating both.
Absolute Frequency
Using value_counts()
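The simplest way to count the occurrences of each value is Series.value_counts(), which returns the counts sorted from most to least frequent:

import pandas as pd
data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
# value_counts() returns a Series indexed by the distinct values
counts = pd.Series(data).value_counts()
print(counts)

Pune          3
Chandigarh    2
Hyderabad     1
dtype: int64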
Using crosstab()
pd.crosstab() is an alternative that returns the counts as a DataFrame, with one row per distinct value:

import pandas as pd
data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
df = pd.DataFrame(data, columns=["City"])
# Passing a one-element list as columns yields a single "count" column
tab_result = pd.crosstab(index=df["City"], columns=["count"])
print(tab_result)

col_0       count
City
Chandigarh      2
Hyderabad       1
Pune            3
Relative Frequency
Relative frequency is the ratio of each value's count to the total number of observations. It can be expressed as a decimal or as a percentage:
$$\mathrm{Relative\:Frequency = \frac{Absolute\:Frequency}{Total\:Observations}}$$
Using value_counts(normalize=True)
import pandas as pd
data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
# Method 1: normalize parameter
print(pd.Series(data).value_counts(normalize=True))

Pune          0.500000
Chandigarh    0.333333
Hyderabad     0.166667
dtype: float64
Manual Calculation
import pandas as pd
data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
# Method 2: divide by total count
freq = pd.Series(data).value_counts()
relative = freq / len(data)
print(relative)

Pune          0.500000
Chandigarh    0.333333
Hyderabad     0.166667
dtype: float64
Both methods give the same result here. Pune appears in 3 of the 6 observations, so its relative frequency is 3/6 = 0.50 (50%). Chandigarh gives 2/6 ≈ 0.33 (33%) and Hyderabad gives 1/6 ≈ 0.17 (17%).
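The two methods agree only when the data has no missing values. The sketch below assumes a hypothetical variant of the dataset with one missing entry (None) added; it is not part of the original example. By default value_counts() drops missing values, so normalize=True divides by the number of non-missing observations, while len(data) still counts the missing one.

```python
import pandas as pd

# Hypothetical data: the same cities plus one missing entry
data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune", None]
cities = pd.Series(data)

# normalize=True divides by the number of non-missing values (6)
by_normalize = cities.value_counts(normalize=True)

# Dividing by len(data) includes the missing entry in the total (7)
by_length = cities.value_counts() / len(data)

print(by_normalize["Pune"])  # 3 / 6
print(by_length["Pune"])     # 3 / 7

# Dividing by the non-missing count matches normalize=True again
by_count = cities.value_counts() / cities.count()
print(by_count["Pune"])      # 3 / 6
```

Which denominator is appropriate depends on whether missing entries should count as observations; passing dropna=False to value_counts() is one way to keep them in the tally.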
Comparison
| City | Absolute Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| Pune | 3 | 0.50 | 50% |
| Chandigarh | 2 | 0.33 | 33% |
| Hyderabad | 1 | 0.17 | 17% |
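A table like the one above can be assembled directly in Pandas. The following is a minimal sketch, assuming the same six-city dataset; the variable names (cities, summary) and the rounding choices are illustrative.

```python
import pandas as pd

data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
cities = pd.Series(data, name="City")

absolute = cities.value_counts()
relative = cities.value_counts(normalize=True)

# Combine both measures into one table, aligned on the city labels
summary = pd.DataFrame({
    "Absolute Frequency": absolute,
    "Relative Frequency": relative.round(2),
    "Percentage": (relative * 100).round(0).astype(int),
})
print(summary)
```

Rounding is applied only when the table is built, so the underlying relative Series keeps full precision for any later calculation.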
Conclusion
Use value_counts() for absolute frequency and value_counts(normalize=True) for relative frequency directly. The crosstab() method is useful when you need tabular output or cross-tabulation between multiple columns. Relative frequency is preferred when comparing datasets of different sizes.
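As an illustration of cross-tabulation between two columns, here is a hedged sketch. The "Segment" column and its values are invented for this example and do not come from the dataset used earlier. It relies on the normalize parameter of pd.crosstab(), which converts counts into proportions.

```python
import pandas as pd

# Hypothetical records: the "Segment" column is invented for illustration
df = pd.DataFrame({
    "City": ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"],
    "Segment": ["Retail", "Retail", "Online", "Retail", "Online", "Online"],
})

# Absolute frequency of every City/Segment combination
absolute = pd.crosstab(df["City"], df["Segment"])

# Relative frequency within each city: every row sums to 1
relative = pd.crosstab(df["City"], df["Segment"], normalize="index")

print(absolute)
print(relative)
```

With normalize="index" each row is divided by its own total, which makes cities with different numbers of records directly comparable; normalize="columns" or normalize="all" would divide by column totals or by the grand total instead.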
