How to count frequency of itemsets in Pandas DataFrame
The value_counts() method in Pandas is used to count the frequency of unique values in a DataFrame column. This is particularly useful for analyzing categorical data and understanding data distribution patterns.
Creating the DataFrame
First, let's create a sample DataFrame with car sales data:
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame({
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]
})
print("DataFrame...")
print(dataFrame)
DataFrame...
Car Place UnitsSold
0 BMW Delhi 95
1 Mercedes Hyderabad 80
2 Lamborghini Chandigarh 80
3 Audi Bangalore 75
4 Mercedes Hyderabad 92
5 Porsche Mumbai 90
6 Lamborghini Mumbai 95
7 BMW Pune 50
Counting Frequency Using value_counts()
Now we can count the frequency of values in each column using the value_counts() method:
import pandas as pd
dataFrame = pd.DataFrame({
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]
})
# Counting frequency of Car column
car_counts = dataFrame['Car'].value_counts()
print("Count in column Car:")
print(car_counts)
# Counting frequency of Place column
place_counts = dataFrame['Place'].value_counts()
print("\nCount in column Place:")
print(place_counts)
# Counting frequency of UnitsSold column
units_counts = dataFrame['UnitsSold'].value_counts()
print("\nCount in column UnitsSold:")
print(units_counts)
Count in column Car:
Car
BMW            2
Lamborghini    2
Mercedes       2
Audi           1
Porsche        1
Name: count, dtype: int64

Count in column Place:
Place
Mumbai        2
Hyderabad     2
Chandigarh    1
Pune          1
Delhi         1
Bangalore     1
Name: count, dtype: int64

Count in column UnitsSold:
UnitsSold
95    2
80    2
92    1
75    1
90    1
50    1
Name: count, dtype: int64
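The counts above treat each column separately. To count how often a combination of values (an itemset such as a Car and Place pair) occurs together, one option is to call value_counts() on the DataFrame itself. The following is a minimal sketch, assuming pandas 1.1 or later, where DataFrame.value_counts() and its subset parameter are available; the variable name pair_counts is illustrative only.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
    'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]
})

# Count each unique (Car, Place) combination across rows.
# The result is a Series indexed by a MultiIndex of the two columns.
pair_counts = dataFrame.value_counts(subset=['Car', 'Place'])

print("Count of Car/Place combinations:")
print(pair_counts)

# Look up one specific combination by its tuple key
print(pair_counts[('Mercedes', 'Hyderabad')])
```

In this data only the Mercedes and Hyderabad pair repeats, so it sorts to the top of the result and every other combination has a count of 1.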
Additional Options
The value_counts() method provides several useful parameters:
import pandas as pd
dataFrame = pd.DataFrame({
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune']
})
# Get relative frequencies (proportions between 0 and 1)
car_percentages = dataFrame['Car'].value_counts(normalize=True)
print("Car frequencies as percentages:")
print(car_percentages)
# Sort in ascending order
car_ascending = dataFrame['Car'].value_counts(ascending=True)
print("\nCar counts in ascending order:")
print(car_ascending)
Car frequencies as percentages:
Car
BMW            0.25
Lamborghini    0.25
Mercedes       0.25
Audi           0.125
Porsche        0.125
Name: proportion, dtype: float64

Car counts in ascending order:
Car
Audi           1
Porsche        1
BMW            2
Lamborghini    2
Mercedes       2
Name: count, dtype: int64
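Note that normalize=True returns fractions that sum to 1 rather than values on a 0 to 100 scale. If percentages are wanted for display, one simple approach is to scale and round the result. This is a small sketch; the names proportions and percentages are illustrative.

```python
import pandas as pd

cars = pd.Series(
    ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
    name='Car'
)

# Relative frequencies as fractions of the total number of rows
proportions = cars.value_counts(normalize=True)

# Scale to a 0-100 range and round to one decimal place for display
percentages = (proportions * 100).round(1)

print(percentages)
```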
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| normalize | Return relative frequencies instead of counts | False |
| ascending | Sort in ascending order | False |
| dropna | Exclude NaN values from the count | True |
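The dropna parameter is not exercised in the examples above. With the default dropna=True, missing values are left out of the result; passing dropna=False gives NaN its own row. The sketch below uses a small hypothetical Series with one missing entry to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry
cars = pd.Series(['BMW', 'Audi', np.nan, 'BMW'], name='Car')

# Default (dropna=True): the missing value is left out of the tally
counts_default = cars.value_counts()

# dropna=False: the missing value gets its own row in the result
counts_with_nan = cars.value_counts(dropna=False)

print("Without NaN:")
print(counts_default)
print("\nWith NaN:")
print(counts_with_nan)
```

Comparing the totals of the two results is a quick way to see how many values in a column are missing.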
Conclusion
The value_counts() method is the standard tool for frequency analysis in Pandas. Use normalize=True to get relative frequencies (proportions between 0 and 1) instead of raw counts, and ascending=True to sort from lowest to highest frequency.
