How to Count Unique Values in a Pandas Groupby Object?
In data analysis, counting unique values in a Pandas GroupBy object helps understand data diversity and distribution within groups. This is essential for analyzing categorical data patterns and identifying group characteristics.
Pandas provides several methods to count unique values in grouped data: nunique(), agg(), and combining unique() with len(). Each approach has specific use cases depending on your analysis requirements.
Using the nunique() Method
The nunique() method is the most direct way to count unique values in each group. It returns the number of distinct values for specified columns within each group.
Example
import pandas as pd
# Create sample data
data = {
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Product': ['X1', 'X2', 'X1', 'Y1', 'Y1', 'Z1', 'Z2', 'Z3', 'Z1'],
    'Sales': [100, 200, 150, 300, 250, 400, 350, 500, 450]
}
df = pd.DataFrame(data)
# Count unique products per category
unique_count = df.groupby('Category')['Product'].nunique()
print(unique_count)
Category
A    2
B    1
C    3
Name: Product, dtype: int64
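One detail worth knowing when counting distinct values per group is how missing values are treated. The sketch below uses a small hypothetical dataset (not the article's sample data) in which one product name is missing. It relies on the `dropna` parameter of `nunique()`, which defaults to `True`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one product name in category A is missing
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B'],
    'Product': ['X1', np.nan, 'X1', 'Y1', 'Y2'],
})

grouped = df.groupby('Category')['Product']

# Default behaviour: missing values are ignored
counts_default = grouped.nunique()

# dropna=False: NaN is counted as its own distinct value
counts_with_nan = grouped.nunique(dropna=False)

print(counts_default)
print(counts_with_nan)
```

With this data, category A has one distinct product by default and two when missing values are counted, so the choice matters whenever the grouped column can contain NaN.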
Using the agg() Method
The agg() method allows applying multiple aggregation functions simultaneously, including nunique(). This is useful when you need various statistics for grouped data.
Example
import pandas as pd
# Create sample data with multiple grouping columns
data = {
    'Region': ['North', 'North', 'South', 'South', 'North', 'South'],
    'Category': ['A', 'A', 'B', 'B', 'B', 'A'],
    'Product': ['X1', 'X2', 'Y1', 'Y2', 'Y1', 'X1'],
    'Sales': [100, 200, 300, 400, 250, 150]
}
df = pd.DataFrame(data)
# Group by multiple columns and count unique products
result = df.groupby(['Region', 'Category']).agg({'Product': 'nunique', 'Sales': 'sum'})
print(result)
                 Product  Sales
Region Category
North  A               2    300
       B               1    250
South  A               1    150
       B               2    700
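The dictionary form above keeps the original column names. If descriptive output names are preferred, named aggregation (available in pandas 0.25 and later) is one alternative. The following is a minimal sketch that reuses the sample data and groups by `Region` only; the output column names `unique_products`, `total_sales`, and `average_sale` are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North', 'South'],
    'Category': ['A', 'A', 'B', 'B', 'B', 'A'],
    'Product': ['X1', 'X2', 'Y1', 'Y2', 'Y1', 'X1'],
    'Sales': [100, 200, 300, 400, 250, 150],
})

# Each keyword is an output column: (source column, aggregation)
summary = df.groupby('Region').agg(
    unique_products=('Product', 'nunique'),
    total_sales=('Sales', 'sum'),
    average_sale=('Sales', 'mean'),
)

print(summary)
```

Here each region ends up with three distinct products, and the sales totals are 550 for North and 850 for South.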
Using unique() Method with len()
This approach first extracts unique values using unique(), then counts them with len(). It's useful when you also need to see the actual unique values.
Example
import pandas as pd
# Create sample data
data = {
    'Department': ['IT', 'IT', 'HR', 'HR', 'IT', 'Finance'],
    'Employee': ['John', 'Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Salary': [50000, 60000, 45000, 48000, 55000, 52000]
}
df = pd.DataFrame(data)
# Get unique employees per department
unique_employees = df.groupby('Department')['Employee'].unique()
print("Unique employees:")
print(unique_employees)
# Count unique employees per department
unique_count = unique_employees.apply(len)
print("\nCount of unique employees:")
print(unique_count)
Unique employees:
Department
Finance                   [Eve]
HR                 [Bob, Carol]
IT         [John, Alice, David]
Name: Employee, dtype: object

Count of unique employees:
Department
Finance    1
HR         2
IT         3
Name: Employee, dtype: int64
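When both the values and their counts are needed, it can be convenient to keep them in one table. The sketch below is one possible way to do that, combining `unique()` and `nunique()` on the same grouped column. The data is a hypothetical variation of the example above in which "John" appears twice, so that deduplication is visible; the column names `unique_values` and `unique_count` are illustrative.

```python
import pandas as pd

# Hypothetical variation of the sample data: 'John' appears twice in IT
df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR', 'IT', 'Finance'],
    'Employee': ['John', 'Alice', 'Bob', 'Carol', 'John', 'Eve'],
})

grouped = df.groupby('Department')['Employee']

# Both results are indexed by Department, so they align row by row
summary = pd.DataFrame({
    'unique_values': grouped.unique(),
    'unique_count': grouped.nunique(),
})

print(summary)
```

The IT row lists John and Alice once each with a count of 2, even though IT has three rows in the input.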
Comparison
| Method | Best For | Output Type |
|---|---|---|
| nunique() | Simple unique counting | Series with counts |
| agg() | Multiple aggregations | DataFrame with statistics |
| unique() + len() | When you need actual unique values | Series of value arrays, then Series of counts |
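As a quick sanity check, the three approaches can be run on the same column to confirm that they produce the same counts. This is a minimal sketch using the product data from the first example, which contains no missing values; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Product': ['X1', 'X2', 'X1', 'Y1', 'Y1', 'Z1', 'Z2', 'Z3', 'Z1'],
})

grouped = df.groupby('Category')['Product']

# The same count computed three ways
via_nunique = grouped.nunique()
via_agg = df.groupby('Category').agg({'Product': 'nunique'})['Product']
via_unique_len = grouped.unique().apply(len)

comparison = pd.DataFrame({
    'nunique': via_nunique,
    'agg': via_agg,
    'unique_len': via_unique_len,
})

print(comparison)
```

All three columns show 2, 1, and 3 for categories A, B, and C. The results can diverge when a column contains NaN, because `unique()` keeps NaN as a value while `nunique()` ignores it by default.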
Conclusion
Use nunique() for straightforward unique value counting in GroupBy objects. Choose agg() when combining multiple aggregation functions, and use unique() with len() when you need to examine the actual unique values alongside their counts.
