Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Selected Reading
Python - Count distinct in Pandas Aggregation with Numpy
To count distinct values in Pandas aggregation, use nunique() method. This tutorial shows how to group by a column, calculate sums using NumPy, and count distinct values simultaneously.
Required Libraries
Import the necessary libraries for data manipulation ?
import pandas as pd import numpy as np
Creating Sample DataFrame
Create a DataFrame with duplicate values to demonstrate distinct counting ?
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
"Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
"Units": [100, 150, 50, 110, 90]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
Car Place Units
0 BMW Delhi 100
1 Audi Bangalore 150
2 BMW Delhi 50
3 Lexus Chandigarh 110
4 Lexus Chandigarh 90
Count Distinct with Aggregation
Use groupby() with agg() to apply multiple aggregation functions. Here we sum Units and count distinct Places ?
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
"Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
"Units": [100, 150, 50, 110, 90]
})
# Group by Car and apply different aggregations
result = dataFrame.groupby("Car").agg({
"Units": np.sum, # Sum of units
"Place": pd.Series.nunique # Count distinct places
})
print("Aggregated DataFrame:")
print(result)
Aggregated DataFrame:
Units Place
Car
Audi 150 1
BMW 150 1
Lexus 200 1
Alternative Approach Using nunique()
You can also use the direct nunique() method for cleaner syntax ?
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Toyota', 'Toyota'],
"Place": ['Delhi', 'Bangalore', 'Mumbai', 'Chandigarh', 'Delhi', 'Mumbai'],
"Units": [100, 150, 50, 110, 90, 80]
})
# Alternative syntax using nunique directly
result = dataFrame.groupby("Car").agg({
"Units": "sum",
"Place": "nunique"
})
print("Result with multiple distinct places:")
print(result)
Result with multiple distinct places:
Units Place
Car
Audi 150 1
BMW 150 2
Lexus 110 1
Toyota 170 2
Key Points
| Method | Syntax | Use Case |
|---|---|---|
pd.Series.nunique |
{"col": pd.Series.nunique} |
Explicit method reference |
"nunique" |
{"col": "nunique"} |
String shorthand (cleaner) |
np.sum |
{"col": np.sum} |
NumPy sum function |
Conclusion
Use nunique() in Pandas aggregation to count distinct values efficiently. Combine it with other functions like np.sum() to perform multiple aggregations in a single operation.
Advertisements
