Write a program in Python to count the records based on the designation in a given DataFrame
To count records based on designation in a pandas DataFrame, we use the groupby() method combined with count(). groupby() collects the rows that share a designation into one group, and count() returns the number of non-null values in each remaining column of that group.
Creating the DataFrame
Let's start by creating a sample DataFrame with employee data:
import pandas as pd
data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer']
}
df = pd.DataFrame(data)
print("DataFrame is:")
print(df)
DataFrame is:
   Id Designation
0   1   architect
1   2   scientist
2   3  programmer
3   4   scientist
4   5  programmer
Counting Records by Designation
Use groupby() to group records by designation, then count() to get the total count for each group:
import pandas as pd
data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer']
}
df = pd.DataFrame(data)
print("Count of records by designation:")
result = df.groupby(['Designation']).count()
print(result)
Count of records by designation:
             Id
Designation
architect     1
programmer    2
scientist     2
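The result above keeps the designation as the index and names the count column after the original Id column. If a flat table with an explicit count column is preferred, one possible variation is sketched below. The column name Count is an assumption chosen for illustration and does not come from the original example.

```python
import pandas as pd

data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer']
}
df = pd.DataFrame(data)

# Select a single column before counting so the result is a Series,
# then move the group labels back into a regular column.
# 'Count' is an illustrative column name, not part of the original example.
counts = df.groupby('Designation')['Id'].count().reset_index(name='Count')
print(counts)
```

This form is often easier to merge with other tables or export, since the designation is an ordinary column rather than an index.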
Alternative Approaches
You can also use size() instead of count() to get the number of rows in each group:
import pandas as pd
data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer']
}
df = pd.DataFrame(data)
print("Using size():")
result = df.groupby('Designation').size()
print(result)
Using size():
Designation
architect     1
programmer    2
scientist     2
dtype: int64
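Another option, not covered in the original walkthrough, is to call value_counts() directly on the Designation column. The sketch below assumes only a tally of one column is needed, with no grouping of other columns.

```python
import pandas as pd

data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer']
}
df = pd.DataFrame(data)

# value_counts() tallies each distinct value in the column.
# By default the result is sorted from most to least frequent.
counts = df['Designation'].value_counts()
print(counts)
```

Unlike the groupby() results, which are ordered alphabetically by designation, this output is ordered by frequency.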
Difference Between count() and size()
| Method | Excludes NaN | Returns |
|---|---|---|
| count() | Yes | DataFrame with counts for each column |
| size() | No | Series with total group sizes |
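The difference in the table only shows up when the data contains missing values. The sketch below adds a hypothetical Salary column with two NaN entries to the sample data. That column is an assumption made for illustration and is not part of the original DataFrame.

```python
import numpy as np
import pandas as pd

# 'Salary' is a hypothetical column added only to introduce NaN values.
data = {
    'Id': [1, 2, 3, 4, 5],
    'Designation': ['architect', 'scientist', 'programmer', 'scientist', 'programmer'],
    'Salary': [90000, np.nan, 70000, 85000, np.nan],
}
df = pd.DataFrame(data)
grouped = df.groupby('Designation')

# count() reports non-null values per column, so Salary shows fewer
# entries than Id for groups that contain a NaN.
count_result = grouped.count()
print(count_result)

# size() reports the number of rows in each group, NaN or not.
size_result = grouped.size()
print(size_result)
```

Here the scientist and programmer groups each have two rows, so size() reports 2 for both, while count() reports only 1 in the Salary column for each of them.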
Conclusion
Use groupby(['Designation']).count() to count the non-null values in each column for every designation. Use size() if you want the total number of rows in each group, including rows that contain NaN in other columns, returned as a simpler Series.
