Python - Grouping columns in Pandas Dataframe
Grouping a Pandas DataFrame splits its rows into groups that share the same value in a column, so that an aggregate function can be applied to each group separately. The groupby() method is the primary tool for grouping operations in Pandas.
Creating a DataFrame
Let's start by creating a DataFrame with car data:
import pandas as pd

# Create dataframe with car information
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

print("DataFrame...")
print(dataFrame)

DataFrame...
        Car  Reg_Price
0      Audi       1000
1     Lexus       1400
2      Audi       1100
3  Mercedes        900
4      Audi       1700
5     Lexus       1800
6  Mercedes       1300
7     Lexus       1150
8  Mercedes       1350
Grouping by Column
Now let's group the data by the Car column and calculate the mean registration price for each car brand:
import pandas as pd

# Create dataframe
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# Group by Car column
grouped_data = dataFrame.groupby("Car")

# Calculate mean registration price for each car brand
mean_prices = grouped_data.mean()

print("Mean Registration Price by Car Brand:")
print(mean_prices)

Mean Registration Price by Car Brand:
            Reg_Price
Car
Audi      1266.666667
Lexus     1450.000000
Mercedes  1183.333333
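The result above is a DataFrame indexed by Car. As a minimal sketch of two common variations on the same data, the snippet below selects Reg_Price before aggregating, which yields a Series, and passes as_index=False so that Car stays a regular column. The names mean_series and mean_table are illustrative only and do not come from the example above.

```python
import pandas as pd

# Same car data as in the example above
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# Selecting the column first returns a Series indexed by Car
# (mean_series is an illustrative name)
mean_series = dataFrame.groupby("Car")["Reg_Price"].mean()

# as_index=False keeps Car as an ordinary column instead of the index
# (mean_table is an illustrative name)
mean_table = dataFrame.groupby("Car", as_index=False)["Reg_Price"].mean()

print(mean_series)
print(mean_table)
```

Selecting the column explicitly also helps when a DataFrame holds other non-numeric columns, since only the chosen column is aggregated.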
Common GroupBy Operations
You can apply various aggregate functions to grouped data:
import pandas as pd

# Create dataframe
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

grouped_data = dataFrame.groupby("Car")

print("Count by Car Brand:")
print(grouped_data.count())

print("\nSum by Car Brand:")
print(grouped_data.sum())

print("\nMax Price by Car Brand:")
print(grouped_data.max())

Count by Car Brand:
          Reg_Price
Car
Audi              3
Lexus             3
Mercedes          3

Sum by Car Brand:
          Reg_Price
Car
Audi           3800
Lexus          4350
Mercedes       3550

Max Price by Car Brand:
          Reg_Price
Car
Audi           1700
Lexus          1800
Mercedes       1350
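The three separate calls above can also be combined. The following is a minimal sketch, not the only way to do it, that uses agg() with a list of function names to produce one summary table. The variable name summary is an assumption made for illustration.

```python
import pandas as pd

# Same car data as in the example above
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# agg() accepts a list of aggregation names and returns one column per function
# (summary is an illustrative name)
summary = dataFrame.groupby("Car")["Reg_Price"].agg(["count", "sum", "max"])

print(summary)
```

Each row of summary corresponds to one car brand, and the columns count, sum and max hold the same values as the three separate outputs shown above.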
Conclusion
Use groupby() to group DataFrame rows by column values. Apply aggregate functions like mean(), sum(), or count() to analyze grouped data efficiently.
