Python - Grouping columns in Pandas DataFrame

Pandas DataFrame grouping allows you to split data into groups based on column values and apply aggregate functions. The groupby() method is the primary tool for grouping operations in Pandas.
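To make the split-and-aggregate idea concrete, here is a minimal sketch. The tiny frame and the names df, key, value, and manual_totals are hypothetical and used only for illustration. It iterates over the groups by hand and then does the same work with a single groupby() call.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the idea
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Split: iterating over a GroupBy object yields (group name, sub-DataFrame) pairs
manual_totals = {}
for name, group in df.groupby("key"):
    # Apply: aggregate the rows that belong to this group
    manual_totals[name] = int(group["value"].sum())

# groupby(...).sum() performs the split, the aggregation, and the
# combination of the per-group results in one call
totals = df.groupby("key")["value"].sum()

print(manual_totals)
print(totals.to_dict())
```

Both print statements should report the same per-key totals. In practice the one-line form is preferred over the explicit loop.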

Creating a DataFrame

Let's start by creating a DataFrame with car data:

import pandas as pd

# Create dataframe with car information
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

print("DataFrame...")
print(dataFrame)

DataFrame...
        Car  Reg_Price
0      Audi       1000
1     Lexus       1400
2      Audi       1100
3  Mercedes        900
4      Audi       1700
5     Lexus       1800
6  Mercedes       1300
7     Lexus       1150
8  Mercedes       1350

Grouping by Column

Now let's group the data by the Car column and calculate the mean registration price for each car brand:

import pandas as pd

# Create dataframe
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# Group by Car column
grouped_data = dataFrame.groupby("Car")

# Calculate mean registration price for each car brand
mean_prices = grouped_data.mean()

print("Mean Registration Price by Car Brand:")
print(mean_prices)

Mean Registration Price by Car Brand:
            Reg_Price
Car                  
Audi      1266.666667
Lexus     1450.000000
Mercedes  1183.333333
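When a DataFrame has more columns than you want to aggregate, it can help to select the column before calling the aggregate function. The sketch below is one possible variation on the example above, not the only way to do it. The names mean_series and mean_frame are introduced here for illustration. It also shows the as_index=False option of groupby(), which keeps Car as a regular column instead of the index.

```python
import pandas as pd

# Same car data as in the example above
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# Select one column before aggregating; the result is a Series indexed by Car
mean_series = dataFrame.groupby("Car")["Reg_Price"].mean()

# as_index=False returns a DataFrame with Car as an ordinary column
mean_frame = dataFrame.groupby("Car", as_index=False)["Reg_Price"].mean()

print(mean_series)
print(mean_frame)
```

The Series form is convenient for lookups such as mean_series["Audi"], while the as_index=False form is often easier to merge or export.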

Common GroupBy Operations

You can apply various aggregate functions to grouped data:

import pandas as pd

# Create dataframe
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

grouped_data = dataFrame.groupby("Car")

print("Count by Car Brand:")
print(grouped_data.count())
print("\nSum by Car Brand:")
print(grouped_data.sum())
print("\nMax Price by Car Brand:")
print(grouped_data.max())

Count by Car Brand:
          Reg_Price
Car                
Audi              3
Lexus             3
Mercedes          3

Sum by Car Brand:
          Reg_Price
Car                
Audi           3800
Lexus          4350
Mercedes       3550

Max Price by Car Brand:
          Reg_Price
Car                
Audi           1700
Lexus          1800
Mercedes       1350
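Instead of calling each aggregate function separately, several can be computed in one pass with agg(). The following is a small sketch under the same car data. The result names summary, named, total_price, and highest_price are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same car data as in the examples above
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Audi", "Mercedes", "Audi", "Lexus", "Mercedes", "Lexus", "Mercedes"],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

# Pass a list of function names to get one result column per aggregate
summary = dataFrame.groupby("Car")["Reg_Price"].agg(["count", "sum", "mean", "max"])

# Named aggregation: output_column=(input_column, function)
named = dataFrame.groupby("Car").agg(
    total_price=("Reg_Price", "sum"),
    highest_price=("Reg_Price", "max"),
)

print(summary)
print(named)
```

Named aggregation is useful when the default column names (count, sum, and so on) would be ambiguous in later steps.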

Conclusion

Use groupby() to group DataFrame rows by the values in a column, then apply aggregate functions such as mean(), sum(), count(), or max() to summarize each group.

AmitDiwan

Updated on: 2026-03-26T01:31:19+05:30
