Group-by and Sum in Python Pandas

The groupby() and sum() methods in Pandas let you group rows by one or more columns and calculate the sum of the values in each group. This is particularly useful for data aggregation and analysis, for example totalling sales per category.

Basic Group-by and Sum

Here's how to group data by a single column and sum the values:

import pandas as pd

# Create sample data
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Sales": [100, 150, 200, 120, 80],
    "Profit": [20, 30, 40, 25, 15]
})

print("Input DataFrame:")
print(df)
print("\nGroup by Category and sum:")
result = df.groupby("Category").sum()
print(result)

Output:

Input DataFrame:
  Category  Sales  Profit
0        A    100      20
1        B    150      30
2        A    200      40
3        B    120      25
4        A     80      15

Group by Category and sum:
          Sales  Profit
Category              
A           380      75
B           270      55
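
The sample data above contains only numeric columns apart from the grouping key. As a minimal sketch of what happens otherwise, the snippet below adds a text column (the Note column and its values are made up for illustration) and passes numeric_only=True so that only numeric columns are summed. In pandas 2.0 and later, leaving this argument out would also apply sum() to the text column, which concatenates the strings.

```python
import pandas as pd

# "Note" is a hypothetical text column added only for this illustration
df = pd.DataFrame({
    "Category": ["A", "B", "A"],
    "Sales": [100, 150, 200],
    "Note": ["x", "y", "z"]
})

# Restrict the aggregation to numeric columns
numeric_totals = df.groupby("Category").sum(numeric_only=True)
print(numeric_totals)
```

Here the result keeps only the Sales column, with one row per category.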

Group by Multiple Columns

You can group by multiple columns to create more detailed aggregations:

import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 150, 200, 120, 80]
})

print("Input DataFrame:")
print(df)
print("\nGroup by Region and Category:")
result = df.groupby(["Region", "Category"]).sum()
print(result)

Output:

Input DataFrame:
  Region Category  Sales
0  North        A    100
1  South        A    150
2  North        B    200
3  South        B    120
4  North        A     80

Group by Region and Category:
                Sales
Region Category      
North  A          180
       B          200
South  A          150
       B          120
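
Grouping by two columns produces a result whose index has two levels (a MultiIndex). The following sketch, which reuses the same sample data, shows one way to read values back out of such a result with .loc. The variable names totals, north_a and north are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 150, 200, 120, 80]
})

totals = df.groupby(["Region", "Category"]).sum()

# Look up a single (Region, Category) pair by passing a tuple
north_a = totals.loc[("North", "A"), "Sales"]
print("North / A:", north_a)

# Select every category within one region by passing the outer key only
north = totals.loc["North"]
print(north)
```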

Sum Specific Columns Only

You can select specific columns to sum after grouping:

import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25],
    "Units": [10, 15, 20, 12]
})

print("Group by Category, sum only Sales:")
result = df.groupby("Category")["Sales"].sum()
print(result)

print("\nGroup by Category, sum Sales and Profit:")
result2 = df.groupby("Category")[["Sales", "Profit"]].sum()
print(result2)

Output:

Group by Category, sum only Sales:
Category
A    300
B    270
Name: Sales, dtype: int64

Group by Category, sum Sales and Profit:
          Sales  Profit
Category              
A           300      60
B           270      55
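
The two outputs above look different because the bracket style decides the return type: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A short sketch of that difference, with as_series and as_frame as illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25]
})

# Single brackets: the result is a Series
as_series = df.groupby("Category")["Sales"].sum()
print(type(as_series).__name__)

# Double brackets (a list of names): the result is a DataFrame
as_frame = df.groupby("Category")[["Sales"]].sum()
print(type(as_frame).__name__)
```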

Using reset_index() for Cleaner Output

After grouping, the group keys become the index of the result. Call reset_index() to move them back into an ordinary column with a default integer index:

import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25]
})

result = df.groupby("Category").sum().reset_index()
print("Result with reset_index():")
print(result)

Output:

Result with reset_index():
  Category  Sales  Profit
0        A    300      60
1        B    270      55
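
As an alternative sketch, groupby() also accepts as_index=False, which keeps the grouping column as a regular column from the start. For this sample data it should give the same layout as calling reset_index() afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25]
})

# as_index=False leaves "Category" as a column instead of the index
result = df.groupby("Category", as_index=False).sum()
print(result)
```

Which form to use is mostly a matter of taste: as_index=False is shorter, while reset_index() makes the step explicit when chaining several operations.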

Key Points

  • groupby(column).sum() groups rows by the specified column and sums the remaining columns; pass numeric_only=True to sum() to skip non-numeric columns
  • Multiple columns can be used for grouping: groupby([col1, col2])
  • Specify columns to sum: groupby(column)[["col1", "col2"]].sum()
  • Use reset_index() to move the group keys from the index back into regular columns

Conclusion

The groupby().sum() combination is essential for data aggregation in Pandas. Use it to group data by categories and calculate totals for numerical analysis and reporting.

Updated on: 2026-03-26T01:54:07+05:30
