Group-by and Sum in Python Pandas
The groupby() and sum() methods in Pandas let you group rows by one or more columns and total the numeric values within each group. This is particularly useful for data aggregation and analysis.
Basic Group-by and Sum
Here's how to group data by a single column and sum the values:
import pandas as pd
# Create sample data
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Sales": [100, 150, 200, 120, 80],
    "Profit": [20, 30, 40, 25, 15]
})
print("Input DataFrame:")
print(df)
print("\nGroup by Category and sum:")
result = df.groupby("Category").sum()
print(result)
Input DataFrame:
Category Sales Profit
0 A 100 20
1 B 150 30
2 A 200 40
3 B 120 25
4 A 80 15
Group by Category and sum:
Sales Profit
Category
A 380 75
B 270 55
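When a DataFrame also contains text columns, what sum() does with them can depend on the pandas version (recent releases may concatenate the strings rather than drop the column). The sketch below assumes a hypothetical "Label" column that is not part of the example above, and passes numeric_only=True so that only numeric columns are totalled.

```python
import pandas as pd

# "Label" is a hypothetical text column added only for illustration
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Label": ["x", "y", "z", "w", "v"],
    "Sales": [100, 150, 200, 120, 80],
})

# numeric_only=True restricts the sum to numeric columns,
# so "Label" does not appear in the result
totals = df.groupby("Category").sum(numeric_only=True)
print(totals)
```

Selecting the columns explicitly before calling sum() is another way to get the same effect.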
Group by Multiple Columns
You can group by multiple columns to create more detailed aggregations:
import pandas as pd
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 150, 200, 120, 80]
})
print("Input DataFrame:")
print(df)
print("\nGroup by Region and Category:")
result = df.groupby(["Region", "Category"]).sum()
print(result)
Input DataFrame:
Region Category Sales
0 North A 100
1 South A 150
2 North B 200
3 South B 120
4 North A 80
Group by Region and Category:
Sales
Region Category
North A 180
B 200
South A 150
B 120
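Grouping by two columns produces a result indexed by both keys (a MultiIndex). As a minimal sketch of how individual totals might be read back from such a result, the snippet below rebuilds the same data and looks values up with .loc; the variable names north_a and north are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Category": ["A", "A", "B", "B", "A"],
    "Sales": [100, 150, 200, 120, 80],
})

result = df.groupby(["Region", "Category"]).sum()

# A tuple selects one (Region, Category) combination
north_a = result.loc[("North", "A"), "Sales"]
print(north_a)  # 180

# A single key selects every Category within that Region
north = result.loc["North"]
print(north)
```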
Sum Specific Columns Only
You can select specific columns to sum after grouping:
import pandas as pd
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25],
    "Units": [10, 15, 20, 12]
})
print("Group by Category, sum only Sales:")
result = df.groupby("Category")["Sales"].sum()
print(result)
print("\nGroup by Category, sum Sales and Profit:")
result2 = df.groupby("Category")[["Sales", "Profit"]].sum()
print(result2)
Group by Category, sum only Sales:
Category
A 300
B 270
Name: Sales, dtype: int64
Group by Category, sum Sales and Profit:
Sales Profit
Category
A 300 60
B 270 55
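The two outputs above differ in shape: selecting with a single column name returns a Series, while selecting with a list of names returns a DataFrame, even when the list holds only one name. A short sketch that makes the difference visible, using the same data (as_series and as_frame are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25],
    "Units": [10, 15, 20, 12],
})

# Single brackets: one column name, result is a Series
as_series = df.groupby("Category")["Sales"].sum()

# Double brackets: a list of column names, result is a DataFrame
as_frame = df.groupby("Category")[["Sales"]].sum()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

The DataFrame form is usually the more convenient one when the result will be merged or exported later.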
Using reset_index() for Cleaner Output
Convert the grouped result back to a regular DataFrame with reset_index():
import pandas as pd
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25]
})
result = df.groupby("Category").sum().reset_index()
print("Result with reset_index():")
print(result)
Result with reset_index():
Category Sales Profit
0 A 300 60
1 B 270 55
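An alternative that is not covered above is the as_index parameter of groupby(). Passing as_index=False keeps the grouping key as an ordinary column, so a separate reset_index() call should not be needed. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Sales": [100, 150, 200, 120],
    "Profit": [20, 30, 40, 25],
})

# as_index=False keeps "Category" as a regular column in the result
result = df.groupby("Category", as_index=False).sum()
print(result)
```

Which form to use is mostly a matter of taste; reset_index() is handy when the grouped result already exists, while as_index=False avoids the extra step.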
Key Points
- groupby(column).sum() groups data by the specified column and sums numeric values
- Multiple columns can be used for grouping: groupby([col1, col2])
- Specify columns to sum: groupby(column)[["col1", "col2"]].sum()
- Use reset_index() to convert grouped results back to a regular DataFrame
Conclusion
The groupby().sum() combination is a core tool for data aggregation in Pandas. Use it to group data by one or more categories and calculate per-group totals for analysis and reporting.
