Python – Aggregate values by tuple keys
When working with data in Python, you often need to aggregate values by tuple keys, that is, combine all values that share the same key tuple. This is useful for grouping data by several attributes at once (for example, product and store) and computing sums, averages, or counts per group.
Using defaultdict() Method
The defaultdict class from the collections module calls a factory function to supply a default value for any missing key. A running total can therefore be updated with += without first checking whether the key exists:
```python
from collections import defaultdict

# Sample data: (product, cost) tuples
item_data = [
    ('Milk', 30),
    ('Tomato', 100),
    ('Lentils', 345),
    ('Milk', 320),
    ('Tomato', 50)
]

# Create defaultdict to store aggregated values; float() supplies 0.0 for new keys
product_totals = defaultdict(float)

# Aggregate costs by product name
for product, cost in item_data:
    product_totals[product] += cost

# Convert to regular dict for cleaner output
result = dict(product_totals)
print("Aggregated costs by product:")
print(result)
```

```
Aggregated costs by product:
{'Milk': 350.0, 'Tomato': 150.0, 'Lentils': 345.0}
```
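The example above groups on a single string. The same defaultdict pattern works unchanged when the key is itself a tuple. Here is a minimal sketch, assuming records shaped as (product, category, cost); the `records` and `totals` names are illustrative and not from the original article:

```python
from collections import defaultdict

# Illustrative records: (product, category, cost)
records = [
    ('Milk', 'Dairy', 30),
    ('Tomato', 'Vegetable', 100),
    ('Lentils', 'Grain', 345),
    ('Tomato', 'Vegetable', 50),
    ('Milk', 'Dairy', 320),
]

# int() supplies 0 for a missing key, so += never raises KeyError
totals = defaultdict(int)
for product, category, cost in records:
    # The (product, category) tuple is the grouping key
    totals[(product, category)] += cost

print(dict(totals))
# {('Milk', 'Dairy'): 350, ('Tomato', 'Vegetable'): 150, ('Lentils', 'Grain'): 345}
```

Using int instead of float keeps the totals as integers when every input cost is an integer.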
Using Pandas groupby() Method
pandas can group by several columns at once. When you pass a list of column names to groupby(), each group key becomes a tuple of those column values:
```python
import pandas as pd

# Create DataFrame with multiple grouping columns
df = pd.DataFrame({
    'product': ['Milk', 'Tomato', 'Lentils', 'Tomato', 'Milk'],
    'category': ['Dairy', 'Vegetable', 'Grain', 'Vegetable', 'Dairy'],
    'cost': [30, 100, 345, 50, 320]
})
print("Original DataFrame:")
print(df)

# Group by tuple keys (product, category) and sum costs
grouped_result = df.groupby(['product', 'category'])['cost'].sum()
print("\nAggregated by (product, category):")
print(grouped_result)
```

```
Original DataFrame:
   product   category  cost
0     Milk      Dairy    30
1   Tomato  Vegetable   100
2  Lentils      Grain   345
3   Tomato  Vegetable    50
4     Milk      Dairy   320

Aggregated by (product, category):
product  category 
Lentils  Grain        345
Milk     Dairy        350
Tomato   Vegetable    150
Name: cost, dtype: int64
```
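In pandas, `df.groupby(['product', 'category'])['cost'].agg(['sum', 'count', 'mean'])` computes several statistics per group in one call. Where pandas is not available, the following is a standard-library sketch of the same idea using `itertools.groupby`. It assumes the same rows as the DataFrame above, and the `rows` and `summary` names are illustrative. Because `itertools.groupby` only merges adjacent items, the rows are sorted by the tuple key first:

```python
from itertools import groupby
from operator import itemgetter

# Same data as the DataFrame above: (product, category, cost)
rows = [
    ('Milk', 'Dairy', 30),
    ('Tomato', 'Vegetable', 100),
    ('Lentils', 'Grain', 345),
    ('Tomato', 'Vegetable', 50),
    ('Milk', 'Dairy', 320),
]

# itemgetter(0, 1) returns the (product, category) tuple for each row
key_func = itemgetter(0, 1)

# groupby only merges *adjacent* equal keys, so sort by the key first
rows_sorted = sorted(rows, key=key_func)

summary = {}
for key, group in groupby(rows_sorted, key=key_func):
    costs = [cost for _, _, cost in group]
    summary[key] = {
        'sum': sum(costs),
        'count': len(costs),
        'mean': sum(costs) / len(costs),
    }

for key, stats in summary.items():
    print(key, stats)
# ('Lentils', 'Grain') {'sum': 345, 'count': 1, 'mean': 345.0}
# ('Milk', 'Dairy') {'sum': 350, 'count': 2, 'mean': 175.0}
# ('Tomato', 'Vegetable') {'sum': 150, 'count': 2, 'mean': 75.0}
```

The sort also puts the groups in the same key order that pandas groupby uses by default.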
Using Dictionary with Tuple Keys
For direct tuple key aggregation, you can use a regular dictionary. Tuples of hashable values (such as strings) are themselves hashable, so they can serve as dictionary keys:
```python
# Sample data with tuple keys: (product, store_location)
sales_data = [
    (('Milk', 'Store_A'), 150),
    (('Tomato', 'Store_B'), 200),
    (('Milk', 'Store_A'), 100),
    (('Tomato', 'Store_A'), 75),
    (('Milk', 'Store_B'), 80)
]

# Aggregate using dictionary with tuple keys
sales_totals = {}
for (product, store), amount in sales_data:
    key = (product, store)
    if key in sales_totals:
        sales_totals[key] += amount
    else:
        sales_totals[key] = amount

print("Sales totals by (product, store):")
for key, total in sales_totals.items():
    print(f"{key}: {total}")
```

```
Sales totals by (product, store):
('Milk', 'Store_A'): 250
('Tomato', 'Store_B'): 200
('Tomato', 'Store_A'): 75
('Milk', 'Store_B'): 80
```
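The explicit if/else membership test can be shortened with `dict.get(key, 0)`, and `collections.Counter` can count how many records fall under each tuple key. This is a minimal sketch reusing the same sales data; the `sale_counts` name is illustrative:

```python
from collections import Counter

# Same (product, store) sales data as above
sales_data = [
    (('Milk', 'Store_A'), 150),
    (('Tomato', 'Store_B'), 200),
    (('Milk', 'Store_A'), 100),
    (('Tomato', 'Store_A'), 75),
    (('Milk', 'Store_B'), 80),
]

# dict.get(key, 0) returns 0 for a key not seen yet, replacing the if/else
sales_totals = {}
for key, amount in sales_data:
    sales_totals[key] = sales_totals.get(key, 0) + amount

# Counter tallies how many sales each (product, store) key has
sale_counts = Counter(key for key, _ in sales_data)

print(sales_totals)
# {('Milk', 'Store_A'): 250, ('Tomato', 'Store_B'): 200, ('Tomato', 'Store_A'): 75, ('Milk', 'Store_B'): 80}
print(sale_counts[('Milk', 'Store_A')])  # 2
```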
Comparison
| Method | Best For | Memory Usage | Performance |
|---|---|---|---|
| defaultdict | Simple aggregations | Low | Fast |
| pandas groupby | Complex data analysis | Higher | Good for large datasets |
| Regular dict | Full control over logic | Low | Fast |
Conclusion
Use defaultdict for simple aggregations by tuple keys. Choose pandas groupby for complex data analysis with multiple aggregation functions. Regular dictionaries offer the most control for custom aggregation logic.
