Python – Aggregate values by tuple keys

When working with data in Python, you often need to aggregate values by tuple keys, that is, combine the values of all records that share the same tuple (for example, the same (product, store) pair). This is useful for grouping data by multiple attributes and computing sums, averages, or counts per group.
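For counts and plain sums, collections.Counter also accepts tuple keys. The sketch below uses a few hypothetical (product, store, amount) records; the field names are only illustrative. A missing key reads as 0 without being added to the Counter.

```python
from collections import Counter

# Hypothetical records: (product, store, amount)
records = [
    ('Milk', 'Store_A', 150),
    ('Tomato', 'Store_B', 200),
    ('Milk', 'Store_A', 100),
    ('Milk', 'Store_B', 80),
]

# Count how many records share each (product, store) tuple key
counts = Counter((product, store) for product, store, _ in records)

# Sum the amounts per (product, store) key; unseen keys start at 0
sums = Counter()
for product, store, amount in records:
    sums[(product, store)] += amount

print(counts[('Milk', 'Store_A')])
print(sums[('Milk', 'Store_A')])
print(sums[('Tomato', 'Store_A')])  # missing key: reads as 0, nothing is inserted
```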

Using collections.defaultdict

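The defaultdict class from the collections module simplifies aggregation. The first time a missing key is accessed, it supplies a default value (here 0.0, from float()), so no explicit missing-key check is needed. In this example each record is a (product, cost) tuple, and the product name is the grouping key: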

from collections import defaultdict

# Sample data: (product, cost) tuples
item_data = [
    ('Milk', 30),
    ('Tomato', 100),
    ('Lentils', 345),
    ('Milk', 320),
    ('Tomato', 50)
]

# Create defaultdict to store aggregated values
product_totals = defaultdict(float)

# Aggregate costs by product name
for product, cost in item_data:
    product_totals[product] += cost

# Convert to regular dict for cleaner output
result = dict(product_totals)
print("Aggregated costs by product:")
print(result)

Output

Aggregated costs by product:
{'Milk': 350.0, 'Tomato': 150.0, 'Lentils': 345.0}
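The example above groups by a single string. The same pattern works unchanged when the key is a tuple. The sketch below uses hypothetical (product, store, amount) records and a defaultdict(list), which collects every value under its (product, store) key. The sum, count and average are then derived from those lists.

```python
from collections import defaultdict

# Hypothetical records: (product, store, amount)
records = [
    ('Milk', 'Store_A', 150),
    ('Tomato', 'Store_B', 200),
    ('Milk', 'Store_A', 100),
    ('Tomato', 'Store_A', 75),
    ('Milk', 'Store_B', 80),
]

# Collect every amount under its (product, store) tuple key
amounts_by_key = defaultdict(list)
for product, store, amount in records:
    amounts_by_key[(product, store)].append(amount)

# Derive sum, count and average per tuple key from the collected lists
summary = {
    key: {'sum': sum(vals), 'count': len(vals), 'avg': sum(vals) / len(vals)}
    for key, vals in amounts_by_key.items()
}

for key, stats in summary.items():
    print(key, stats)
```

Storing lists keeps every value in memory. For large inputs, keeping a running (total, count) pair per key avoids that.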

Using Pandas groupby() Method

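pandas offers richer grouping for more complex aggregations. Passing a list of columns to groupby() groups rows by the combination of their values, so each group is identified by a (product, category) tuple: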

import pandas as pd

# Create DataFrame with multiple grouping columns
df = pd.DataFrame({
    'product': ['Milk', 'Tomato', 'Lentils', 'Tomato', 'Milk'],
    'category': ['Dairy', 'Vegetable', 'Grain', 'Vegetable', 'Dairy'],
    'cost': [30, 100, 345, 50, 320]
})

print("Original DataFrame:")
print(df)

# Group by tuple keys (product, category) and sum costs
grouped_result = df.groupby(['product', 'category'])['cost'].sum()

print("\nAggregated by (product, category):")
print(grouped_result)

Output

Original DataFrame:
   product   category  cost
0     Milk      Dairy    30
1   Tomato  Vegetable   100
2  Lentils      Grain   345
3   Tomato  Vegetable    50
4     Milk      Dairy   320

Aggregated by (product, category):
product  category 
Lentils  Grain        345
Milk     Dairy        350
Tomato   Vegetable    150
Name: cost, dtype: int64
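To apply several aggregation functions at once, pass the grouped column to agg(). To get the result back as a plain dictionary keyed by tuples, call to_dict() on the Series. The sketch below rebuilds the same DataFrame. Because the grouped Series has a MultiIndex, to_dict() produces (product, category) tuple keys.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Milk', 'Tomato', 'Lentils', 'Tomato', 'Milk'],
    'category': ['Dairy', 'Vegetable', 'Grain', 'Vegetable', 'Dairy'],
    'cost': [30, 100, 345, 50, 320]
})

# Group the cost column by the (product, category) combination
grouped = df.groupby(['product', 'category'])['cost']

# Several aggregations at once: one output column per function
stats = grouped.agg(['sum', 'mean', 'count'])
print(stats)

# Summed costs as a plain dict with (product, category) tuple keys
totals = grouped.sum().to_dict()
print(totals)
```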

Using Dictionary with Tuple Keys

When each record already carries a tuple key, a regular dictionary also works. Tuples are hashable as long as their elements are, so they can be used directly as dictionary keys:

# Sample data with tuple keys: (product, store_location)
sales_data = [
    (('Milk', 'Store_A'), 150),
    (('Tomato', 'Store_B'), 200),
    (('Milk', 'Store_A'), 100),
    (('Tomato', 'Store_A'), 75),
    (('Milk', 'Store_B'), 80)
]

# Aggregate using dictionary with tuple keys
sales_totals = {}

for key, amount in sales_data:
    # key is already a (product, store) tuple, so use it directly
    if key in sales_totals:
        sales_totals[key] += amount
    else:
        sales_totals[key] = amount

print("Sales totals by (product, store):")
for key, total in sales_totals.items():
    print(f"{key}: {total}")

Output

Sales totals by (product, store):
('Milk', 'Store_A'): 250
('Tomato', 'Store_B'): 200
('Tomato', 'Store_A'): 75
('Milk', 'Store_B'): 80
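A regular dictionary leaves the update rule entirely to you, so it is easy to track more than one statistic per key. The sketch below reuses the sales_data layout from above. It keeps a running total, count and maximum per (product, store) key using dict.setdefault(). The three-item list is an illustrative choice, not a standard structure.

```python
# Same layout as above: ((product, store), amount)
sales_data = [
    (('Milk', 'Store_A'), 150),
    (('Tomato', 'Store_B'), 200),
    (('Milk', 'Store_A'), 100),
    (('Tomato', 'Store_A'), 75),
    (('Milk', 'Store_B'), 80),
]

# For each tuple key keep [total, count, max] (illustrative layout)
stats = {}
for key, amount in sales_data:
    # First sighting: start total/count at 0 and seed max with this amount
    entry = stats.setdefault(key, [0, 0, amount])
    entry[0] += amount                 # running total
    entry[1] += 1                      # number of sales
    entry[2] = max(entry[2], amount)   # largest single sale

for key, (total, count, largest) in stats.items():
    print(f"{key}: total={total}, avg={total / count:.1f}, max={largest}")
```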

Comparison

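| Method | Best For | Memory Usage | Performance |
|---|---|---|---|
| defaultdict | Simple sums or counts per key | Low | Fast for small and medium in-memory data |
| pandas groupby | Multi-column grouping with several aggregation functions | Higher (DataFrame overhead) | Vectorized; well suited to large datasets |
| Regular dict | Custom aggregation logic | Low | Comparable to defaultdict |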

Conclusion

Use defaultdict for simple aggregations by tuple keys. Choose pandas groupby for complex data analysis with multiple aggregation functions. Regular dictionaries offer the most control for custom aggregation logic.

Updated on: 2026-03-27T13:39:31+05:30
