Python - How to group DataFrame rows into list in Pandas?
When working with Pandas DataFrames, you may need to group rows and collect values into lists. This is commonly done using the groupby() method combined with apply(list).
Basic Grouping with apply(list)
The simplest approach uses groupby() with apply(list) to collect values:
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
print("Original DataFrame:")
print(dataFrame)
# Group by Car and collect Units into lists
grouped = dataFrame.groupby('Car')['Units'].apply(list)
print("\nGrouped DataFrame:")
print(grouped)
Original DataFrame:
Car Units
0 BMW 100
1 Lexus 150
2 Audi 110
3 Mustang 80
4 Bentley 110
5 Jaguar 90
Grouped DataFrame:
Car
Audi [110]
BMW [100]
Bentley [110]
Jaguar [90]
Lexus [150]
Mustang [80]
Name: Units, dtype: object
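Every car in the example above appears only once, so each list holds a single value. The following minimal sketch uses hypothetical data (the `sales` frame and its values are assumptions, not part of the original example) in which keys repeat, so the lists actually accumulate several values:

```python
import pandas as pd

# Hypothetical data: each car can appear in more than one row
sales = pd.DataFrame({
    "Car": ["BMW", "Audi", "BMW", "Audi", "Lexus"],
    "Units": [100, 110, 120, 95, 150],
})

# Collect every Units value per car into one list,
# keeping the original row order inside each group
units_by_car = sales.groupby("Car")["Units"].apply(list)
print(units_by_car)

# For a single column, agg(list) collects the same lists
same_result = sales.groupby("Car")["Units"].agg(list)
print(same_result)
```

Here the BMW rows end up in one list, `[100, 120]`, and the Audi rows in another, `[110, 95]`.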
Grouping Multiple Columns
You can group multiple columns into lists by selecting them before applying list:
import pandas as pd
# Create DataFrame with duplicate car entries
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'BMW', 'Audi', 'Audi', 'Lexus'],
    "Units": [100, 120, 110, 95, 150],
    "Price": [50000, 55000, 45000, 42000, 60000]
})
print("Original DataFrame:")
print(dataFrame)
# Group by Car and collect both Units and Price into lists
grouped = dataFrame.groupby('Car')[['Units', 'Price']].apply(lambda x: x.values.tolist())
print("\nGrouped with multiple columns:")
print(grouped)
Original DataFrame:
Car Units Price
0 BMW 100 50000
1 BMW 120 55000
2 Audi 110 45000
3 Audi 95 42000
4 Lexus 150 60000
Grouped with multiple columns:
Car
Audi [[110, 45000], [95, 42000]]
BMW [[100, 50000], [120, 55000]]
Lexus [[150, 60000]]
Name: (Units, Price), dtype: object
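If you would rather have one list per column than a nested list of rows, one possible alternative is to pass the built-in list to agg() on the selected columns. This is a sketch of that variation, not the approach shown above; the variable names `cars` and `per_column` are illustrative:

```python
import pandas as pd

cars = pd.DataFrame({
    "Car": ["BMW", "BMW", "Audi", "Audi", "Lexus"],
    "Units": [100, 120, 110, 95, 150],
    "Price": [50000, 55000, 45000, 42000, 60000],
})

# agg(list) builds one list per selected column for each group
# and returns a DataFrame indexed by Car
per_column = cars.groupby("Car")[["Units", "Price"]].agg(list)
print(per_column)
```

The result keeps Units and Price as separate columns, which can be easier to work with when the two value sets are processed independently.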
Using agg() for Multiple Aggregations
The agg() method accepts several aggregation functions at once, so you can build the list alongside other statistics:
import pandas as pd
dataFrame = pd.DataFrame({
    "Category": ['A', 'A', 'B', 'B', 'A'],
    "Values": [10, 20, 30, 40, 50]
})
print("Original DataFrame:")
print(dataFrame)
# Use agg() to create lists and calculate other aggregations.
# Pass the built-in list itself: the string 'list' is not a valid aggregation name.
result = dataFrame.groupby('Category')['Values'].agg([list, 'sum', 'mean'])
print("\nMultiple aggregations:")
print(result)
Original DataFrame:
Category Values
0 A 10
1 A 20
2 B 30
3 B 40
4 A 50
Multiple aggregations:
list sum mean
Category
A [10, 20, 50] 80 26.666667
B [30, 40] 70 35.000000
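The default column names (list, sum, mean) come from the function names. As a small sketch of one way to control them, named aggregation lets you choose the output labels yourself; the names `items`, `total`, and `average` below are illustrative assumptions:

```python
import pandas as pd

data = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A"],
    "Values": [10, 20, 30, 40, 50],
})

# Named aggregation: each keyword becomes an output column name
summary = data.groupby("Category")["Values"].agg(
    items=list,       # built-in list collects the group's values
    total="sum",
    average="mean",
)
print(summary)
```

This avoids a column literally called list, which can be confusing to read and shadows the built-in name when accessed as an attribute.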
Comparison
| Method | Use Case | Output Type |
|---|---|---|
| apply(list) | Single column grouping | Series with lists |
| agg() | Multiple aggregations | DataFrame |
| lambda with tolist() | Multiple columns | Series with nested lists |
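The output types in the comparison can be checked directly. The snippet below is a minimal sketch on a small hypothetical frame (`df` and its values are assumptions) that runs each method and prints the type of what comes back:

```python
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "BMW", "Audi"],
    "Units": [100, 120, 110],
    "Price": [50000, 55000, 45000],
})
by_car = df.groupby("Car")

# apply(list) on one column
single = by_car["Units"].apply(list)
# agg() with several functions
multi_agg = by_car["Units"].agg([list, "sum"])
# lambda with tolist() on several columns
nested = by_car[["Units", "Price"]].apply(lambda g: g.values.tolist())

print(type(single).__name__)
print(type(multi_agg).__name__)
print(type(nested).__name__)
```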
Conclusion
Use groupby() with apply(list) to collect a single column into lists. Use agg() when you also need other aggregations such as sum or mean, and a lambda with tolist() when you want the rows of several columns as nested lists.