Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python - How to Group Pandas DataFrame by Month?
Pandas provides powerful tools for grouping data by time periods. To group a DataFrame by month, use pd.Grouper() with frequency parameter 'M' for monthly grouping. This is especially useful for analyzing time-series data like sales records, financial data, or any dataset with datetime columns.
Creating Sample Data
First, let's create a DataFrame with car sales data including purchase dates and registration prices ?
import pandas as pd
# Create DataFrame with car sales data
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
Car Date_of_Purchase Reg_Price
0 Audi 2021-06-10 1000
1 Lexus 2021-07-11 1400
2 Tesla 2021-06-25 1100
3 Mercedes 2021-06-29 900
4 BMW 2021-03-20 1700
5 Toyota 2021-01-22 1800
6 Nissan 2021-01-06 1300
7 Bentley 2021-01-04 1150
8 Mustang 2021-05-09 1350
Grouping by Month Using pd.Grouper()
Use pd.Grouper() with the datetime column and frequency 'M' to group by month. This automatically groups all records from the same month together ?
import pandas as pd
# Create the same DataFrame
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
# Group by month and calculate sum of registration prices
monthly_sales = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).sum()
print("Monthly Registration Price Totals:")
print(monthly_sales)
Monthly Registration Price Totals:
Reg_Price
Date_of_Purchase
2021-01-31 4250.0
2021-02-28 NaN
2021-03-31 1700.0
2021-04-30 NaN
2021-05-31 1350.0
2021-06-30 3000.0
2021-07-31 1400.0
Alternative Methods
You can also use other aggregation functions like count(), mean(), or max() for different analyses ?
import pandas as pd
# Same DataFrame setup
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
# Count cars sold per month
monthly_count = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).count()
print("Cars sold per month:")
print(monthly_count['Car'])
Cars sold per month: Date_of_Purchase 2021-01-31 3 2021-02-28 0 2021-03-31 1 2021-04-30 0 2021-05-31 1 2021-06-30 3 2021-07-31 1 Name: Car, dtype: int64
Key Points
-
pd.Grouper()requires a datetime column to group by time periods - Use frequency
'M'for monthly grouping,'D'for daily,'Y'for yearly - Missing months appear with
NaNvalues in the output - Apply aggregation functions like
sum(),count(), ormean()to analyze grouped data
Conclusion
Use pd.Grouper() with frequency parameter 'M' to group DataFrame by month. This approach automatically handles datetime grouping and allows various aggregation functions to analyze monthly patterns in your time-series data.
