Python - How to Group Pandas DataFrame by Year?
To group a Pandas DataFrame by year, pass pd.Grouper() to groupby(). pd.Grouper() takes the name of a datetime column (key) and a frequency string (freq), and groupby() then bins the rows into time intervals of that frequency.
Creating a DataFrame with Date Column
Let's create a sample DataFrame with car purchase records:
import pandas as pd

# DataFrame with a Date_of_Purchase column
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
            pd.Timestamp("2019-01-22"),
            pd.Timestamp("2011-01-06"),
            pd.Timestamp("2013-01-04"),
            pd.Timestamp("2014-05-09"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

print("DataFrame...")
print(dataFrame)
DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi       2021-06-10       1000
1     Lexus       2019-07-11       1400
2     Tesla       2016-06-25       1100
3  Mercedes       2021-06-29        900
4       BMW       2020-03-20       1700
5    Toyota       2019-01-22       1800
6    Nissan       2011-01-06       1300
7   Bentley       2013-01-04       1150
8   Mustang       2014-05-09       1350
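The sample above builds the date column from pd.Timestamp objects. Data loaded from a file often stores dates as strings instead, and frequency-based grouping needs a datetime column. A minimal sketch of the conversion, assuming ISO-formatted date strings (the raw frame below is hypothetical and not part of the example data):

```python
import pandas as pd

# Hypothetical raw data: the dates arrive as plain strings
raw = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla"],
        "Date_of_Purchase": ["2021-06-10", "2019-07-11", "2016-06-25"],
        "Reg_Price": [1000, 1400, 1100],
    }
)

# Convert the strings to datetime64 values so time-based grouping can be applied
raw["Date_of_Purchase"] = pd.to_datetime(raw["Date_of_Purchase"], format="%Y-%m-%d")

print(raw.dtypes)
```

Passing an explicit format is optional, but it makes the expected layout of the strings clear and avoids ambiguous parsing.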
Grouping by Year Using pd.Grouper()
Use pd.Grouper() to group by the date column with annual frequency:
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
            pd.Timestamp("2019-01-22"),
            pd.Timestamp("2011-01-06"),
            pd.Timestamp("2013-01-04"),
            pd.Timestamp("2014-05-09"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Group by calendar year and sum the registration prices.
# Selecting Reg_Price keeps the non-numeric Car column out of the sum,
# since recent pandas versions no longer drop such columns automatically.
# Note: pandas 2.2 and later spell the year-end alias 'YE' instead of 'Y'.
yearly_grouped = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='Y'))[['Reg_Price']].sum()

print("Grouped by Year (Annual)...")
print(yearly_grouped)
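Grouped by Year (Annual)...
                  Reg_Price
Date_of_Purchase           
2011-12-31             1300
2012-12-31                0
2013-12-31             1150
2014-12-31             1350
2015-12-31                0
2016-12-31             1100
2017-12-31                0
2018-12-31                0
2019-12-31             3200
2020-12-31             1700
2021-12-31             1900
Each group is labeled with the last day of its year. Years without any purchase (2012, 2015, 2017 and 2018) still appear, with a sum of 0, because pd.Grouper() creates a bin for every year between the earliest and latest date.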
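If year-end dates and empty years are not wanted in the result, the grouped output can be post-processed. The following is a sketch under a few assumptions: it uses a shortened, hypothetical version of the purchase data, and it passes the offset object pd.offsets.YearEnd() instead of a string alias so that the same code does not depend on how the installed pandas version spells the yearly alias.

```python
import pandas as pd

# Shortened, hypothetical version of the purchase data
purchases = pd.DataFrame(
    {
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2019-01-22"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1800],
    }
)

# An offset object works as the frequency in place of a string alias
grouped = purchases.groupby(
    pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.YearEnd())
)["Reg_Price"]

totals = grouped.sum()   # one row per year in the range, empty years sum to 0
counts = grouped.size()  # number of purchases per year, 0 for empty years

# Keep only years that contain at least one purchase
non_empty = totals[counts > 0]

# Replace the year-end timestamps with plain year numbers
non_empty.index = non_empty.index.year

print(non_empty)
```

Filtering on the group size rather than on the sum keeps years whose prices genuinely add up to 0.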
Grouping by Multi-Year Intervals
You can also group by multi-year intervals using a frequency such as '3Y':
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
            pd.Timestamp("2019-01-22"),
            pd.Timestamp("2011-01-06"),
            pd.Timestamp("2013-01-04"),
            pd.Timestamp("2014-05-09"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Group by 3-year intervals and sum the registration prices.
# Selecting Reg_Price keeps the non-numeric Car column out of the sum.
# Note: pandas 2.2 and later spell this alias '3YE' instead of '3Y'.
three_year_grouped = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='3Y'))[['Reg_Price']].sum()

print("Grouped by 3-Year Intervals...")
print(three_year_grouped)
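Grouped by 3-Year Intervals...
                  Reg_Price
Date_of_Purchase           
2011-12-31             1300
2014-12-31             2500
2017-12-31             1100
2020-12-31             4900
2023-12-31             1900
Each label marks the end of a three-year bin. The row labeled 2014-12-31, for example, covers purchases from 2012 through 2014 (1150 + 1350 = 2500), and the row labeled 2023-12-31 covers 2021 through 2023.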
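To check how many rows fall into each three-year bin, several aggregations can be computed in one pass. This is a sketch, not the only way to do it: the column names total_price and cars are made up for illustration, and pd.offsets.YearEnd(3) is used as the frequency so the code does not rely on a particular spelling of the string alias.

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW",
                "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2020-03-20",
             "2019-01-22", "2011-01-06", "2013-01-04", "2014-05-09"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Three-year bins, expressed as an offset object instead of a string alias
three_year = pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.YearEnd(3))

# Named aggregation: total price and number of cars per bin
# (total_price and cars are illustrative column names)
summary = dataFrame.groupby(three_year).agg(
    total_price=("Reg_Price", "sum"),
    cars=("Car", "count"),
)

print(summary)
```

The cars column makes it easy to see that the 2020-12-31 bin holds three purchases while the first bin holds only one.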
Alternative Methods
You can also extract the year into its own column and group on it directly:
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700],
    }
)

# Extract the year into a new column
dataFrame['Year'] = dataFrame['Date_of_Purchase'].dt.year

# Group on the year and sum the registration prices.
# Selecting Reg_Price keeps the text and datetime columns out of the sum,
# since recent pandas versions no longer drop such columns automatically.
year_grouped = dataFrame.groupby('Year')[['Reg_Price']].sum()

print("Grouped by Extracted Year...")
print(year_grouped)
Grouped by Extracted Year...
      Reg_Price
Year           
2016       1100
2019       1400
2020       1700
2021       1900
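The helper column is optional. groupby() also accepts a Series of keys, so the extracted years can be passed directly without modifying the DataFrame. A short sketch using the same five rows (the variable name by_year is chosen here for illustration):

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2020-03-20"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700],
    }
)

# Group on the extracted years without adding a column to the DataFrame;
# rename() only sets the name shown for the resulting index
years = dataFrame["Date_of_Purchase"].dt.year.rename("Year")
by_year = dataFrame.groupby(years)["Reg_Price"].sum()

print(by_year)
```

Because the keys are taken only from the dates present in the data, years without purchases do not appear in the result.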
Conclusion
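Use pd.Grouper() with a frequency such as 'Y' for yearly grouping or '3Y' for multi-year intervals; the result is labeled with period-end dates and includes every period in the date range. Alternatively, extract the year with .dt.year and group on it for simpler year-based grouping that lists only the years present in the data.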