Python - How to Group Pandas DataFrame by Year?

We can group a Pandas DataFrame by year using groupby() together with pd.Grouper(). pd.Grouper() takes the name of a datetime column (key) and a grouping frequency (freq), which makes it well suited to time-based data.

Creating a DataFrame with Date Column

Let's create a sample DataFrame with car purchase records:

import pandas as pd

# DataFrame with Date_of_Purchase column
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
                           pd.Timestamp("2019-07-11"),
                           pd.Timestamp("2016-06-25"),
                           pd.Timestamp("2021-06-29"),
                           pd.Timestamp("2020-03-20"),
                           pd.Timestamp("2019-01-22"),
                           pd.Timestamp("2011-01-06"),
                           pd.Timestamp("2013-01-04"),
                           pd.Timestamp("2014-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

print("DataFrame...")
print(dataFrame)

The output is as follows:

DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi        2021-06-10       1000
1     Lexus        2019-07-11       1400
2     Tesla        2016-06-25       1100
3  Mercedes        2021-06-29        900
4       BMW        2020-03-20       1700
5    Toyota        2019-01-22       1800
6    Nissan        2011-01-06       1300
7   Bentley        2013-01-04       1150
8   Mustang        2014-05-09       1350
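
The sample above builds the date column from pd.Timestamp objects, so it already has a datetime dtype. Dates loaded from a file often arrive as plain strings, and both pd.Grouper() with a frequency and the .dt accessor need a datetime column. A minimal sketch of the conversion, assuming ISO-formatted date strings (the two-row frame is only illustrative):

```python
import pandas as pd

# Hypothetical input: the dates are plain strings, as they might be after reading a CSV
df = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus"],
        "Date_of_Purchase": ["2021-06-10", "2019-07-11"],
    }
)

# Convert the string column to datetime64 so it can be grouped by year
df["Date_of_Purchase"] = pd.to_datetime(df["Date_of_Purchase"])

print(df.dtypes)
print(df["Date_of_Purchase"].dt.year.tolist())
```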

Grouping by Year Using pd.Grouper()

Use pd.Grouper() to group by the date column with an annual frequency:

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
                           pd.Timestamp("2019-07-11"),
                           pd.Timestamp("2016-06-25"),
                           pd.Timestamp("2021-06-29"),
                           pd.Timestamp("2020-03-20"),
                           pd.Timestamp("2019-01-22"),
                           pd.Timestamp("2011-01-06"),
                           pd.Timestamp("2013-01-04"),
                           pd.Timestamp("2014-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

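# Group by year and sum the numeric column.
# 'YE' (year end) is the alias in pandas 2.2 and later; older versions use 'Y'.
yearly_grouped = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='YE'))[['Reg_Price']].sum()
print("Grouped by Year (Annual)...")
print(yearly_grouped)

The output is as follows. Each row is labeled with the last day of its year, and years with no purchases appear with a sum of 0:

Grouped by Year (Annual)...
                Reg_Price
Date_of_Purchase         
2011-12-31           1300
2012-12-31              0
2013-12-31           1150
2014-12-31           1350
2015-12-31              0
2016-12-31           1100
2017-12-31              0
2018-12-31              0
2019-12-31           3200
2020-12-31           1700
2021-12-31           1900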
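
Because pd.Grouper() builds one bin for every year between the earliest and latest date, a year with no records still gets a row, with a sum of 0. The sketch below uses three rows of the sample data to show one way of dropping such rows: passing min_count=1 makes the sum of an empty year NaN, which dropna() then removes. It uses the offset object pd.offsets.YearEnd() instead of a string alias, so it should run unchanged whether the installed pandas spells the alias 'Y' or 'YE'. The variable names are illustrative.

```python
import pandas as pd

# Three rows from the sample data: there is no purchase in 2012
df = pd.DataFrame(
    {
        "Date_of_Purchase": pd.to_datetime(["2011-01-06", "2013-01-04", "2014-05-09"]),
        "Reg_Price": [1300, 1150, 1350],
    }
)

# Year-end offset object, equivalent to the year-end string alias
grouper = pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.YearEnd())

# Default behaviour: one row per year in the range, the empty year 2012 sums to 0
all_years = df.groupby(grouper)["Reg_Price"].sum()

# min_count=1 turns the sum of an empty year into NaN, which dropna() removes
non_empty = df.groupby(grouper)["Reg_Price"].sum(min_count=1).dropna()

print(all_years)
print(non_empty)
```

Note that the NaN step converts the result to floating point; cast back with .astype(int) if integer sums are needed.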

Grouping by Multi-Year Intervals

You can also group by multi-year intervals using a frequency such as '3Y' ('3YE' in pandas 2.2 and later):

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
                           pd.Timestamp("2019-07-11"),
                           pd.Timestamp("2016-06-25"),
                           pd.Timestamp("2021-06-29"),
                           pd.Timestamp("2020-03-20"),
                           pd.Timestamp("2019-01-22"),
                           pd.Timestamp("2011-01-06"),
                           pd.Timestamp("2013-01-04"),
                           pd.Timestamp("2014-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

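# Group by 3-year intervals and sum the numeric column.
# '3YE' is the alias in pandas 2.2 and later; older versions use '3Y'.
three_year_grouped = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='3YE'))[['Reg_Price']].sum()
print("Grouped by 3-Year Intervals...")
print(three_year_grouped)

The output is as follows:

Grouped by 3-Year Intervals...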
                Reg_Price
Date_of_Purchase         
2011-12-31           1300
2014-12-31           2500
2017-12-31           1100
2020-12-31           4900
2023-12-31           1900
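
Each multi-year bin is labeled with its right edge, and the first bin ends at the close of the earliest year in the data. The label 2014-12-31, for example, covers purchases made from 2012 through 2014. To see which records land in which bin, one option is to join the car names per group. This is a minimal sketch on four rows of the sample data, using the offset object pd.offsets.YearEnd(3) so that it does not depend on which string alias the installed pandas accepts.

```python
import pandas as pd

# Four rows from the sample data
df = pd.DataFrame(
    {
        "Car": ["Nissan", "Bentley", "Mustang", "Tesla"],
        "Date_of_Purchase": pd.to_datetime(
            ["2011-01-06", "2013-01-04", "2014-05-09", "2016-06-25"]
        ),
    }
)

# Three-year bins anchored on year end
grouper = pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.YearEnd(3))

# Join the car names in each bin to show bin membership
members = df.groupby(grouper)["Car"].agg(", ".join)

print(members)
```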

Alternative Methods

You can also extract the year into its own column and group by that column:

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
                           pd.Timestamp("2019-07-11"),
                           pd.Timestamp("2016-06-25"),
                           pd.Timestamp("2021-06-29"),
                           pd.Timestamp("2020-03-20")],
        "Reg_Price": [1000, 1400, 1100, 900, 1700]
    }
)

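# Extract the year into a new column, then group by it.
# Selecting Reg_Price keeps the datetime and string columns out of the sum.
dataFrame['Year'] = dataFrame['Date_of_Purchase'].dt.year
year_grouped = dataFrame.groupby('Year')[['Reg_Price']].sum()
print("Grouped by Extracted Year...")
print(year_grouped)

The output is as follows:

Grouped by Extracted Year...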
      Reg_Price
Year           
2016       1100
2019       1400
2020       1700
2021       1900
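
The extracted year does not have to be stored as a column. groupby() also accepts a Series of keys, so the year can be computed on the fly, and agg() can return several statistics at once. A minimal sketch using the same five rows (the names years and summary are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2020-03-20"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700],
    }
)

# Build the grouping key without adding a column to the DataFrame
years = df["Date_of_Purchase"].dt.year.rename("Year")

# Compute several aggregates of Reg_Price per year in one call
summary = df.groupby(years)["Reg_Price"].agg(["sum", "mean", "count"])

print(summary)
```

Unlike pd.Grouper() with a frequency, this approach only produces rows for years that actually occur in the data.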

Conclusion

Use pd.Grouper() with a frequency such as 'Y' for yearly grouping or '3Y' for multi-year intervals ('YE' and '3YE' in pandas 2.2 and later). Alternatively, extract the year with .dt.year and group by it when you only need simple calendar-year groups.

Updated on: 2026-03-26T13:38:46+05:30