Python – Group and calculate the sum of column values of a Pandas DataFrame

In Pandas, you can group DataFrame rows by specific columns and calculate aggregated values like sum, mean, or count. This is particularly useful for analyzing time-series data where you want to group by periods like months, quarters, or years.

Creating a Sample DataFrame

Let's create a DataFrame with car sales data to demonstrate grouping and summing ?

import pandas as pd

# Create DataFrame with car sales data
dataFrame = pd.DataFrame({
    "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
    "Date_of_Purchase": [
        pd.Timestamp("2021-06-10"),
        pd.Timestamp("2021-07-11"),
        pd.Timestamp("2021-06-25"),
        pd.Timestamp("2021-06-29"),
        pd.Timestamp("2021-03-20"),
        pd.Timestamp("2021-01-22"),
        pd.Timestamp("2021-01-06"),
        pd.Timestamp("2021-01-04"),
        pd.Timestamp("2021-05-09")
    ],
    "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})

print("DataFrame...")
print(dataFrame)
DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi        2021-06-10       1000
1     Lexus        2021-07-11       1400
2     Tesla        2021-06-25       1100
3  Mercedes        2021-06-29        900
4       BMW        2021-03-20       1700
5    Toyota        2021-01-22       1800
6    Nissan        2021-01-06       1300
7   Bentley        2021-01-04       1150
8   Mustang        2021-05-09       1350

Grouping by Month and Calculating Sum

Use pd.Grouper with frequency 'M' to group by month and calculate the sum of registration prices ?

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
    "Date_of_Purchase": [
        pd.Timestamp("2021-06-10"),
        pd.Timestamp("2021-07-11"),
        pd.Timestamp("2021-06-25"),
        pd.Timestamp("2021-06-29"),
        pd.Timestamp("2021-03-20"),
        pd.Timestamp("2021-01-22"),
        pd.Timestamp("2021-01-06"),
        pd.Timestamp("2021-01-04"),
        pd.Timestamp("2021-05-09")
    ],
    "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})

# Group by month and calculate sum
monthly_sum = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum()

print("Group DataFrame by month...")
print(monthly_sum)
Group DataFrame by month...
                  Reg_Price
Date_of_Purchase           
2021-01-31           4250.0
2021-02-28              NaN
2021-03-31           1700.0
2021-04-30              NaN
2021-05-31           1350.0
2021-06-30           3000.0
2021-07-31           1400.0

Understanding pd.Grouper Parameters

The pd.Grouper function accepts several key parameters ?

Parameter Description Example Value
key Column name to group by 'Date_of_Purchase'
freq Frequency for time grouping 'M' (month), 'Q' (quarter), 'Y' (year)
axis Axis to group along 0 (rows), 1 (columns)

Removing NaN Values

To exclude months with no data (NaN values), use dropna() ?

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
    "Date_of_Purchase": [
        pd.Timestamp("2021-06-10"),
        pd.Timestamp("2021-07-11"),
        pd.Timestamp("2021-06-25"),
        pd.Timestamp("2021-06-29"),
        pd.Timestamp("2021-03-20"),
        pd.Timestamp("2021-01-22"),
        pd.Timestamp("2021-01-06"),
        pd.Timestamp("2021-01-04"),
        pd.Timestamp("2021-05-09")
    ],
    "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})

# Group by month, sum, and remove NaN values
monthly_sum_clean = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).sum().dropna()

print("Monthly sum without NaN values...")
print(monthly_sum_clean)
Monthly sum without NaN values...
                  Reg_Price
Date_of_Purchase           
2021-01-31           4250.0
2021-03-31           1700.0
2021-05-31           1350.0
2021-06-30           3000.0
2021-07-31           1400.0

Conclusion

Use pd.Grouper with frequency parameters to group time-series data by periods. The sum() method calculates totals for each group, and dropna() removes empty periods from results.

Updated on: 2026-03-26T02:16:09+05:30

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements