Python – Group and calculate the sum of column values of a Pandas DataFrame


We will consider an example of Car Sale Records and group month-wise to calculate the sum of Registration Price of car monthly. To sum, we use the sum() method.

At first, let’s say the following is our Pandas DataFrame with three columns −

dataFrame = pd.DataFrame(
   {
      "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],

      "Date_of_Purchase": [
         pd.Timestamp("2021-06-10"),
         pd.Timestamp("2021-07-11"),
         pd.Timestamp("2021-06-25"),
         pd.Timestamp("2021-06-29"),
         pd.Timestamp("2021-03-20"),
         pd.Timestamp("2021-01-22"),
         pd.Timestamp("2021-01-06"),
         pd.Timestamp("2021-01-04"),
         pd.Timestamp("2021-05-09")
      ],

      "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
   }
)

Use the Grouper to select Date_of_Purchase column within groupby() function. The frequency freq is set ‘M’ to group by month-wise and sum is calculates using the sum() function −

print"\nGroup Dataframe by month...\n",dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum()

Example

Following is the code −

import pandas as pd

# dataframe with one of the columns as Date_of_Purchase
dataFrame = pd.DataFrame(
   {
      "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],

      "Date_of_Purchase": [
         pd.Timestamp("2021-06-10"),
         pd.Timestamp("2021-07-11"),
         pd.Timestamp("2021-06-25"),
         pd.Timestamp("2021-06-29"),
         pd.Timestamp("2021-03-20"),
         pd.Timestamp("2021-01-22"),
         pd.Timestamp("2021-01-06"),
         pd.Timestamp("2021-01-04"),
         pd.Timestamp("2021-05-09")
      ],

      "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
   }
)

print"DataFrame...\n",dataFrame

# Grouper to select Date_of_Purchase column within groupby function
# calculation the sum month-wise
print"\nGroup Dataframe by month...\n",dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum()

Output

This will produce the following output −

DataFrame...
        Car   Date_of_Purchase   Reg_Price
0      Audi        2021-06-10        1000
1     Lexus        2021-07-11        1400
2     Tesla        2021-06-25        1100
3  Mercedes        2021-06-29         900
4       BMW        2021-03-20        1700
5    Toyota        2021-01-22        1800
6    Nissan        2021-01-06        1300
7   Bentley        2021-01-04        1150
8   Mustang        2021-05-09        1350

Group Dataframe by month...
                   Reg_Price
Date_of_Purchase
2021-01-31           4250.0
2021-02-28              NaN
2021-03-31           1700.0
2021-04-30              NaN
2021-05-31           1350.0
2021-06-30           3000.0
2021-07-31           1400.0

Updated on: 16-Sep-2021

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements