How to Group Data by Time Intervals in Python Pandas?


Data analysis has become a crucial part of nearly every industry. Many organizations rely heavily on data to make strategic decisions, forecast trends, and understand consumer behavior. In this climate, Python's Pandas library has emerged as a powerful tool, offering a wide range of functionality to manipulate, analyze, and visualize data effectively. One of these capabilities is grouping data by time intervals.

This article focuses on how to group data by time intervals using Pandas. We will cover the syntax, an easy-to-follow algorithm, and two distinct approaches, each with a fully executable code example.

Syntax

The method we'll focus on is Pandas' groupby() function combined with pd.Grouper, which bins rows by time in much the same way as resampling does. The syntax is as follows:

df.groupby(pd.Grouper(key='date', freq='T')).sum()

In the syntax:

  • df − Your DataFrame.

  • groupby(pd.Grouper()) − The function to group data.

  • key − The column you want to group by. Here, it's the 'date' column.

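  • freq − The frequency of the time intervals ('T' or 'min' for minutes, 'H' or 'h' for hours, 'D' for days, etc.). Since pandas 2.2, the aliases 'T' and 'H' are deprecated in favor of 'min' and 'h'.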

  • sum() − The aggregation function.
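As a small illustration of the syntax above, the following sketch groups a hypothetical six-row DataFrame into 30-minute bins. It also calls DataFrame.resample() with on='date', which for this simple case should give the same totals. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: six rows spaced 12 minutes apart
df = pd.DataFrame({
    "date": pd.date_range(start="2022-01-01", periods=6, freq="12min"),
    "value": [1, 2, 3, 4, 5, 6],
})

# The groupby + pd.Grouper form described in this section
grouped = df.groupby(pd.Grouper(key="date", freq="30min")).sum()

# The resample form, using the same column as the time key
resampled = df.resample("30min", on="date").sum()

print(grouped)
# Compare the two results bin by bin
print((grouped["value"] == resampled["value"]).all())
```

Which form to use is largely a matter of taste; pd.Grouper is convenient when the time key has to be combined with other grouping keys.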

Algorithm

Here's the step-by-step algorithm for grouping data by time intervals −

  • Import the necessary libraries, i.e., Pandas.

  • Load or create your DataFrame.

  • Convert the date column to a datetime object, if it isn't already.

  • Apply the groupby() function with pd.Grouper on the date column with the desired frequency.

  • Apply the aggregation function like sum(), mean(), etc.

  • Print or store the result.
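The steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: the timestamps arrive as strings (as they often do after reading a CSV file), and the data and column names are hypothetical.

```python
# Step 1: import the library
import pandas as pd

# Step 2: create a DataFrame; the dates are plain strings at this point
df = pd.DataFrame({
    "date": [
        "2022-01-01 08:00",
        "2022-01-01 17:30",
        "2022-01-02 09:15",
        "2022-01-02 21:45",
        "2022-01-03 12:00",
    ],
    "value": [10, 20, 30, 40, 50],
})

# Step 3: convert the string column to datetime
df["date"] = pd.to_datetime(df["date"])

# Steps 4 and 5: group by day and aggregate with mean()
daily_mean = df.groupby(pd.Grouper(key="date", freq="D"))["value"].mean()

# Step 6: print the result
print(daily_mean)
```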

Approaches

We'll consider two distinct approaches −

Approach 1: Grouping by Daily Frequency

In this example, we create a DataFrame with a range of hourly timestamps and values. We then group the data by daily frequency and sum the values for each day.

Example

# Import pandas
import pandas as pd

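# Create a dataframe with 100 hourly timestamps
# ('h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
df = pd.DataFrame({
   'date': pd.date_range(start='1/1/2022', periods=100, freq='h'),
   'value': range(100)
})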

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by daily frequency
daily_df = df.groupby(pd.Grouper(key='date', freq='D')).sum()

print(daily_df)

Output

            value
date             
2022-01-01    276
2022-01-02    852
2022-01-03   1428
2022-01-04   2004
2022-01-05    390

Explanation

The first thing this code does is import the Pandas library, which is required for the data manipulation that follows. Next, pd.DataFrame() builds a DataFrame with two columns, 'date' and 'value'. The pd.date_range() function generates 100 hourly timestamps for the 'date' column, while the 'value' column holds the integers 0 through 99.

Although the 'date' column already has a datetime dtype, we still call pd.to_datetime() to make sure of it. This step matters when dates are loaded as strings, because pd.Grouper with a freq only works on datetime-like values.

After that, we group the data by daily ('D') frequency using groupby() together with pd.Grouper(). We then call sum(), which adds up all of the 'value' entries that fall on the same day.

Finally, the grouped DataFrame is printed, showing the total for each day. The last day's total is smaller because only four of the 100 hourly rows fall on 2022-01-05.
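The aggregation step is not limited to sum(). As a hedged sketch, the same daily grouping can report several statistics at once through agg(). The data mirrors the example above; the spacing is written as '60min' (equivalent to hourly) so the snippet does not depend on which hourly alias a given pandas version prefers.

```python
import pandas as pd

# Same shape of data as the daily example: 100 rows, one per hour
df = pd.DataFrame({
    "date": pd.date_range(start="2022-01-01", periods=100, freq="60min"),
    "value": range(100),
})

# Group by day, then compute three statistics for the 'value' column
stats = (
    df.groupby(pd.Grouper(key="date", freq="D"))["value"]
      .agg(["sum", "mean", "count"])
)

print(stats)
# The 'count' column shows 24 rows for each full day and 4 for the last day
```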

Approach 2: Grouping by a custom frequency, such as 15-minute intervals

Example

# Import pandas
import pandas as pd

# Create a dataframe with 100 minute-by-minute timestamps
# ('min' is the minute alias; 'T' is deprecated since pandas 2.2)
df = pd.DataFrame({
   'date': pd.date_range(start='1/1/2022', periods=100, freq='min'),
   'value': range(100)
})

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by 15-minute frequency ('15min' is the same interval as the older '15T')
custom_df = df.groupby(pd.Grouper(key='date', freq='15min')).sum()

print(custom_df)

Output

                     value
date                      
2022-01-01 00:00:00    105
2022-01-01 00:15:00    330
2022-01-01 00:30:00    555
2022-01-01 00:45:00    780
2022-01-01 01:00:00   1005
2022-01-01 01:15:00   1230
2022-01-01 01:30:00    945

Explanation

The second approach starts with the same Pandas import as the first, followed by the creation of a DataFrame. This DataFrame is identical to the one used in the previous example, except that the 'date' column now contains minute-by-minute timestamps.

The 'date' column must have a datetime dtype for the grouping to work, and pd.to_datetime() ensures that it does.

Here we group by a custom frequency of 15 minutes ('15T', written '15min' in newer pandas) by passing pd.Grouper() to groupby(). To aggregate the 'value' entries in each 15-minute interval, we use sum(), just as in the first approach.

The code ends by printing the grouped DataFrame, which shows the total of the 'value' column for each 15-minute interval. The final interval, starting at 01:30, covers only the last 10 rows.
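To make the interval boundaries concrete, here is a minimal sketch with four hypothetical, irregularly spaced events. It assumes the default pd.Grouper settings for a '15min' frequency, where each bin includes its start time, excludes its end time, and is labeled by its start.

```python
import pandas as pd

# Hypothetical events; note the gap between 00:15 and 00:50
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01 00:00",
        "2022-01-01 00:14",
        "2022-01-01 00:15",
        "2022-01-01 00:50",
    ]),
    "value": [1, 2, 3, 4],
})

totals = df.groupby(pd.Grouper(key="date", freq="15min"))["value"].sum()
print(totals)

# 00:00 bin -> 1 + 2 = 3   (00:14 still belongs to the first bin)
# 00:15 bin -> 3           (00:15 opens the second bin)
# 00:30 bin -> 0           (no rows, but the bin is still listed)
# 00:45 bin -> 4
```

The empty 00:30 bin appears with a sum of 0 rather than being dropped, which is worth keeping in mind when the data has gaps.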

Conclusion

Pandas' power extends to a variety of data manipulations, one of which is grouping data by time intervals. By using the groupby() function combined with pd.Grouper, we can effectively segment data based on daily frequency or a custom frequency, allowing for efficient, flexible data analysis.

The capability to group data by time intervals enables analysts and businesses to extract meaningful insights from their data. Whether it's calculating the sum of sales every day, obtaining the average temperature every hour, or counting website hits every 15 minutes, grouping data by time intervals allows us to better understand trends, patterns, and outliers in our data over time.

Remember, Python's Pandas library is a powerful tool for data analysis. Learning how to use its functions, like the groupby method, can help you become a more effective and proficient data analyst or data scientist.

Updated on: 27-Jul-2023
