How to Group Pandas DataFrame By Date and Time?


In data analysis and manipulation, it is common to work with data that includes date and time information. One useful operation is grouping data based on dates and times to perform aggregations or extract specific information. In this article, we will explore how to group a Pandas DataFrame by date and time using the powerful capabilities of the Pandas library in Python.

Syntax

Before diving into the details, let us start with the syntax of the method we will use in the following code examples −

dataframe.groupby(pd.Grouper(key='column_name', freq='frequency')).operation()

Here, dataframe refers to the Pandas DataFrame object, column_name is the name of the column containing date and time information, frequency indicates the frequency at which we want to group the data (e.g., 'D' for daily, 'M' for monthly, 'H' for hourly), and operation() is the aggregation to be performed on the grouped data.
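As a quick illustration of this syntax, the minimal sketch below builds a small in-memory DataFrame (the column names 'ts' and 'value' are chosen only for this example) and groups it by monthly frequency −

```python
import pandas as pd

# Hypothetical in-memory data, used only to illustrate the syntax
df = pd.DataFrame({
   'ts': pd.to_datetime(['2023-01-10', '2023-01-20', '2023-02-05']),
   'value': [1, 2, 3],
})

# Group by month ('M') and sum the values in each group;
# the resulting index is labelled with each month's end date
monthly = df.groupby(pd.Grouper(key='ts', freq='M'))['value'].sum()
print(monthly)
```

Here the two January rows are combined into a single group labelled 2023-01-31, while the February row forms its own group.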

Algorithm

Now, let us walk through the step-by-step process of grouping a Pandas DataFrame by date and time −

  • Import the necessary libraries −

    import pandas as pd − Imports the Pandas library for data manipulation and analysis.

  • Load the data into a Pandas DataFrame −

    dataframe = pd.read_csv('data.csv') − Reads the data from a CSV file and stores it in a DataFrame called dataframe.

  • Convert the date and time column to a datetime data type −

    dataframe['datetime_column'] = pd.to_datetime(dataframe['datetime_column']) − Converts the specified column, datetime_column, to a datetime data type. This step ensures that Pandas recognizes the column as containing dates and times.

  • Group the DataFrame by date and time −

    grouped_data = dataframe.groupby(pd.Grouper(key='datetime_column', freq='frequency')) − Uses the groupby() method with pd.Grouper to group the DataFrame based on the datetime_column and the specified frequency.

  • Perform an operation on the grouped data −

    result = grouped_data.operation() − Applies the desired operation on the grouped data, where operation() can be any Pandas operation or method.

data.csv

datetime_column,value
2023-07-01 08:00:00,10
2023-07-01 12:00:00,5
2023-07-02 09:00:00,7
2023-07-02 14:00:00,3
2023-07-03 10:00:00,8
2023-07-03 16:00:00,2
2023-07-04 11:00:00,6
2023-07-04 18:00:00,4

Approach 1: Grouping by Daily Frequency

In this approach, we will group the DataFrame by daily frequency, allowing us to perform aggregations or calculations on a daily basis.

Example

import pandas as pd

# Load the data from the CSV file into a Pandas DataFrame
dataframe = pd.read_csv('data.csv')

# Convert the 'datetime_column' to a datetime data type
dataframe['datetime_column'] = pd.to_datetime(dataframe['datetime_column'])

# Group the DataFrame by date and time using daily frequency
grouped_data = dataframe.groupby(pd.Grouper(key='datetime_column', freq='D'))

# Perform an operation on the grouped data (sum the 'value' column)
result = grouped_data['value'].sum()

# Print the result
print(result)

Output

datetime_column
2023-07-01    15
2023-07-02    10
2023-07-03    10
2023-07-04    10
Freq: D, Name: value, dtype: int64

Explanation

Grouping by Daily Frequency

In this approach, we want to group the data in the DataFrame by daily frequency and calculate the sum of the "value" column for each date.

The code begins by importing the necessary libraries. We import the pandas library using the alias "pd" to work effectively with DataFrames.

Next, we load the data from the CSV file into a Pandas DataFrame using the pd.read_csv() function. We assume that the data is stored in a file named 'data.csv'. Change the file path if necessary.

To work with the date and time data in the DataFrame, we need to convert the corresponding column to a datetime data type. We use the pd.to_datetime() function and pass the column name, 'datetime_column', to convert it appropriately.

Once the column is converted, we are ready to group the DataFrame by the date using daily frequency. We use the groupby() method on the DataFrame and specify the key as 'datetime_column' and the frequency as 'D' (for daily) using pd.Grouper(key='datetime_column', freq='D').

After grouping the data, we can perform an operation on the grouped data. In this case, we want to calculate the sum of the "value" column for each date. We specify 'value' as the column of interest and apply the sum() method to the grouped data.

Finally, we can print the result to see the sum of the "value" column for each date.

Note that you need to change the file path or name in the code to match your specific CSV file. Running this code example should give you the desired result, showing the sum of the "value" column for each date in the DataFrame.

This approach provides a way to group the DataFrame by daily frequency and perform calculations or aggregations on a daily basis, allowing you to analyze and extract meaningful insights from your data.
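The same daily grouping is not limited to a single aggregation. As a hedged sketch, the example below rebuilds the first rows of data.csv in memory (so it runs without the file) and uses agg() to compute several daily statistics in one pass −

```python
import pandas as pd

# In-memory copy of the first rows of data.csv, so the sketch is self-contained
dataframe = pd.DataFrame({
   'datetime_column': pd.to_datetime([
      '2023-07-01 08:00:00', '2023-07-01 12:00:00',
      '2023-07-02 09:00:00', '2023-07-02 14:00:00',
   ]),
   'value': [10, 5, 7, 3],
})

# agg() computes several statistics per daily group in a single pass
daily_stats = (
   dataframe
   .groupby(pd.Grouper(key='datetime_column', freq='D'))['value']
   .agg(['sum', 'mean', 'count'])
)
print(daily_stats)
```

Each row of the result describes one day: for 2023-07-01 the sum is 15, the mean 7.5, and the count 2.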

Approach 2: Grouping by Hourly Frequency

In this approach, we will group the DataFrame by hourly frequency, enabling us to analyze or manipulate the data on an hourly basis. Here is an example that demonstrates how to achieve this −

Example

import pandas as pd

# Load the data from the CSV file into a Pandas DataFrame
dataframe = pd.read_csv('data.csv')

# Convert the date and time column to a datetime data type
dataframe['datetime_column'] = pd.to_datetime(dataframe['datetime_column'])

# Group the DataFrame by date and time using hourly frequency
grouped_data = dataframe.groupby(pd.Grouper(key='datetime_column', freq='H'))

# Perform an operation on the grouped data
result = grouped_data['value'].mean()

# Print the result
print(result)

Output

datetime_column
2023-07-01 08:00:00    10.0
2023-07-01 09:00:00     NaN
2023-07-01 10:00:00     NaN
2023-07-01 11:00:00     NaN
2023-07-01 12:00:00     5.0
                       ... 
2023-07-04 14:00:00     NaN
2023-07-04 15:00:00     NaN
2023-07-04 16:00:00     NaN
2023-07-04 17:00:00     NaN
2023-07-04 18:00:00     4.0
Freq: H, Name: value, Length: 83, dtype: float64

Explanation

Grouping by Hourly Frequency

In this approach, we want to group the data in the DataFrame by hourly frequency and calculate the mean of the "value" column for each hour.

The code begins by importing the necessary libraries. We import the pandas library using the alias "pd" to work effectively with DataFrames.

Next, we load the data from the CSV file into a Pandas DataFrame using the pd.read_csv() function. We assume that the data is stored in a file named 'data.csv'. Change the file path if necessary.

To work with the date and time data in the DataFrame, we need to convert the corresponding column to a datetime data type. We use the pd.to_datetime() function and pass the column name, 'datetime_column', to convert it appropriately.

Once the column is converted, we are ready to group the DataFrame by date and time using hourly frequency. We use the groupby() method on the DataFrame and specify the key as 'datetime_column' and the frequency as 'H' (for hourly) using pd.Grouper(key='datetime_column', freq='H').

After grouping the data, we can perform an operation on the grouped data. In this case, we want to calculate the mean of the "value" column for each hour.

We specify 'value' as the column of interest and apply the mean() method to the grouped data.

Finally, we can print the result to see the mean of the "value" column for each hour.

Note that you need to change the file path or name in the code to match your specific CSV file. Running this code example should give you the desired result, showing the mean of the "value" column for each hour in the DataFrame.

This approach provides a way to group the DataFrame by hourly frequency and perform calculations or aggregations on an hourly basis, allowing you to analyze and extract meaningful insights from your data.
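As a side note, when the datetime column is set as the DataFrame index, Pandas' resample() method offers an equivalent shorthand for this Grouper-based hourly grouping. The sketch below, built on a few in-memory rows mirroring data.csv, shows that both spellings agree −

```python
import pandas as pd

# A few rows matching data.csv, so the comparison is self-contained
dataframe = pd.DataFrame({
   'datetime_column': pd.to_datetime(['2023-07-01 08:00:00',
                                      '2023-07-01 12:00:00',
                                      '2023-07-02 09:00:00']),
   'value': [10, 5, 7],
})

# With the datetime column as the index, resample('H') groups by hour
hourly_mean = dataframe.set_index('datetime_column')['value'].resample('H').mean()

# The groupby + pd.Grouper spelling used in this article
grouper_mean = dataframe.groupby(
   pd.Grouper(key='datetime_column', freq='H'))['value'].mean()

# The two spellings produce identical results
print(hourly_mean.equals(grouper_mean))
```

resample() is often more concise when the datetime values are already the index, while pd.Grouper is convenient when they live in an ordinary column.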

Conclusion

By using the powerful capabilities of the Pandas library in Python, we can easily group a Pandas DataFrame by date and time. This functionality enables us to perform various aggregations, calculations, or analyses at different frequencies such as daily, hourly, monthly, and more. The flexibility provided by Pandas makes it an invaluable tool for handling and manipulating time-series data in a concise and efficient way. By following the outlined steps and using the provided syntax, you can now effectively group your Pandas DataFrame by date and time to extract meaningful insights from your data.

Updated on: 27-Jul-2023
