How to Resample Time Series Data in Python


Time series data is a sequence of observations collected over time, often at regular intervals. Such data appears in many domains, including finance, economics, health, and environmental science. The data we collect may come at a frequency or resolution that does not suit our analysis or modeling. In such cases, we can resample the time series, changing its frequency by either upsampling or downsampling. This article explains different methods to upsample or downsample time series data with the pandas library.

Upsampling

Upsampling means increasing the frequency of the time series data, for example from weekly to daily observations. This is usually done when we need a higher resolution or more frequent observations. Because upsampling creates time points that have no recorded value, the new rows must be filled in. The pandas library provides several methods for this, including linear interpolation, nearest neighbor interpolation, and polynomial interpolation.

Syntax

DataFrame.resample(rule, *args, **kwargs)
DataFrame.asfreq(freq, method=None)
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None)

Here,

  • The resample function is a method provided by the pandas library to resample time series data. It is called on a DataFrame with a datetime-like index and takes the rule parameter, which specifies the target frequency (for example 'D' for daily or 'W' for weekly). It returns a Resampler object rather than a new DataFrame; a follow-up call such as asfreq, mean, or max produces the result. Additional keyword arguments customize how the intervals are formed and labeled.

  • The asfreq method converts the time series to the specified frequency without aggregating. It takes the freq parameter, a frequency string for the output. Called directly on a DataFrame, its optional method parameter fills the rows introduced by the conversion by forward filling ('ffill') or backward filling ('bfill'); it does not interpolate. Called on the result of resample, as in the examples below, it leaves the new rows as NaN so that they can be filled in a later step.

  • The interpolate method is used to fill missing values or gaps in the time series data. It performs interpolation based on the specified method (e.g., 'linear', 'nearest', 'spline') to estimate the values between existing observations. Methods other than the basic ones, including 'nearest' and 'spline', require SciPy to be installed. Additional parameters control the axis along which interpolation is performed, the limit on consecutive NaN values to be filled, and whether to modify the DataFrame in place or return a new DataFrame.
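As a rough illustration of how these three calls relate, the sketch below upsamples a small frame in three ways. The frame `demo` and its values are made up for this illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical two-row frame, made up for this illustration
demo = pd.DataFrame(
    {"Value": [10, 20]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03"]),
)

# resample(...).asfreq() only inserts the missing dates; the new row holds NaN
gaps = demo.resample("D").asfreq()

# DataFrame.asfreq can fill the new row itself by forward filling
filled = demo.asfreq("D", method="ffill")

# interpolate() estimates the NaN row left by the first approach
smooth = gaps.interpolate(method="linear")

print(gaps, filled, smooth, sep="\n\n")
```

Forward filling repeats the last known value, while interpolation estimates a value between the neighbors, so which one is appropriate depends on what the data represents.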

Linear Interpolation

Linear interpolation is a common choice for upsampling time series data. It fills the gaps between data points by drawing straight lines between them. In pandas, this is done by combining the resample function, which creates the new time points, with the interpolate method, which fills them.

Example

In the example below, we have a time series DataFrame with three observations on non-consecutive dates. We convert the 'Date' column to a datetime format and set it as the index. The resample function with the asfreq method upsamples the data to a daily frequency ('D'), leaving NaN in the newly created rows. Finally, the interpolate method with the 'linear' option fills those rows using linear interpolation. The resulting DataFrame, df_upsampled, contains the upsampled time series data with interpolated values.

import pandas as pd

# Create a sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Upsample the data using linear interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='linear')

# Print the upsampled DataFrame
print(df_upsampled)

Output

                Value
Date                 
2023-06-01  10.000000
2023-06-02  15.000000
2023-06-03  20.000000
2023-06-04  23.333333
2023-06-05  26.666667
2023-06-06  30.000000
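To see where a value such as 23.333333 comes from, the straight-line estimate can be reproduced by hand. The following is only a sketch of the arithmetic for one date, using the two observations on either side of the gap; the variable names are chosen for this illustration.

```python
import pandas as pd

# Known observations on either side of the gap (from the example above)
t0, v0 = pd.Timestamp("2023-06-03"), 20
t1, v1 = pd.Timestamp("2023-06-06"), 30

# Fraction of the way from t0 to t1 at the date being estimated
t = pd.Timestamp("2023-06-04")
fraction = (t - t0) / (t1 - t0)

# Point on the straight line between the two observations
estimate = v0 + fraction * (v1 - v0)
print(round(estimate, 6))
```

One day out of a three-day gap gives a fraction of one third, so the estimate lies one third of the way from 20 to 30.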

Nearest Neighbor Interpolation

Nearest neighbor interpolation is a simple method that fills each gap with the value of the closest available observation in time. This method can be useful when the time series exhibits abrupt changes or holds step-like or categorical values, where an in-between value would be meaningless. The interpolate method in pandas can be used with the 'nearest' option to perform nearest neighbor interpolation; this option requires SciPy to be installed.

Example

In the example below, we use the same original DataFrame as before. After resampling with the 'D' frequency, the interpolate method with the 'nearest' option fills the gaps by copying the nearest available observation. The resulting DataFrame, df_upsampled, now has a daily frequency with values filled by nearest neighbor interpolation.

import pandas as pd

# Create a sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Upsample the data using nearest neighbor interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='nearest')

# Print the upsampled DataFrame
print(df_upsampled)

Output

            Value
Date             
2023-06-01   10.0
2023-06-02   10.0
2023-06-03   20.0
2023-06-04   20.0
2023-06-05   30.0
2023-06-06   30.0
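If SciPy is not available, a similar result can be sketched with the nearest method of the Resampler object, which fills each new row from the closest original row by reindexing. This is an alternative to the approach above, not the same code path, and the name `nearest` for the result is chosen for this illustration.

```python
import pandas as pd

# Same three observations as in the example above
df = pd.DataFrame(
    {"Value": [10, 20, 30]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03", "2023-06-06"]),
)
df.index.name = "Date"

# Fill every daily slot with the value of the closest original date
nearest = df.resample("D").nearest()
print(nearest)
```

A date that is equally far from two observations, such as 2023-06-02 here, is a tie, and the two approaches may break it differently, so that row may not match the output shown above.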

Downsampling

Downsampling decreases the frequency of the time series data, usually to obtain a broader view of the data or to simplify analysis. Because several observations fall into each new interval, they must be combined into one value. The pandas library supports different downsampling techniques, such as taking the mean, sum, or maximum value within a specified time interval.

Syntax

DataFrame.resample(rule).mean()

Here, aggregation methods such as mean, sum, or max are called on the object returned by resample to calculate a single value representing the grouped observations within each resampled interval. These methods are typically used when downsampling the data. The rule passed to resample, such as 'W' for weekly or 'M' for monthly, determines the intervals over which the data is aggregated.
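Several aggregations can also be computed in one pass with the agg method. The sketch below assumes a made-up daily series covering two full Monday-to-Sunday weeks; the names `daily` and `weekly` are chosen for this illustration.

```python
import pandas as pd

# Hypothetical daily series covering two full Monday-to-Sunday weeks
idx = pd.date_range("2023-06-05", periods=14, freq="D")
daily = pd.DataFrame({"Value": range(14)}, index=idx)

# One resample call, several aggregations per weekly interval
weekly = daily.resample("W")["Value"].agg(["mean", "sum", "max"])
print(weekly)
```

Each row of the result describes one week, with one column per aggregation.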

Mean Downsampling

Mean downsampling calculates the average value of the data points within each interval. This method is useful when dealing with high-frequency data and a representative value is needed for each interval. The resample function combined with the mean method can be used to perform mean downsampling.

Example

In the example below, we start with a daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals that end on Sunday, and each interval is labeled with its end date. By applying the mean method, we obtain the average value within each week. The resulting DataFrame, df_downsampled, contains the mean-downsampled time series data.

import pandas as pd

# Create a sample time series DataFrame with daily frequency
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Downsampling using mean
df_downsampled = df.resample('W').mean()

# Print the downsampled DataFrame
print(df_downsampled)

Output

            Value
Date             
2023-06-04    1.5
2023-06-11    7.0
2023-06-18   14.0
2023-06-25   21.0
2023-07-02   27.0
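The first and last rows above are averages over partial weeks, because June 2023 starts on a Thursday and ends on a Friday. One way to check this, sketched here with the same data, is to count the observations that fall into each weekly interval.

```python
import pandas as pd

# Same daily data as in the example above
df = pd.DataFrame(
    {"Value": range(30)},
    index=pd.date_range("2023-06-01", "2023-06-30", freq="D"),
)

# Number of daily observations that fall into each weekly interval
counts = df.resample("W")["Value"].count()
print(counts)
```

Intervals with fewer than seven observations are worth keeping in mind when comparing weekly values, since they summarize less data than the full weeks.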

Maximum Downsampling

Maximum downsampling takes the highest value within each interval. This method is suitable for identifying peak values or extreme events in the time series. Using max instead of mean in the previous example performs maximum downsampling.

Example

In the example below, we start with a daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals. By applying the max method, we obtain the maximum value within each week. The resulting DataFrame, df_downsampled, contains the maximum-downsampled time series data.

import pandas as pd

# Create a sample time series DataFrame with daily frequency
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Downsampling using max
df_downsampled = df.resample('W').max()

# Print the downsampled DataFrame
print(df_downsampled)

Output

            Value
Date             
2023-06-04      3
2023-06-11     10
2023-06-18     17
2023-06-25     24
2023-07-02     29

Conclusion

In this article, we discussed how to resample time series data in Python using the pandas library, which provides various upsampling and downsampling techniques. We explored linear and nearest-neighbor interpolation for upsampling, and mean and maximum aggregation for downsampling. Which technique to use depends on the problem at hand.

Updated on: 18-Jul-2023
