How to Resample Time Series Data in Python

Time series data is a sequence of observations recorded over time, often at regular intervals. It appears in domains such as finance, economics, health, and environmental science. The frequency (or resolution) at which the data was collected may not suit the analysis or model we have in mind. In such cases, we can resample the series to a different frequency, either by upsampling or by downsampling.

Understanding Resampling

Resampling changes the frequency of time series observations. Upsampling increases the frequency (for example, daily to hourly), while downsampling decreases it (for example, daily to weekly).

Upsampling Methods

Upsampling increases the frequency of time series data, producing more rows than the original series. The newly created timestamps have no observed values, so they must be filled in. pandas provides several interpolation methods for filling these gaps.

Syntax

DataFrame.resample(rule).asfreq()
DataFrame.interpolate(method='linear')
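
The two calls in the syntax above do different jobs: resample(rule).asfreq() only inserts the missing timestamps and leaves their values as NaN, and interpolate() then fills those gaps. A minimal sketch, assuming a hypothetical two-row frame (the names df, gaps, and filled are illustrative only):

```python
import pandas as pd

# Hypothetical sample: two observations three days apart
df = pd.DataFrame(
    {"Value": [10.0, 40.0]},
    index=pd.to_datetime(["2023-06-01", "2023-06-04"]),
)

# Step 1: upsample to daily frequency; new rows are created but left as NaN
gaps = df.resample("D").asfreq()
print(gaps)

# Step 2: fill the NaN rows; 'linear' spaces values evenly between neighbors
filled = gaps.interpolate(method="linear")
print(filled)
```

Printing the intermediate frame is a handy way to check how many rows an upsampling rule creates before deciding how to fill them.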

Linear Interpolation

Linear interpolation fills gaps between data points by drawing straight lines between existing observations. This method assumes a constant rate of change between points.

import pandas as pd

# Create a sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

print("Original data:")
print(df)
print("\nUpsampled with linear interpolation:")

# Upsample to daily frequency and apply linear interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='linear')
print(df_upsampled)

Output:

Original data:
            Value
Date             
2023-06-01     10
2023-06-03     20
2023-06-06     30

Upsampled with linear interpolation:
                Value
Date                 
2023-06-01  10.000000
2023-06-02  15.000000
2023-06-03  20.000000
2023-06-04  23.333333
2023-06-05  26.666667
2023-06-06  30.000000
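
The interpolated values in the output above can be reproduced by hand. For a gap between two known points (t0, y0) and (t1, y1), the filled value at time t is y0 + (y1 - y0) * (t - t0) / (t1 - t0). The sketch below applies that formula to the June 3 to June 6 gap; the helper name linear_fill is made up for illustration. Note that method='linear' in pandas treats rows as equally spaced and ignores the index, so it matches this time-based formula here only because the upsampled index is a regular daily grid.

```python
import pandas as pd

# Known points on either side of the gap (taken from the example above)
t0, y0 = pd.Timestamp("2023-06-03"), 20.0
t1, y1 = pd.Timestamp("2023-06-06"), 30.0


def linear_fill(t):
    """Value at time t on the straight line through (t0, y0) and (t1, y1)."""
    fraction = (t - t0) / (t1 - t0)  # share of the gap already covered
    return y0 + (y1 - y0) * fraction


for day in pd.date_range("2023-06-04", "2023-06-05", freq="D"):
    print(day.date(), round(linear_fill(day), 6))
```

Each step of one day covers a third of the three-day gap, so the value rises by a third of the 10-unit difference per day.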

Nearest Neighbor Interpolation

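Nearest neighbor interpolation fills each gap by copying the closest existing observation in time. This method is useful for discrete values such as numerically coded categories, or when the filled series should contain only values that were actually observed. Note that pandas passes method='nearest' to SciPy, so SciPy must be installed for this example to run.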

import pandas as pd

# Create the same sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Upsample using nearest neighbor interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='nearest')

print("Upsampled with nearest neighbor interpolation:")
print(df_upsampled)

Output:

Upsampled with nearest neighbor interpolation:
            Value
Date             
2023-06-01   10.0
2023-06-02   10.0
2023-06-03   20.0
2023-06-04   20.0
2023-06-05   30.0
2023-06-06   30.0
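
DataFrame.interpolate() is designed for numeric data, so for text labels or other non-numeric columns a resampler fill method may be a better fit. The sketch below assumes a hypothetical Status column and uses Resampler.ffill() and Resampler.nearest(), which copy existing values rather than computing new ones and do not need SciPy.

```python
import pandas as pd

# Hypothetical label data: a status recorded on three dates
df = pd.DataFrame(
    {"Status": ["low", "medium", "high"]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03", "2023-06-06"]),
)

# Forward fill: each new day repeats the most recent earlier observation
forward = df.resample("D").ffill()

# Nearest: each new day copies whichever observation is closest in time
nearest = df.resample("D").nearest()

print(forward)
print(nearest)
```

A date that sits exactly halfway between two observations (June 2 here) is a tie, and how ties are broken can differ between methods, so it is worth inspecting those rows when the choice matters.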

Downsampling Methods

Downsampling decreases the frequency of time series data by aggregating multiple observations into single values. Common aggregation methods include mean, sum, and maximum.

Mean Downsampling

Mean downsampling calculates the average value within each time interval, providing a smooth representation of the data trends.

import pandas as pd

# Create a daily time series for the entire month of June 2023
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

print("Original daily data (first 7 days):")
print(df.head(7))

# Downsample to weekly frequency using mean
df_downsampled = df.resample('W').mean()

print("\nDownsampled to weekly using mean:")
print(df_downsampled)

Output:

Original daily data (first 7 days):
            Value
Date             
2023-06-01      0
2023-06-02      1
2023-06-03      2
2023-06-04      3
2023-06-05      4
2023-06-06      5
2023-06-07      6

Downsampled to weekly using mean:
            Value
Date             
2023-06-04    1.5
2023-06-11    7.0
2023-06-18   14.0
2023-06-25   21.0
2023-07-02   27.0
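
The first and last rows of the weekly output cover partial weeks. In pandas, the 'W' rule is anchored on Sunday by default and each bin is labeled with its end date, so the 2023-06-04 row holds only June 1 to 4 and the 2023-07-02 row only June 26 to 30. A quick sketch that counts the observations in each bin makes this visible (the name weekly is illustrative):

```python
import pandas as pd

# Same daily series as above: values 0..29 for June 2023
df = pd.DataFrame(
    {"Value": range(30)},
    index=pd.date_range(start="2023-06-01", end="2023-06-30", freq="D"),
)

# Count and mean per weekly bin; a count below 7 marks a partial week
weekly = df.resample("W")["Value"].agg(["count", "mean"])
print(weekly)
```

Checking the count alongside the aggregate helps avoid comparing a full week against a bin built from only a few days.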

Maximum Downsampling

Maximum downsampling captures the highest value within each interval, useful for identifying peak values or extreme events.

import pandas as pd

# Create the same daily time series
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Downsample to weekly frequency using maximum
df_downsampled = df.resample('W').max()

print("Downsampled to weekly using maximum:")
print(df_downsampled)

Output:

Downsampled to weekly using maximum:
            Value
Date             
2023-06-04      3
2023-06-11     10
2023-06-18     17
2023-06-25     24
2023-07-02     29
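
Sum is listed among the common aggregations but has no example of its own. The sketch below follows the same pattern as the mean and maximum examples, and also shows one way to compute several aggregations at once with agg(); the names weekly_sum and weekly_stats are illustrative.

```python
import pandas as pd

# Same daily series as above: values 0..29 for June 2023
df = pd.DataFrame(
    {"Value": range(30)},
    index=pd.date_range(start="2023-06-01", end="2023-06-30", freq="D"),
)

# Weekly totals
weekly_sum = df.resample("W").sum()
print(weekly_sum)

# Several aggregations in one pass; each column is named after its function
weekly_stats = df.resample("W")["Value"].agg(["sum", "min", "max"])
print(weekly_stats)
```

Sums are sensitive to bin length, so the shorter first and last weeks produce smaller totals than they would as full weeks.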

Comparison of Methods

Method               | Type         | Best For                  | Preserves
Linear Interpolation | Upsampling   | Smooth trends             | Trend continuity
Nearest Neighbor     | Upsampling   | Categorical/discrete data | Original values
Mean Aggregation     | Downsampling | Overall trends            | Average behavior
Max Aggregation      | Downsampling | Peak detection            | Extreme values

Conclusion

Resampling is essential for time series analysis, allowing you to adjust data frequency through upsampling and downsampling. Choose linear interpolation for smooth trends, nearest neighbor for preserving original values, and appropriate aggregation methods based on your analytical needs.

Updated on: 2026-03-27T08:36:46+05:30
