Python Pandas - Manipulating Time-series Data



Pandas provides tools for working with time series data: you can shift values in time, convert a series to a different frequency, fill the gaps that a conversion creates, and resample data into coarser or finer intervals.

In this tutorial, we will learn about essential time series manipulation methods, including shifting/lagging, frequency conversion, resampling, downsampling, upsampling, and sparse resampling.

Shifting and Lagging Time Series Data

To shift or lag the values of a time series backward or forward in time, you can use the shift() method, which is available on all pandas objects. By default, shift() moves the values by the given number of periods and leaves the index unchanged. When the freq parameter is specified, shift() changes the dates in the index instead and leaves the values aligned with their original rows.

Example: Shifting values of time series by period

Here is a basic example of using the shift() method to shift the values of a time series by a specified number of periods.

import pandas as pd

# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

# Shift the values forward by 2 periods; the first two rows become NaN
print('\nTime series after shifted by 2 periods')
print(ts.shift(2))

Following is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-02    1
2024-11-03    2
2024-11-04    3
2024-11-05    4
Freq: D, dtype: int64

Time series after shifted by 2 periods
2024-11-01    NaN
2024-11-02    NaN
2024-11-03    0.0
2024-11-04    1.0
2024-11-05    2.0
Freq: D, dtype: float64
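
The same method covers a few related cases. The sketch below reuses the series from the example above and shows a negative period (a "lead"), a day-over-day difference built from a one-period lag, and the fill_value parameter of shift(), which replaces the NaN values introduced at the edge. The variable names are illustrative only.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# A negative period shifts values backward in time; the last row becomes NaN
lead = ts.shift(-1)

# Day-over-day change: current value minus the previous day's value
change = ts - ts.shift(1)

# fill_value replaces the NaN at the edge, so the integer dtype is kept
lagged_filled = ts.shift(2, fill_value=0)

print(lead)
print(change)
print(lagged_filled)
```

Because fill_value avoids NaN, the result does not have to be converted to float64 as in the plain shift(2) output above.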

Example: Using freq in shift() method

In the following example, we pass a frequency ("B", business day) to the shift() method so that the dates of the time series are shifted instead of the values.

import pandas as pd

# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

print('\nTime series after shifted by 3 business days')

# Shift dates by 3 business days
print(ts.shift(3, freq="B"))

Following is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-02    1
2024-11-03    2
2024-11-04    3
2024-11-05    4
Freq: D, dtype: int64

Time series after shifted by 3 business days
2024-11-06    0
2024-11-06    1
2024-11-06    2
2024-11-07    3
2024-11-08    4
dtype: int64
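
Note that in the output above the Friday, Saturday, and Sunday dates all land on the same business day, so the shifted index contains duplicate timestamps. The following sketch shows one possible way to detect and collapse such duplicates; summing them is only an assumption made for illustration, and the right aggregation depends on your data.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Shift the dates by 3 business days; weekend dates roll onto the same day
shifted = ts.shift(3, freq="B")

# The index is no longer unique after the shift
print(shifted.index.is_unique)

# Collapse duplicate dates by summing the values that share a timestamp
collapsed = shifted.groupby(level=0).sum()
print(collapsed)
```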

Frequency Conversion with asfreq()

To convert time series data to a specific frequency, filling gaps with NaN, you can use the asfreq() method.

Example: Basic Example of Converting Frequencies of a Time series

The following example converts a time series from a 3-business-day frequency to a business-day frequency using the asfreq() method.

import pandas as pd
import numpy as np

# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="3B")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

print('\nTime series after converting the frequency:')

# Convert frequency to daily business days
result = ts.asfreq("B")
print(result)

Following is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-06    1
2024-11-11    2
2024-11-14    3
2024-11-19    4
Freq: 3B, dtype: int64

Time series after converting the frequency:
2024-11-01    0.0
2024-11-04    NaN
2024-11-05    NaN
2024-11-06    1.0
2024-11-07    NaN
2024-11-08    NaN
2024-11-11    2.0
2024-11-12    NaN
2024-11-13    NaN
2024-11-14    3.0
2024-11-15    NaN
2024-11-18    NaN
2024-11-19    4.0
Freq: B, dtype: float64

Example: Filling Missing Values while Converting Frequencies

To fill missing values while converting frequencies, you can use the method parameter of asfreq(). It fills the new gaps by propagating existing values, either forward ("pad" or "ffill") or backward ("backfill" or "bfill"). It does not interpolate between values.

import pandas as pd
import numpy as np

# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="3B")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

print('\nTime series after converting the frequency:')

# Convert frequency to daily business days
# And forward-filling missing values
result = ts.asfreq("B", method="pad")
print(result)

Following is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-06    1
2024-11-11    2
2024-11-14    3
2024-11-19    4
Freq: 3B, dtype: int64

Time series after converting the frequency:
2024-11-01    0
2024-11-04    0
2024-11-05    0
2024-11-06    1
2024-11-07    1
2024-11-08    1
2024-11-11    2
2024-11-12    2
2024-11-13    2
2024-11-14    3
2024-11-15    3
2024-11-18    3
2024-11-19    4
Freq: B, dtype: int64
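
As a complement to forward filling, the sketch below applies the other two filling options of asfreq() to the same series: method="bfill" for backward filling, and the fill_value parameter for a constant. The constant -1 is an arbitrary placeholder chosen for illustration.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="3B")
ts = pd.Series(range(len(indx)), index=indx)

# Backward fill: each new date takes the next available observation
backfilled = ts.asfreq("B", method="bfill")

# Constant fill: each new date takes a fixed placeholder value
constant = ts.asfreq("B", fill_value=-1)

print(backfilled)
print(constant)
```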

Resampling for Frequency Conversion

Resampling is a commonly used operation for frequency conversion (e.g., converting daily data into weekly data). For this, Pandas provides the resample() method. It is very flexible and accepts various parameters to control the frequency conversion and resampling operation.

Resampling can be done with any reduction method, such as sum(), mean(), max(), or more complex operations like ohlc().

Example

The following example resamples a daily time series to weekly intervals and sums the values in each week −

import pandas as pd
import numpy as np

# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

# Resampling to Weekly intervals and summing values
result = ts.resample("W").sum()
print('\nResampling to Weekly intervals and summing values:')
print(result)

The output of the code above is as follows −

Input Time Series:
2024-11-01    0
2024-11-02    1
2024-11-03    2
2024-11-04    3
2024-11-05    4
Freq: D, dtype: int64

Resampling to Weekly intervals and summing values:
2024-11-03    3
2024-11-10    7
Freq: W-SUN, dtype: int64
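
Other reductions work the same way. As a small sketch on the same series, ohlc() returns the first, highest, lowest, and last value of each week, and agg() computes several reductions in a single call.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Open, high, low, and close value of each weekly interval
weekly_ohlc = ts.resample("W").ohlc()
print(weekly_ohlc)

# Several reductions at once; each one becomes a column of the result
weekly_stats = ts.resample("W").agg(["sum", "mean", "max"])
print(weekly_stats)
```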

Downsampling for Frequency Conversion

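The resample() method also provides the closed and label parameters to control how intervals are built and named during aggregation. When downsampling, closed can be set to "left" or "right" to specify which end of each interval is inclusive, and label selects which edge of the interval is used to label the result. For the weekly frequency "W", both default to "right".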

Example: Using resample() for Downsampling

The following example demonstrates the use of the resample() method for downsampling data.

import pandas as pd
import numpy as np

# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

# Setting the interval to be closed on the right side
result = ts.resample("W", closed="right").mean()
print('\nDownsampled Data:')
print(result)

Below is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-02    1
2024-11-03    2
2024-11-04    3
2024-11-05    4
Freq: D, dtype: int64

Downsampled Data:
2024-11-03    1.0
2024-11-10    3.5
Freq: W-SUN, dtype: float64
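
To see the effect of these parameters, the following sketch repeats the weekly mean on the same series with closed="left", and then also with label="left". With left-closed intervals, Sunday 2024-11-03 moves into the second week, which changes both means.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Left-closed weeks: each interval includes its starting Sunday
# and excludes the following Sunday
left_closed = ts.resample("W", closed="left").mean()
print(left_closed)

# Same intervals, but labeled with the start of each week
left_labeled = ts.resample("W", closed="left", label="left").mean()
print(left_labeled)
```

Changing label only renames the rows, whereas changing closed moves boundary observations from one interval to another.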

Upsampling and Interpolation

For upsampling (increasing frequency), you can use resample() followed by asfreq(). This creates rows at the new frequency and leaves the newly created gaps as NaN, which can then be filled or interpolated.

Example

Here is an example of upsampling time series data using the resample() and asfreq() methods.

import pandas as pd
import numpy as np

# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=3, freq="D")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

# Upsample the first two days to 6-hour intervals
result = ts[:2].resample("6h").asfreq()

print('\nUpsampled Data:')
print(result)

Following is the output of the above code −

Input Time Series:
2024-11-01    0
2024-11-02    1
2024-11-03    2
Freq: D, dtype: int64

Upsampled Data:
2024-11-01 00:00:00    0.0
2024-11-01 06:00:00    NaN
2024-11-01 12:00:00    NaN
2024-11-01 18:00:00    NaN
2024-11-02 00:00:00    1.0
Freq: 6H, dtype: float64
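
The NaN values created by upsampling can be filled in a second step. As a minimal sketch using the same two days of data, ffill() repeats the last known value, while interpolate() draws a straight line between the known points.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=2, freq="D")
ts = pd.Series([0, 1], index=indx)

upsampled = ts.resample("6h")

# Forward fill: carry the last observation into the new rows
forward_filled = upsampled.ffill()
print(forward_filled)

# Linear interpolation between the original observations
interpolated = upsampled.asfreq().interpolate(method="linear")
print(interpolated)
```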

Sparse Resampling

A sparse time series has far fewer data points than the number of intervals it is resampled into. Naively resampling such a series creates a large number of intermediate rows. When no filling method is applied, these rows hold NaN, or 0 for reductions such as sum().

Example

The following example resamples a sparse series with the resample() method. Note how three data points expand into 961 rows, almost all of them empty intervals.

import pandas as pd
import numpy as np

# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=3, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(len(indx)), index=indx )

# Display the input time series
print('Input Time Series:')
print(ts)

# Resampling to 3-minute intervals
result = ts.resample("3min").sum()

print('\nSparse resampling:')
print(result)

Following is the output of the above code −

Input Time Series:
2024-11-01 00:00:01    0
2024-11-02 00:00:01    1
2024-11-03 00:00:01    2
Freq: D, dtype: int64

Sparse resampling:
2024-11-01 00:00:00    0
2024-11-01 00:03:00    0
2024-11-01 00:06:00    0
2024-11-01 00:09:00    0
2024-11-01 00:12:00    0
                      ..
2024-11-02 23:48:00    0
2024-11-02 23:51:00    0
2024-11-02 23:54:00    0
2024-11-02 23:57:00    0
2024-11-03 00:00:00    2
Freq: 3T, Length: 961, dtype: int64
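
To avoid generating the empty intervals, one possible approach is to group the values by their floored timestamps instead of calling resample(). The sketch below uses the index's floor() method as the grouping key, which is a substitute technique chosen here for brevity. Only the intervals that actually contain data appear in the result.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=3, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(len(indx)), index=indx)

# Floor each timestamp to the start of its 3-minute interval,
# then aggregate only the intervals that contain observations
sparse = ts.groupby(ts.index.floor("3min")).sum()

print(sparse)
```

The result has three rows, one per observation, instead of 961.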