Python Pandas - Manipulating Time-series Data
Pandas provides tools for working with time series data, allowing you to analyze, manipulate, and resample it efficiently. These methods are useful for transforming data across different frequencies, filling gaps, and aggregating values over time intervals.
In this tutorial, we will learn about essential time series manipulation methods, including shifting/lagging, frequency conversion, resampling, upsampling, and sparse resampling.
Shifting and Lagging Time Series Data
To shift the values of a time series backward or forward in time, use the shift() method, which is available on Series and DataFrame objects. By default, shift() moves the values by the given number of periods and leaves the index unchanged. When the freq parameter is specified, shift() changes the dates in the index instead and leaves the values in place.
Example: Shifting values of time series by period
Here is a basic example of using the shift() method to shift the time series values by a specified number of periods.
import pandas as pd
# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
# Shifting values of the time series by 2 periods
print('\nTime series after shifted by 2 periods')
print(ts.shift(2))
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-02 1
2024-11-03 2
2024-11-04 3
2024-11-05 4
Freq: D, dtype: int64
Time series after shifted by 2 periods
2024-11-01 NaN
2024-11-02 NaN
2024-11-03 0.0
2024-11-04 1.0
2024-11-05 2.0
Freq: D, dtype: float64
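As a small additional sketch (not part of the original example), shift() also accepts a negative period count and a fill_value argument. The variable names lead, change, and filled are illustrative only.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# A negative period count moves values earlier in time (a "lead");
# the vacated position at the end becomes NaN
lead = ts.shift(-1)
print(lead)

# Subtracting the lagged series gives the change between consecutive
# observations, which is the same result as ts.diff()
change = ts - ts.shift(1)
print(change)

# fill_value replaces the NaN introduced by the shift
filled = ts.shift(2, fill_value=0)
print(filled)
```

Because fill_value supplies an integer here, no NaN is introduced and the result does not need to be converted to float.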
Example: Using freq in shift() method
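In the following example, we pass freq="B" (business day) to the shift() method so that the dates of the time series are shifted instead of the values. Dates that fall on a weekend roll forward to a business day, which is why several rows end up with the same date.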
import pandas as pd
# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
print('\nTime series after shifted by 3 business days')
# Shift dates by 3 business days
print(ts.shift(3, freq="B"))
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-02 1
2024-11-03 2
2024-11-04 3
2024-11-05 4
Freq: D, dtype: int64
Time series after shifted by 3 business days
2024-11-06 0
2024-11-06 1
2024-11-06 2
2024-11-07 3
2024-11-08 4
dtype: int64
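The effect of freq is easier to see with a plain daily frequency. The sketch below is an illustration rather than part of the original tutorial: it checks that shifting with freq="D" moves every index label by one day and leaves the values untouched.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# With freq, the index moves and the values stay attached to their rows
shifted = ts.shift(1, freq="D")
print(shifted)

# The new index matches adding one day to every original label
same_index = shifted.index.equals(ts.index + pd.Timedelta(days=1))
print(same_index)

# No NaN is introduced, so the values are unchanged
same_values = shifted.tolist() == ts.tolist()
print(same_values)
```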
Frequency Conversion with asfreq()
To convert a time series to a specific frequency, use the asfreq() method. Dates that have no value in the original series are filled with NaN.
Example: Basic Example of Converting Frequencies of a Time Series
The following example converts a time series that has one value every three business days to a business-day frequency using the asfreq() method.
import pandas as pd
# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="3B")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
print('\nTime series after converting the frequency:')
# Convert frequency to daily business days
result = ts.asfreq("B")
print(result)
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-06 1
2024-11-11 2
2024-11-14 3
2024-11-19 4
Freq: 3B, dtype: int64
Time series after converting the frequency:
2024-11-01 0.0
2024-11-04 NaN
2024-11-05 NaN
2024-11-06 1.0
2024-11-07 NaN
2024-11-08 NaN
2024-11-11 2.0
2024-11-12 NaN
2024-11-13 NaN
2024-11-14 3.0
2024-11-15 NaN
2024-11-18 NaN
2024-11-19 4.0
Freq: B, dtype: float64
Example: Filling Missing Values while Converting Frequencies
To fill missing values while converting frequencies, use the method parameter of asfreq(). It fills the gaps by propagating existing values, either forward ("pad" or "ffill") or backward ("backfill" or "bfill"). The following example uses forward fill.
import pandas as pd
# Creating a sample time series
indx = pd.date_range("2024-11-01", periods=5, freq="3B")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
print('\nTime series after converting the frequency:')
# Convert frequency to daily business days
# And forward-filling missing values
result = ts.asfreq("B", method="pad")
print(result)
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-06 1
2024-11-11 2
2024-11-14 3
2024-11-19 4
Freq: 3B, dtype: int64
Time series after converting the frequency:
2024-11-01 0
2024-11-04 0
2024-11-05 0
2024-11-06 1
2024-11-07 1
2024-11-08 1
2024-11-11 2
2024-11-12 2
2024-11-13 2
2024-11-14 3
2024-11-15 3
2024-11-18 3
2024-11-19 4
Freq: B, dtype: int64
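For comparison, here is a short sketch of two other ways to fill the gaps created by asfreq(): backward fill, and a constant supplied through the fill_value parameter. The shorter three-point series and the placeholder value -1 are choices made for this illustration.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=3, freq="3B")
ts = pd.Series(range(len(indx)), index=indx)

# Backward fill: each new date takes the next known value
bfilled = ts.asfreq("B", method="bfill")
print(bfilled)

# fill_value: every newly created date gets a constant instead of NaN
constant = ts.asfreq("B", fill_value=-1)
print(constant)
```

fill_value only applies to the dates that asfreq() adds; values already present in the original series are left unchanged.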
Resampling for Frequency Conversion
Resampling is a common operation during frequency conversion (e.g., converting minute-level data into weekly data). Pandas provides the resample() method for this. It is a flexible method that accepts several parameters to control how the data is binned and labeled.
Resampling can be done with any reduction method, such as sum(), mean(), max(), or more complex operations like ohlc().
Example
The following example resamples daily data to weekly intervals and sums the values in each week −
import pandas as pd
# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
# Resampling to Weekly intervals and summing values
result = ts.resample("W").sum()
print('\nResampling to Weekly intervals and summing values:')
print(result)
The output of the code above is as follows −
Input Time Series:
2024-11-01 0
2024-11-02 1
2024-11-03 2
2024-11-04 3
2024-11-05 4
Freq: D, dtype: int64
Resampling to Weekly intervals and summing values:
2024-11-03 3
2024-11-10 7
Freq: W-SUN, dtype: int64
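The text above mentions other reductions such as mean(), max(), and ohlc(). As a hedged sketch on the same daily series, several reductions can be requested at once with agg(), and ohlc() returns the open, high, low, and close value of each bin. The names summary and bars are illustrative.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Several reductions at once: one column per function
summary = ts.resample("W").agg(["sum", "mean", "max"])
print(summary)

# ohlc() reports the first, highest, lowest, and last value in each bin
bars = ts.resample("W").ohlc()
print(bars)
```

Both calls return a DataFrame indexed by the week-ending dates, with one row per weekly bin.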
Downsampling for Frequency Conversion
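The resample() method also offers flexible options for aggregation through the closed and label parameters. When downsampling, closed can be set to "left" or "right" to specify which end of each interval is inclusive, and label specifies which bin edge is used to name the result. For weekly frequency ("W"), "right" is already the default for both, so passing closed="right" explicitly in the example below does not change the result.
Example: Using resample() for Downsampling
The following example demonstrates the use of the resample() method for downsampling data.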
import pandas as pd
# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
# Setting the interval to be closed on the right side
result = ts.resample("W", closed="right").mean()
print('\nDownsampled Data:')
print(result)
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-02 1
2024-11-03 2
2024-11-04 3
2024-11-05 4
Freq: D, dtype: int64
Downsampled Data:
2024-11-03 1.0
2024-11-10 3.5
Freq: W-SUN, dtype: float64
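The closed and label parameters have a visible effect with a frequency whose default is "left". The following sketch is an illustration that assumes a 2-day frequency ("2D"), which is not used in the original example, and compares the default binning with right-closed, right-labeled bins.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=5, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Default for "2D": bins are closed and labeled on the left edge
left = ts.resample("2D").sum()
print(left)

# Same frequency, but each bin includes its right edge and is named after it
right = ts.resample("2D", closed="right", label="right").sum()
print(right)
```

With the default settings the sums are 1, 5, and 4. With right-closed bins, the first observation falls into a bin of its own that ends on 2024-11-01, so the sums become 0, 3, and 7.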
Upsampling and Interpolation
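For upsampling (increasing the frequency), you can call resample() followed by asfreq(). This inserts the new timestamps and leaves the newly created gaps as NaN; the gaps can then be filled with methods such as ffill() or interpolate().
Example
Here is an example of upsampling time series data using the resample() and asfreq() methods.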
import pandas as pd
# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=3, freq="D")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
# Upsampling the first two days to 6-hour intervals
result = ts[:2].resample("6h").asfreq()
print('\nUpsampled Data:')
print(result)
Following is the output of the above code −
Input Time Series:
2024-11-01 0
2024-11-02 1
2024-11-03 2
Freq: D, dtype: int64
Upsampled Data:
2024-11-01 00:00:00 0.0
2024-11-01 06:00:00 NaN
2024-11-01 12:00:00 NaN
2024-11-01 18:00:00 NaN
2024-11-02 00:00:00 1.0
Freq: 6H, dtype: float64
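The NaN values in the upsampled output can be filled in the same step. As a minimal sketch on a two-day version of the same series, ffill() repeats the last known value and interpolate() fills the gaps linearly. The names forward and linear are illustrative.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=2, freq="D")
ts = pd.Series(range(len(indx)), index=indx)

# Forward fill: each new timestamp repeats the last known value
forward = ts.resample("6h").ffill()
print(forward)

# Linear interpolation between the known values
linear = ts.resample("6h").interpolate()
print(linear)
```

Forward fill suits values that stay constant until the next observation, while interpolation suits quantities that change gradually between observations.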
Sparse Resampling
A sparse time series has far fewer data points than the number of intervals you are resampling to. Naively resampling such a series creates a large number of intermediate bins. When no filling method is applied, these intermediate values are NaN (or 0 for a reduction such as sum()).
Example
The following example resamples a sparse series with only three data points to 3-minute intervals using the resample() method. The result has 961 rows, and all but three of them are empty bins.
import pandas as pd
# Creating a time series with Day frequency
indx = pd.date_range("2024-11-01", periods=3, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(len(indx)), index=indx)
# Display the input time series
print('Input Time Series:')
print(ts)
# Resampling to 3-minute intervals
result = ts.resample("3min").sum()
print('\nSparse resampling:')
print(result)
Following is the output of the above code −
Input Time Series:
2024-11-01 00:00:01 0
2024-11-02 00:00:01 1
2024-11-03 00:00:01 2
Freq: D, dtype: int64
Sparse resampling:
2024-11-01 00:00:00 0
2024-11-01 00:03:00 0
2024-11-01 00:06:00 0
2024-11-01 00:09:00 0
2024-11-01 00:12:00 0
..
2024-11-02 23:48:00 0
2024-11-02 23:51:00 0
2024-11-02 23:54:00 0
2024-11-02 23:57:00 0
2024-11-03 00:00:00 2
Freq: 3T, Length: 961, dtype: int64
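The resample() call above materializes every 3-minute bin between the first and last timestamp. One possible way to aggregate only the bins that actually contain data is to group by the floored timestamps instead. This is a sketch of that idea, not the only approach; it uses DatetimeIndex.floor() to compute the start of each 3-minute bin.

```python
import pandas as pd

indx = pd.date_range("2024-11-01", periods=3, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(len(indx)), index=indx)

# Floor each timestamp to the start of its 3-minute bin, then aggregate
# only the bins that actually contain data
result = ts.groupby(ts.index.floor("3min")).sum()
print(result)
print(len(result))
```

The result has one row per occupied bin (three rows here) instead of 961, at the cost of no longer having a regular frequency on the index.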