How to Merge “Not Matching” Time Series with Pandas?


Time series data is a crucial part of many business operations, especially those in the finance and manufacturing industries. These datasets often come in multiple tables or files, with each table containing a specific subset of the data. Merging these tables can be a challenging task, mainly when the tables contain data that does not match.

In this article, we will learn how to merge time series data that does not match using Pandas. Pandas is a powerful data analysis library in Python that provides extensive tools for merging and manipulating data. We will also learn the different techniques to merge the time series with their syntax and complete examples.

Approaches

There are many approaches that can be used to merge the not marching time series data with pandas. In this article, we will see some of the common approaches to merging the time series that do not match. Let’s now discuss them in detail.

Method 1: Using Inner Join

The first method to merge the not matching time series data using pandas in Python is by using an inner join. This join returns only the rows that have matching timestamps in both DataFrames. This means that any rows that do not have a corresponding match in the other DataFrame will be excluded from the merged DataFrame.

Syntax

mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='inner')

Example

In the given example, we first create two DataFrames dataframe1 and dataframe2 using pd.DataFrame(). We then perform an inner join on the timestamp column using pd.merge(). The resulting DataFrame contains only the rows that have matching timestamps in both DataFrames.

import pandas as pd

# Creating DataFrame one
dataframe1 = pd.DataFrame({'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00'],
   'stock': [25, 26, 28, 29]})

# Creating DataFrame two
dataframe2 = pd.DataFrame({'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00', '2023-05-10 17:00:00'],
   'price': [40, 50, 60, 70]})

# Perform inner join operation to merge not matching time series
mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='inner')

print(mergedData)

Output

             timestamp  stock  price
0  2023-05-10 14:00:00     26     40
1  2023-05-10 15:00:00     28     50
2  2023-05-10 16:00:00     29     60

Method 2: Using Outer Join

The second method to merge the not matching time series data using pandas in Python is by using an outer join. This join returns all rows from both DataFrames, regardless of whether there is a matching timestamp in the other DataFrame. If a row does not have a corresponding match in the other DataFrame, Pandas will fill in missing values with NaN (not a number).

Syntax

mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='outer)

Example

In the given example, we first create two DataFrames dataframe1 and dataframe2 using pd.DataFrame(). We then perform an outer join on the timestamp column using pd.merge(). The resulting DataFrame contains all the rows from both DataFrames, with missing values (NaN) in places where there were no matches.

import pandas as pd

# Creating DataFrame one
dataframe1 = pd.DataFrame({'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00'],
   'stock': [25, 26, 28, 29]})

# Creating DataFrame two
dataframe2 = pd.DataFrame({'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00', '2023-05-10 17:00:00'],
   'price': [30, 40, 50, 60]})

# Perform left join operation to merge not matching time series
mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='outer')

print(mergedData)

Output

             timestamp  stock  price
0  2023-05-10 13:00:00   25.0    NaN
1  2023-05-10 14:00:00   26.0   30.0
2  2023-05-10 15:00:00   28.0   40.0
3  2023-05-10 16:00:00   29.0   50.0
4  2023-05-10 17:00:00    NaN   60.0

Method 3: Using Left Join

The third method to merge the not matching time series data using pandas in Python is by using a left join. This join returns all the rows from the left DataFrame and only the matching rows from the right DataFrame. If a row does not have a corresponding match in the right DataFrame, Pandas will fill in missing values with NaN.

Syntax

mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='left')

Example

In the below example, we first create two DataFrames dataframe1 and dataframe2 using pd.DataFrame(). We then perform a left join on the timestamp column using pd.merge(). The resulting DataFrame contains all the rows from dataframe1 and the matching rows from dataframe2, with missing values (NaN) in places where there were no matches.

import pandas as pd

# Creating DataFrame one
dataframe1 = pd.DataFrame({'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00'],
   'stock': [25, 26, 28, 29]})

# Creating DataFrame two
dataframe2 = pd.DataFrame({'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00', '2023-05-10 16:00:00', '2023-05-10 17:00:00'],
   'price': [30, 40, 50, 60]})

# Perform left join operation to merge not matching time series
mergedData = pd.merge(dataframe1, dataframe2, on='timestamp', how='left')

print(mergedData)

Output

             timestamp  stock  price
0  2023-05-10 13:00:00     25    NaN
1  2023-05-10 14:00:00     26   30.0
2  2023-05-10 15:00:00     28   40.0
3  2023-05-10 16:00:00     29   50.0

Method 4: Using Merge_asof() Function

The last method to merge the not matching time series data using pandas in Python is by using the merge_asof() function. This function in Pandas is also used to merge two time-series dataframes, where we want to match the right dataframe values with the nearest previous values in the left dataframe.

Syntax

mergedData = pd.merge_asof(df, dataframe1, on='time', direction='backward')

Example

In the below example, we first create two dataframes df and dataframe1 using pd.DataFrame(). We then convert the time column in both dataframes to datetime format using pd.to_datetime(). Finally, we merge the two dataframes using pd.merge_asof() and specify the direction parameter as 'backward' to indicate that we want to match the right dataframe values with the nearest previous values in the left dataframe.

import pandas as pd

# Creating DataFrame 1: Course price data
df = pd.DataFrame({'time': pd.date_range('2023-05-10', periods=10, freq='1h'),
   'price': [100, 102, 105, 104, 107, 109, 111, 110, 112, 115]})

# Creating DataFrame 2: Course Launch events data
dataframe1 = pd.DataFrame({'time': ['2023-05-10 03:00:00', '2023-05-10 07:30:00', '2023-05-10 09:45:00', '2023-05-10 12:15:00'],
   'news': ['Newly added', 'Most demanded course', 'Best Seller', 'Developers choice']})

# Converting the 'time' column in both dataframes to datetime format
df['time'] = pd.to_datetime(df['time'])
dataframe1['time'] = pd.to_datetime(dataframe1['time'])

# Merging both the dataframes using merge_asof() function using backward approach
mergedData = pd.merge_asof(df, dataframe1, on='time', direction='backward')

print(mergedData)

Output

                 time  price                  news
0 2023-05-10 00:00:00    100                   NaN
1 2023-05-10 01:00:00    102                   NaN
2 2023-05-10 02:00:00    105                   NaN
3 2023-05-10 03:00:00    104           Newly added
4 2023-05-10 04:00:00    107           Newly added
5 2023-05-10 05:00:00    109           Newly added
6 2023-05-10 06:00:00    111           Newly added
7 2023-05-10 07:00:00    110           Newly added
8 2023-05-10 08:00:00    112  Most demanded course
9 2023-05-

Updated on: 31-Jul-2023

723 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements