Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to Merge “Not Matching” Time Series with Pandas?
Time series data is crucial for many business operations, especially in finance and manufacturing. These datasets often come in multiple tables with specific subsets of data. Merging tables with non-matching timestamps can be challenging, but Pandas provides powerful tools to handle this task effectively.
In this article, we will learn how to merge time series data with non-matching timestamps using Pandas. We'll explore different merge techniques including inner join, outer join, left join, and the specialized merge_asof() function for time-series data.
Using Inner Join
An inner join returns only rows with matching timestamps in both DataFrames. Any rows without corresponding matches are excluded from the merged result.
Syntax
merged_data = pd.merge(df1, df2, on='timestamp', how='inner')
Example
Here we create two DataFrames and perform an inner join on the timestamp column ?
import pandas as pd
# Creating first DataFrame
df1 = pd.DataFrame({
'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00',
'2023-05-10 15:00:00', '2023-05-10 16:00:00'],
'stock': [25, 26, 28, 29]
})
# Creating second DataFrame
df2 = pd.DataFrame({
'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00',
'2023-05-10 16:00:00', '2023-05-10 17:00:00'],
'price': [40, 50, 60, 70]
})
# Perform inner join to merge matching timestamps only
merged_data = pd.merge(df1, df2, on='timestamp', how='inner')
print(merged_data)
timestamp stock price
0 2023-05-10 14:00:00 26 40
1 2023-05-10 15:00:00 28 50
2 2023-05-10 16:00:00 29 60
Using Outer Join
An outer join returns all rows from both DataFrames. Missing values are filled with NaN where timestamps don't match.
Syntax
merged_data = pd.merge(df1, df2, on='timestamp', how='outer')
Example
The outer join preserves all timestamps from both DataFrames ?
import pandas as pd
# Creating first DataFrame
df1 = pd.DataFrame({
'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00',
'2023-05-10 15:00:00', '2023-05-10 16:00:00'],
'stock': [25, 26, 28, 29]
})
# Creating second DataFrame
df2 = pd.DataFrame({
'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00',
'2023-05-10 16:00:00', '2023-05-10 17:00:00'],
'price': [30, 40, 50, 60]
})
# Perform outer join to include all timestamps
merged_data = pd.merge(df1, df2, on='timestamp', how='outer')
print(merged_data)
timestamp stock price
0 2023-05-10 13:00:00 25.0 NaN
1 2023-05-10 14:00:00 26.0 30.0
2 2023-05-10 15:00:00 28.0 40.0
3 2023-05-10 16:00:00 29.0 50.0
4 2023-05-10 17:00:00 NaN 60.0
Using Left Join
A left join returns all rows from the left DataFrame and matching rows from the right DataFrame. Non-matching rows from the right are filled with NaN.
Example
The left join preserves all timestamps from the first DataFrame ?
import pandas as pd
# Creating first DataFrame
df1 = pd.DataFrame({
'timestamp': ['2023-05-10 13:00:00', '2023-05-10 14:00:00',
'2023-05-10 15:00:00', '2023-05-10 16:00:00'],
'stock': [25, 26, 28, 29]
})
# Creating second DataFrame
df2 = pd.DataFrame({
'timestamp': ['2023-05-10 14:00:00', '2023-05-10 15:00:00',
'2023-05-10 16:00:00', '2023-05-10 17:00:00'],
'price': [30, 40, 50, 60]
})
# Perform left join to keep all timestamps from df1
merged_data = pd.merge(df1, df2, on='timestamp', how='left')
print(merged_data)
timestamp stock price
0 2023-05-10 13:00:00 25 NaN
1 2023-05-10 14:00:00 26 30.0
2 2023-05-10 15:00:00 28 40.0
3 2023-05-10 16:00:00 29 50.0
Using merge_asof() Function
The merge_asof() function performs time-series specific merging by matching values with the nearest previous timestamps. This is particularly useful for financial data where you want to match events with the most recent prior data point.
Example
Here we merge price data with news events using backward direction matching ?
import pandas as pd
# Creating price DataFrame
price_df = pd.DataFrame({
'time': pd.date_range('2023-05-10', periods=10, freq='1h'),
'price': [100, 102, 105, 104, 107, 109, 111, 110, 112, 115]
})
# Creating news events DataFrame
news_df = pd.DataFrame({
'time': ['2023-05-10 03:00:00', '2023-05-10 07:30:00',
'2023-05-10 09:45:00', '2023-05-10 12:15:00'],
'news': ['Newly added', 'Most demanded course', 'Best Seller', 'Developers choice']
})
# Convert time columns to datetime
price_df['time'] = pd.to_datetime(price_df['time'])
news_df['time'] = pd.to_datetime(news_df['time'])
# Merge using backward direction matching
merged_data = pd.merge_asof(price_df, news_df, on='time', direction='backward')
print(merged_data)
time price news
0 2023-05-10 00:00:00 100 NaN
1 2023-05-10 01:00:00 102 NaN
2 2023-05-10 02:00:00 105 NaN
3 2023-05-10 03:00:00 104 Newly added
4 2023-05-10 04:00:00 107 Newly added
5 2023-05-10 05:00:00 109 Newly added
6 2023-05-10 06:00:00 111 Newly added
7 2023-05-10 07:00:00 110 Newly added
8 2023-05-10 08:00:00 112 Most demanded course
9 2023-05-10 09:00:00 115 Most demanded course
Comparison
| Method | Returns | Best For |
|---|---|---|
| Inner Join | Only matching timestamps | When you need complete data pairs |
| Outer Join | All timestamps from both DataFrames | Comprehensive time series analysis |
| Left Join | All from left + matching from right | Preserving primary dataset structure |
| merge_asof() | Nearest timestamp matches | Financial/event-driven time series |
Conclusion
Use inner join for exact timestamp matches, outer join for comprehensive analysis, left join to preserve your primary dataset, and merge_asof() for time-series specific merging with nearest neighbor matching. Choose the method based on your specific data requirements and analysis goals.
