Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to Merge Two Pandas DataFrames on Index?
Merging two Pandas DataFrames on index is essential when you have datasets sharing common row identifiers but containing different columns. This operation combines data horizontally, creating a unified dataset for analysis.
In this article, we will explore three methods to merge Pandas DataFrames based on their index: merge(), join(), and concat().
What is a Pandas DataFrame?
A DataFrame is Pandas' primary two-dimensional data structure, similar to a spreadsheet or SQL table. It consists of labeled rows and columns where each column can contain different data types (integers, floats, strings, etc.). The row labels form the index, while column labels are the column names.
Method 1: Using merge() Function
The merge() function combines DataFrames using left_index=True and right_index=True parameters to merge on index ?
import pandas as pd
# Create two DataFrames with same index
df1 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]}, index=['x', 'y', 'z'])
df2 = pd.DataFrame({'C': [70, 80, 90], 'D': [100, 110, 120]}, index=['x', 'y', 'z'])
print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)
# Merge on index
merged_df = pd.merge(df1, df2, left_index=True, right_index=True)
print("\nMerged DataFrame:")
print(merged_df)
DataFrame 1:
A B
x 10 40
y 20 50
z 30 60
DataFrame 2:
C D
x 70 100
y 80 110
z 90 120
Merged DataFrame:
A B C D
x 10 40 70 100
y 20 50 80 110
z 30 60 90 120
Method 2: Using join() Function
The join() function is specifically designed for index-based merging and provides a more concise syntax ?
import pandas as pd
# Create DataFrames with different indexes
df1 = pd.DataFrame({'Sales': [100, 200, 150]}, index=['Jan', 'Feb', 'Mar'])
df2 = pd.DataFrame({'Profit': [20, 40, 30]}, index=['Jan', 'Feb', 'Apr'])
print("Sales DataFrame:")
print(df1)
print("\nProfit DataFrame:")
print(df2)
# Join using outer join to include all indexes
joined_df = df1.join(df2, how='outer')
print("\nJoined DataFrame (outer join):")
print(joined_df)
Sales DataFrame:
Sales
Jan 100
Feb 200
Mar 150
Profit DataFrame:
Profit
Jan 20
Feb 40
Apr 30
Joined DataFrame (outer join):
Sales Profit
Jan 100.0 20.0
Feb 200.0 40.0
Mar 150.0 NaN
Apr NaN 30.0
Method 3: Using concat() Function
The concat() function with axis=1 concatenates DataFrames horizontally along the index ?
import pandas as pd
# Create sample DataFrames
df1 = pd.DataFrame({'Temperature': [25, 30, 28]}, index=['Mon', 'Tue', 'Wed'])
df2 = pd.DataFrame({'Humidity': [60, 65, 70]}, index=['Mon', 'Tue', 'Wed'])
print("Temperature DataFrame:")
print(df1)
print("\nHumidity DataFrame:")
print(df2)
# Concatenate horizontally
concat_df = pd.concat([df1, df2], axis=1)
print("\nConcatenated DataFrame:")
print(concat_df)
Temperature DataFrame:
Temperature
Mon 25
Tue 30
Wed 28
Humidity DataFrame:
Humidity
Mon 60
Tue 65
Wed 70
Concatenated DataFrame:
Temperature Humidity
Mon 25 60
Tue 30 65
Wed 28 70
Comparison of Methods
| Method | Best For | Handles Different Indexes | Join Types |
|---|---|---|---|
merge() |
Complex merging with multiple options | Yes | inner, outer, left, right |
join() |
Simple index-based joining | Yes | left, right, outer, inner |
concat() |
Combining multiple DataFrames | Yes | outer, inner |
Conclusion
Use join() for simple index-based merging, merge() for complex operations with multiple parameters, and concat() when combining multiple DataFrames. All methods effectively combine DataFrames horizontally based on their index values.
