How to Merge Two Pandas DataFrames on Index?


Merging two Pandas DataFrames on index can be useful in many data analysis scenarios. For instance, you might have two datasets with different features or data points, but both share a common index. In this case, merging the two DataFrames can help you combine the data in a meaningful way.

In this article, we will learn how to merge two Pandas DataFrames based on an index in Python. We will go through the complete steps involved in the merging process and illustrate each step with code examples.

What is DataFrames in Pandas?

One of Pandas library's most crucial data structures is DataFrames. Similar to a spreadsheet, SQL table, or a dictionary of Series objects, it is a labeled two-dimensional data structure with columns of potentially different types. It is widely utilized in data manipulation, data cleaning, and analysis as the primary Pandas data structure.

A DataFrame comprises of lines and sections, where every segment can have an alternate information type, (for example, int, float, string, and so on.). The data can be easily accessed and manipulated because each row and column is labeled. The line names are alluded to as the file, while the segment marks are alluded to as the sections.

Steps to Merge two Data Frames in Pandas in Python

Step 1: Import the Modules

The first step to merge two data frames using pandas in Python is to import the required modules like pd. In this example, we are going to use the Pandas for data handling and merging, and NumPy for some operations. Below is the syntax for importing the modules −

import pandas as pd
import numpy as np

Step 2: Creating Sample Data Frames

The next step is to create some sample data frames to merge them using pandas. For the purpose of this example, we will create two sample DataFrames with random data in which both the DataFrames will have the same index, but different columns. Below is the syntax for creating the sample data frames −

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

Step 3: Merging Based on an Index

Our next step is to merge the two DataFrames on the index, and for that we will use the merge() function with the left_index and right_index parameters set to True. This will ensure that the merge is done based on the index of the DataFrames.

# Merge DataFrames on index
merged_df = pd.merge(df1, df2, left_index=True, right_index=True)
print(merged_df)

In the above merged DataFrame, we can see that the columns from both DataFrames have been combined, and the data points have been matched based on the common index.

Step 4: Merge DataFrames with Different Index

If the two DataFrames have different indexes, we can still merge them on index by using the join() function. We can specify the how parameter as outer to include all rows from both DataFrames, and the on parameter as index to merge on the index.

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using join() function
mymerged_df = mydf1.join(mydf2)

# Print the merged DataFrame
print(mymerged_df)

And that’s all! Now, we will see some examples merging the two pandas data frames based on an index with different approaches.

Example 1: Using the Merge() Function

In the given example, we have used the merge() function to merge the two data frames on the index in Python. Here, we have created two DataFrames mydf1 and mydf2 with the same index. We then used the merge() function to merge the DataFrames on their index. After merging both data frames, we saved the resulting DataFrame as mymerged_df contains columns from both DataFrames, combined based on their index.

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using merge() function
mymerged_df = pd.merge(mydf1, mydf2, left_index=True, right_index=True)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120

Example 2: Using the Join() Function

In the given example, we have used the join() function to merge the two data frames on the index in Python. Here, we have created two DataFrames mydf1 and mydf2 with different indexes. We then used the join() function to merge the DataFrames on their index. After merging both data frames, we saved the resulting DataFrame as mymerged_df contains columns from both DataFrames, combined based on their index. However, the rows that do not exist in both DataFrames have NaN values for the missing columns.

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using join() function
mymerged_df = mydf1.join(mydf2)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120

Example 3: Using the Concat() Function

In the given example, we have used the concat() function to merge the two data frames on the index in Python. Here, we used the concat() function to merge the DataFrames. By specifying axis=1, we concatenated the DataFrames horizontally, combining the columns. The resulting DataFrame mymerged_df contains columns from both DataFrames, combined based on their index.

Example

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using concat() function
mymerged_df = pd.concat([mydf1, mydf2], axis=1)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120

Conclusion

We learned how to combine two Panda data frames based on the index in this article. When two datasets share a common index but have distinct features or data points, merging two Pandas DataFrames based on the index can be useful in a variety of data analysis scenarios. Using various functions like merge() and join(), the Pandas library makes it simple and effective to merge DataFrames. Based on their index, the columns of the two DataFrames are combined into the final data frame. DataFrames are a 2-layered marked information structure with lines and segments, where every section can have an alternate information type, and both the lines and segments are named.

Updated on: 31-Jul-2023

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements