Write a program in Python to merge two dataframes based on matching data in a column

In data analysis, you often need to combine data from multiple DataFrames based on common columns. Pandas provides the merge() function to join DataFrames similar to SQL joins.

Understanding DataFrame Merging

When you have two DataFrames with a common column, you can merge them to create a single DataFrame containing matching records. The merge operation finds rows where the specified column values match between both DataFrames.

Creating Sample DataFrames

Let's start by creating two DataFrames with a common 'id' column ?

import pandas as pd

# First DataFrame
first_df = pd.DataFrame({
    'id': [1, 2, 3, 4], 
    'country': ['India', 'UK', 'US', 'China']
})
print("First DataFrame:")
print(first_df)

# Second DataFrame
second_df = pd.DataFrame({
    'id': [1, 11, 22, 4], 
    'City': ['Chennai', 'Cambridge', 'Chicago', 'Chengdu']
})
print("\nSecond DataFrame:")
print(second_df)
First DataFrame:
   id country
0   1   India
1   2      UK
2   3      US
3   4   China

Second DataFrame:
   id      City
0   1   Chennai
1  11  Cambridge
2  22    Chicago
3   4   Chengdu

Merging DataFrames on Common Column

Use pd.merge() with the on parameter to specify the common column ?

import pandas as pd

# Create DataFrames
first_df = pd.DataFrame({
    'id': [1, 2, 3, 4], 
    'country': ['India', 'UK', 'US', 'China']
})

second_df = pd.DataFrame({
    'id': [1, 11, 22, 4], 
    'City': ['Chennai', 'Cambridge', 'Chicago', 'Chengdu']
})

# Merge based on 'id' column
merged_df = pd.merge(first_df, second_df, how='inner', on='id')
print("Merged DataFrame:")
print(merged_df)
Merged DataFrame:
   id country     City
0   1   India  Chennai
1   4   China  Chengdu

Types of Joins

The how parameter determines the join type ?

import pandas as pd

first_df = pd.DataFrame({
    'id': [1, 2, 3, 4], 
    'country': ['India', 'UK', 'US', 'China']
})

second_df = pd.DataFrame({
    'id': [1, 11, 22, 4], 
    'City': ['Chennai', 'Cambridge', 'Chicago', 'Chengdu']
})

# Inner join (default) - only matching records
inner_join = pd.merge(first_df, second_df, how='inner', on='id')
print("Inner Join:")
print(inner_join)

# Left join - all records from first DataFrame
left_join = pd.merge(first_df, second_df, how='left', on='id')
print("\nLeft Join:")
print(left_join)
Inner Join:
   id country     City
0   1   India  Chennai
1   4   China  Chengdu

Left Join:
   id country     City
0   1   India  Chennai
1   2      UK      NaN
2   3      US      NaN
3   4   China  Chengdu

Join Types Comparison

Join Type Description Result
inner Only matching records Records with common IDs only
left All from left DataFrame All first_df records + matches
right All from right DataFrame All second_df records + matches
outer All records from both All records with NaN for missing

Conclusion

Use pd.merge() with the on parameter to join DataFrames on common columns. Choose the appropriate join type based on whether you want only matching records (inner) or all records from one or both DataFrames.

Updated on: 2026-03-25T16:17:14+05:30

289 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements