Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Selected Reading
Merge Python Pandas dataframe with a common column and set NaN for unmatched values
To merge two Pandas DataFrames with a common column, use the merge() function and set the on parameter as the column name. To set NaN for unmatched values, use the how parameter and set it to left or right for left or right joins respectively.
Understanding Merge Types
The how parameter determines which rows to include ?
- left ? Keep all rows from the left DataFrame, set NaN for unmatched right values
- right ? Keep all rows from the right DataFrame, set NaN for unmatched left values
- inner ? Keep only matching rows from both DataFrames
- outer ? Keep all rows from both DataFrames, set NaN where no match
Example: Left Merge with NaN for Unmatched Values
Let's create two DataFrames and perform a left merge ?
import pandas as pd
# Create DataFrame1
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Units": [100, 150, 110, 80, 110, 90]
})
print("DataFrame1:")
print(dataFrame1)
# Create DataFrame2
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Tesla', 'Mustang', 'Mercedes', 'Jaguar'],
"Reg_Price": [7000, 1500, 5000, 8000, 9000, 6000]
})
print("\nDataFrame2:")
print(dataFrame2)
# Merge DataFrames with left join - keeps all rows from dataFrame1
mergedRes = pd.merge(dataFrame1, dataFrame2, on='Car', how="left")
print("\nMerged DataFrame (left join):")
print(mergedRes)
DataFrame1:
Car Units
0 BMW 100
1 Lexus 150
2 Audi 110
3 Mustang 80
4 Bentley 110
5 Jaguar 90
DataFrame2:
Car Reg_Price
0 BMW 7000
1 Lexus 1500
2 Tesla 5000
3 Mustang 8000
4 Mercedes 9000
5 Jaguar 6000
Merged DataFrame (left join):
Car Units Reg_Price
0 BMW 100 7000.0
1 Lexus 150 1500.0
2 Audi 110 NaN
3 Mustang 80 8000.0
4 Bentley 110 NaN
5 Jaguar 90 6000.0
Example: Right Merge
A right merge keeps all rows from the right DataFrame ?
import pandas as pd
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Units": [100, 150, 110, 80, 110, 90]
})
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Tesla', 'Mustang', 'Mercedes', 'Jaguar'],
"Reg_Price": [7000, 1500, 5000, 8000, 9000, 6000]
})
# Right merge - keeps all rows from dataFrame2
mergedRes = pd.merge(dataFrame1, dataFrame2, on='Car', how="right")
print("Merged DataFrame (right join):")
print(mergedRes)
Merged DataFrame (right join):
Car Units Reg_Price
0 BMW 100.0 7000
1 Lexus 150.0 1500
2 Mustang 80.0 8000
3 Jaguar 90.0 6000
4 Tesla NaN 5000
5 Mercedes NaN 9000
Comparison of Merge Types
| Merge Type | Keeps Rows From | NaN Location |
|---|---|---|
left |
Left DataFrame | Right columns for unmatched left rows |
right |
Right DataFrame | Left columns for unmatched right rows |
inner |
Both (matching only) | None (no NaN values) |
outer |
Both DataFrames | Any column for unmatched rows |
Conclusion
Use how="left" to keep all rows from the first DataFrame and set NaN for unmatched values from the second. Use how="right" for the opposite behavior. The merge operation provides flexible control over how to handle missing data.
Advertisements
