Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Selected Reading
Python - Merge DataFrames of different length
To merge DataFrames of different lengths, we use the merge() method with different join types. The left join keeps all rows from the left DataFrame and matches rows from the right DataFrame where possible.
Creating DataFrames of Different Lengths
Let's create two DataFrames with different lengths to demonstrate merging ?
import pandas as pd
# Create DataFrame1 with length 4
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Jaguar'],
"Price": [50000, 45000, 48000, 60000]
})
print("DataFrame1 ...\n", dataFrame1)
print("DataFrame1 length =", len(dataFrame1))
DataFrame1 ...
Car Price
0 BMW 50000
1 Lexus 45000
2 Audi 48000
3 Jaguar 60000
DataFrame1 length = 4
import pandas as pd
# Create DataFrame2 with length 6
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
"Year": [2020, 2019, 2021, 2022, 2020, 2023]
})
print("DataFrame2 ...\n", dataFrame2)
print("DataFrame2 length =", len(dataFrame2))
DataFrame2 ...
Car Year
0 BMW 2020
1 Lexus 2019
2 Audi 2021
3 Mercedes 2022
4 Jaguar 2020
5 Bentley 2023
DataFrame2 length = 6
Merging DataFrames with Left Join
A left join keeps all rows from the left DataFrame and adds matching data from the right DataFrame ?
import pandas as pd
# Create DataFrames
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Jaguar'],
"Price": [50000, 45000, 48000, 60000]
})
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
"Year": [2020, 2019, 2021, 2022, 2020, 2023]
})
# Merge DataFrames using left join
mergedRes = dataFrame2.merge(dataFrame1, on='Car', how='left')
print("Merged DataFrame (Left Join)...\n", mergedRes)
Merged DataFrame (Left Join)...
Car Year Price
0 BMW 2020 50000.0
1 Lexus 2019 45000.0
2 Audi 2021 48000.0
3 Mercedes 2022 NaN
4 Jaguar 2020 60000.0
5 Bentley 2023 NaN
Different Merge Types
You can use different join types depending on your needs ?
import pandas as pd
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Jaguar'],
"Price": [50000, 45000, 48000, 60000]
})
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
"Year": [2020, 2019, 2021, 2022, 2020, 2023]
})
# Inner join - only matching rows
inner_merge = dataFrame1.merge(dataFrame2, on='Car', how='inner')
print("Inner Join:\n", inner_merge)
# Outer join - all rows from both DataFrames
outer_merge = dataFrame1.merge(dataFrame2, on='Car', how='outer')
print("\nOuter Join:\n", outer_merge)
Inner Join:
Car Price Year
0 BMW 50000 2020
1 Lexus 45000 2019
2 Audi 48000 2021
3 Jaguar 60000 2020
Outer Join:
Car Price Year
0 BMW 50000.0 2020
1 Lexus 45000.0 2019
2 Audi 48000.0 2021
3 Jaguar 60000.0 2020
4 Mercedes NaN 2022
5 Bentley NaN 2023
Comparison of Merge Types
| Merge Type | Result | Use Case |
|---|---|---|
left |
All rows from left DataFrame | Keep all records from primary dataset |
right |
All rows from right DataFrame | Keep all records from secondary dataset |
inner |
Only matching rows | Only records present in both datasets |
outer |
All rows from both DataFrames | Complete union of both datasets |
Conclusion
Use merge() with the appropriate how parameter to combine DataFrames of different lengths. Left join preserves all rows from the left DataFrame, while inner join only keeps matching rows.
Advertisements
