Python Pandas – Fetch the Common rows between two DataFrames with concat()
To fetch the common rows between two DataFrames, you can use the concat() function: concatenate both DataFrames, group by all columns, and keep the rows that appear more than once.
Understanding the Approach
The process involves several steps:
- Concatenate both DataFrames using concat()
- Reset the index to avoid conflicts
- Group by all columns to identify duplicates
- Filter groups with more than one occurrence
Step-by-Step Implementation
Creating Sample DataFrames
Let's create two DataFrames with car data:
import pandas as pd

# Create DataFrame1
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1000, 1500, 1100, 800, 1100, 900]
})

print("DataFrame1:")
print(dataFrame1)
DataFrame1:
Car Reg_Price
0 BMW 1000
1 Lexus 1500
2 Audi 1100
3 Tesla 800
4 Bentley 1100
5 Jaguar 900
import pandas as pd

# Create DataFrame2
dataFrame2 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1200, 1500, 1000, 800, 1100, 1000]
})

print("DataFrame2:")
print(dataFrame2)
DataFrame2:
Car Reg_Price
0 BMW 1200
1 Lexus 1500
2 Audi 1000
3 Tesla 800
4 Bentley 1100
5 Jaguar 1000
Finding Common Rows
Now we'll concatenate and identify common rows:
import pandas as pd

# Create DataFrames
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1000, 1500, 1100, 800, 1100, 900]
})
dataFrame2 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1200, 1500, 1000, 800, 1100, 1000]
})

# Concatenate both DataFrames
dfRes = pd.concat([dataFrame1, dataFrame2])

# Reset the index to avoid duplicate index labels
dfRes = dfRes.reset_index(drop=True)

# Group by all columns so identical rows fall into the same group
dfGroup = dfRes.groupby(list(dfRes.columns))

# Keep the first index of every group with more than one member (common rows)
res = [k[0] for k in dfGroup.groups.values() if len(k) > 1]

# Display common rows
print("Common rows between DataFrames:")
print(dfRes.reindex(res))
Common rows between DataFrames:
Car Reg_Price
1 Lexus 1500
3 Tesla 800
4 Bentley 1100
How It Works
The algorithm identifies rows with identical values across all columns:
| Step | Purpose | Result |
|---|---|---|
| concat() | Combine both DataFrames | Single DataFrame with all rows |
| reset_index() | Avoid index conflicts | Sequential integer index |
| groupby() | Group identical rows | Groups of duplicate rows |
| Filter by length | Find groups with >1 member | Common rows only |
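The grouping logic above can also be expressed more compactly with duplicated(): after concatenation, a row is flagged as a duplicate exactly when it occurs in both frames. Note this shortcut assumes neither DataFrame contains internal duplicate rows (true for the sample data); otherwise the groupby approach is safer.

```python
import pandas as pd

# Same sample data as above
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1000, 1500, 1100, 800, 1100, 900]
})
dataFrame2 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1200, 1500, 1000, 800, 1100, 1000]
})

# Stack both frames, then keep only the second copy of each repeated row;
# a row repeats exactly when it appears in both DataFrames
both = pd.concat([dataFrame1, dataFrame2], ignore_index=True)
common = both[both.duplicated(keep='first')].reset_index(drop=True)

print(common)
```

This avoids the explicit groupby and index bookkeeping, at the cost of the no-internal-duplicates assumption.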
Alternative Approach Using merge()
You can also use merge() for a more direct approach:
import pandas as pd

dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1000, 1500, 1100, 800, 1100, 900]
})
dataFrame2 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1200, 1500, 1000, 800, 1100, 1000]
})

# Find common rows using merge (inner join on all shared columns)
common_rows = pd.merge(dataFrame1, dataFrame2, how='inner')

print("Common rows using merge():")
print(common_rows)
Common rows using merge():
Car Reg_Price
0 Lexus 1500
1 Tesla 800
2 Bentley 1100
Conclusion
Use concat() with groupby() when you need more control over the process, such as inspecting the duplicate groups themselves. The merge() approach is simpler and generally more efficient for straightforward common-row identification. Both methods return rows whose values are identical across all columns in both DataFrames.
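When you also want to know which rows are unique to each DataFrame, not just the shared ones, merge's indicator parameter is a useful sketch of a middle ground: an outer join labels every row with its origin, and filtering on that label recovers the common rows.

```python
import pandas as pd

dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1000, 1500, 1100, 800, 1100, 900]
})
dataFrame2 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Reg_Price": [1200, 1500, 1000, 800, 1100, 1000]
})

# An outer merge with indicator=True adds a _merge column labelling
# each row 'both', 'left_only', or 'right_only'
merged = pd.merge(dataFrame1, dataFrame2, how='outer', indicator=True)

# Rows labelled 'both' are the common rows
common = merged[merged['_merge'] == 'both'].drop(columns='_merge')

print(common)
```

Filtering on 'left_only' or 'right_only' instead yields the rows exclusive to one DataFrame, which the plain inner join cannot report.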
