Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
Selected Reading
How to do Fuzzy Matching on Pandas Dataframe Column Using Python?
We will match words in the first DataFrame with words in the second DataFrame. For closest matches, we will use threshold. We took the value of threshold as 70 i.e., match occurs when the strings at more than 70% close to each other.
Let us first create Dictionaries and convert to pandas dataframe −
# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}
d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}
# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
Now, convert dataframe column to list of elements for fuzzy matching −
myList1 = df1['Car'].tolist() myList2 = df2['Car'].tolist()
Example
Following is the complete code −
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}
d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}
# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
# printing the pandas dataframes
print("Dataframe 1 =
",df1)
print("Dataframe 2 =
",df2)
# empty lists for storing the matches later
match1 = []
match2 = []
k = []
# converting dataframe column to list of elements for fuzzy matching
myList1 = df1['Car'].tolist()
myList2 = df2['Car'].tolist()
threshold = 70
# iterating myList1 to extract closest match from myList2
for i in myList1:
match1.append(process.extractOne(i, myList2, scorer=fuzz.ratio))
df1['matches'] = match1
for j in df1['matches']:
if j[1] >= threshold:
k.append(j[0])
match2.append(",".join(k))
k = []
# saving matches to df1
df1['matches'] = match2
print("\nMatches...")
print(df1)
Output
This will produce the following output −
Dataframe 1 = Car 0 BMW 1 Audi 2 Lexus 3 Mercedes 4 Rolls Dataframe 2 = Car 0 BM 1 Audi 2 Le 3 Mercedes 4 Rolls Royce Matches... Car matches 0 BM BM 1 Audi Audi 2 Lexus 3 Mercedes MERCEDES 4 Rolls
Advertisements
