How to do Fuzzy Matching on Pandas Dataframe Column Using Python?



We will match words in the first DataFrame with words in the second DataFrame. For closest matches, we will use threshold. We took the value of threshold as 70 i.e., match occurs when the strings at more than 70% close to each other.

Let us first create Dictionaries and convert to pandas dataframe −

# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}

d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}

# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

Now, convert dataframe column to list of elements for fuzzy matching −

myList1 = df1['Car'].tolist()
myList2 = df2['Car'].tolist()

Example

Following is the complete code −

import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}

d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}

# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# printing the pandas dataframes
print("Dataframe 1 = 
",df1) print("Dataframe 2 =
",df2) # empty lists for storing the matches later match1 = [] match2 = [] k = [] # converting dataframe column to list of elements for fuzzy matching myList1 = df1['Car'].tolist() myList2 = df2['Car'].tolist() threshold = 70 # iterating myList1 to extract closest match from myList2 for i in myList1:    match1.append(process.extractOne(i, myList2, scorer=fuzz.ratio)) df1['matches'] = match1 for j in df1['matches']:    if j[1] >= threshold:       k.append(j[0])    match2.append(",".join(k))    k = [] # saving matches to df1 df1['matches'] = match2 print("
Matches...") print(df1)

Output

This will produce the following output −

Dataframe 1 =
       Car
0      BMW
1     Audi
2    Lexus
3 Mercedes
4    Rolls
Dataframe 2 =
           Car
0          BM
1        Audi
2          Le
3    Mercedes
4 Rolls Royce

Matches...
        Car matches
0       BM       BM
1     Audi     Audi
2    Lexus
3 Mercedes MERCEDES
4    Rolls

Advertisements