How to do Fuzzy Matching on Pandas Dataframe Column Using Python?

Computer NetworkInternetMCA

We will match words in the first DataFrame with words in the second DataFrame. For closest matches, we will use threshold. We took the value of threshold as 70 i.e., match occurs when the strings at more than 70% close to each other.

Let us first create Dictionaries and convert to pandas dataframe −

# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}

d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}

# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

Now, convert dataframe column to list of elements for fuzzy matching −

myList1 = df1['Car'].tolist()
myList2 = df2['Car'].tolist()

Example

Following is the complete code −

import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# dictionaries
d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]}

d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]}

# convert dictionaries to pandas dataframes
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# printing the pandas dataframes
print("Dataframe 1 = \n",df1)
print("Dataframe 2 = \n",df2)

# empty lists for storing the matches later
match1 = []
match2 = []
k = []

# converting dataframe column to list of elements for fuzzy matching
myList1 = df1['Car'].tolist()
myList2 = df2['Car'].tolist()

threshold = 70

# iterating myList1 to extract closest match from myList2
for i in myList1:
   match1.append(process.extractOne(i, myList2, scorer=fuzz.ratio))
df1['matches'] = match1
for j in df1['matches']:
   if j[1] >= threshold:
      k.append(j[0])
   match2.append(",".join(k))
   k = []

# saving matches to df1 
df1['matches'] = match2 
print("\nMatches...")
print(df1)

Output

This will produce the following output −

Dataframe 1 =
       Car
0      BMW
1     Audi
2    Lexus
3 Mercedes
4    Rolls
Dataframe 2 =
           Car
0          BM
1        Audi
2          Le
3    Mercedes
4 Rolls Royce

Matches...
        Car matches
0       BM       BM
1     Audi     Audi
2    Lexus
3 Mercedes MERCEDES
4    Rolls
raja
Published on 09-Sep-2021 09:27:27
Advertisements