- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to do Fuzzy Matching on Pandas Dataframe Column Using Python?
We will match words in the first DataFrame with words in the second DataFrame. For closest matches, we will use threshold. We took the value of threshold as 70 i.e., match occurs when the strings at more than 70% close to each other.
Let us first create Dictionaries and convert to pandas dataframe −
# dictionaries d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]} d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]} # convert dictionaries to pandas dataframes df1 = pd.DataFrame(d1) df2 = pd.DataFrame(d2)
Now, convert dataframe column to list of elements for fuzzy matching −
myList1 = df1['Car'].tolist() myList2 = df2['Car'].tolist()
Example
Following is the complete code −
import pandas as pd from fuzzywuzzy import fuzz from fuzzywuzzy import process # dictionaries d1 = {'Car': ["BMW", "Audi", "Lexus", "Mercedes", "Rolls"]} d2 = {'Car': ["BM", "Audi", "Le", "MERCEDES", "Rolls Royce"]} # convert dictionaries to pandas dataframes df1 = pd.DataFrame(d1) df2 = pd.DataFrame(d2) # printing the pandas dataframes print("Dataframe 1 =
",df1) print("Dataframe 2 =
",df2) # empty lists for storing the matches later match1 = [] match2 = [] k = [] # converting dataframe column to list of elements for fuzzy matching myList1 = df1['Car'].tolist() myList2 = df2['Car'].tolist() threshold = 70 # iterating myList1 to extract closest match from myList2 for i in myList1: match1.append(process.extractOne(i, myList2, scorer=fuzz.ratio)) df1['matches'] = match1 for j in df1['matches']: if j[1] >= threshold: k.append(j[0]) match2.append(",".join(k)) k = [] # saving matches to df1 df1['matches'] = match2 print("
Matches...") print(df1)
Output
This will produce the following output −
Dataframe 1 = Car 0 BMW 1 Audi 2 Lexus 3 Mercedes 4 Rolls Dataframe 2 = Car 0 BM 1 Audi 2 Le 3 Mercedes 4 Rolls Royce Matches... Car matches 0 BM BM 1 Audi Audi 2 Lexus 3 Mercedes MERCEDES 4 Rolls
- Related Articles
- Python - How to select a column from a Pandas DataFrame
- Python - Add a zero column to Pandas DataFrame
- Deleting a DataFrame row in Python Pandas based on column value
- Python Pandas - Subset DataFrame by Column Name
- Creating a Pandas dataframe column based on a given condition in Python
- How to delete a column from Pandas DataFrame
- Apply uppercase to a column in Pandas dataframe in Python
- How to rename column names in a Pandas DataFrame?
- How to delete a column from a Pandas DataFrame?
- How to sort a column of a Pandas DataFrame?
- How to add column from another DataFrame in Pandas?
- How to shift a column in a Pandas DataFrame?
- How to set Dataframe Column value as X-axis labels in Python Pandas?
- Python - How to Count the NaN Occurrences in a Column in Pandas Dataframe?
- Python - How to rename multiple column headers in a Pandas DataFrame with Dictionary?

Advertisements