How to find similar words in a vector of strings in R?

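Sometimes the strings in a vector contain spelling errors, and we want to find the words that are similar to each other, because similar words are likely to be the correct and incorrect forms of the same word. This can be done by combining the agrep function with lapply. agrep performs approximate (fuzzy) matching based on the generalized Levenshtein edit distance: it reports the elements that contain the pattern with at most a few insertions, deletions or substitutions. With the default max.distance = 0.1, the allowance is 0.1 times the pattern length, rounded up to a whole number of edits. The call lapply(x, agrep, x, value = TRUE) uses each element of x in turn as the pattern and returns, for each one, the elements of x that approximately match it.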
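As a minimal sketch of what a single lapply() step does, the code below calls agrep() once with one pattern. The vector words is an illustrative subset of the data used in Example 1, and adist() from the utils package appears only to show the edit distance involved.

```r
# Small illustrative vector (a subset of x1 from Example 1)
words <- c("India", "Egyypt", "Turkey", "Egypt")

# One agrep() call: "Egypt" is the pattern, searched for in every element of words.
# value = TRUE returns the matching strings: "Egyypt" "Egypt"
agrep("Egypt", words, value = TRUE)

# The default value = FALSE returns the positions of the matches instead: 2 4
agrep("Egypt", words)

# The default max.distance = 0.1 is a fraction of the pattern length, rounded up,
# so the 5-character pattern "Egypt" is allowed 1 edit
ceiling(0.1 * nchar("Egypt"))

# "Egyypt" is one deletion away from "Egypt", so the distance is 1
adist("Egyypt", "Egypt")
```

lapply(x1, agrep, x1, value = TRUE) simply repeats the first call with every element of x1 as the pattern.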

Example 1

x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom",
        "Turkey", "Egypt", "Belaarus", "Belarus")
# Use each element as the agrep() pattern and search the whole vector
lapply(x1, agrep, x1, value = TRUE)

Output

[[1]]
[1] "India" "Indiaa"
[[2]]
[1] "United Kingdoms" "United Kingdom"
[[3]]
[1] "India" "Indiaa"
[[4]]
[1] "Egyypt" "Egypt"
[[5]]
[1] "United Kingdoms" "United Kingdom"
[[6]]
[1] "Turkey"
[[7]]
[1] "Egyypt" "Egypt"
[[8]]
[1] "Belaarus" "Belarus"
[[9]]
[1] "Belaarus" "Belarus"
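Because every element is used as a pattern, each group of similar words appears once per member (for example, [[1]] and [[3]] above are the same pair). The sketch below is a small follow-up, not part of the original method: it drops single-word results and repeated groups using base R's Filter() and unique(). The names groups and candidates are illustrative.

```r
x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom",
        "Turkey", "Egypt", "Belaarus", "Belarus")
groups <- lapply(x1, agrep, x1, value = TRUE)

# Keep only groups with more than one spelling (this drops "Turkey"),
# then remove groups that are exact repeats of an earlier one
candidates <- unique(Filter(function(g) length(g) > 1, groups))
candidates
```

unique() removes only groups that are identical element for element, in the same order. That works here because agrep() returns its matches in the order they occur in x1.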

Example 2

x2 <- c("Al-hadi", "Umair", "Omar", "Alhadi", "Shanti", "Shant", "Umaer",
        "Peter", "Rahul", "Pattrick", "Peeter", "Rahuls")
lapply(x2, agrep, x2, value = TRUE)

Output

[[1]]
[1] "Al-hadi" "Alhadi"
[[2]]
[1] "Umair" "Umaer"
[[3]]
[1] "Omar"
[[4]]
[1] "Al-hadi" "Alhadi"
[[5]]
[1] "Shanti" "Shant"
[[6]]
[1] "Shanti" "Shant"
[[7]]
[1] "Umair" "Umaer"
[[8]]
[1] "Peter" "Peeter"
[[9]]
[1] "Rahul" "Rahuls"
[[10]]
[1] "Pattrick"
[[11]]
[1] "Peter" "Peeter"
[[12]]
[1] "Rahul" "Rahuls"
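How many edits agrep() tolerates is controlled by its max.distance argument, and the setting involves a trade-off. The sketch below illustrates it with three names taken from x2. A value of 1 or more is read as an absolute number of edits. The value 2 and the name names_x2 are arbitrary choices for illustration.

```r
# Three names taken from x2
names_x2 <- c("Umair", "Omar", "Umaer")

# Default max.distance = 0.1: the 4-character "Omar" is allowed ceiling(0.4) = 1 edit,
# so it matches only itself, as in [[3]] of the output above
agrep("Omar", names_x2, value = TRUE)

# Allowing 2 edits also pulls in "Umair" and "Umaer" (O -> U plus one insertion).
# These are different names rather than misspellings, so a looser
# threshold starts to produce false matches
loose <- agrep("Omar", names_x2, max.distance = 2, value = TRUE)
loose
```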

Example 3

x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")
lapply(x3, agrep, x3, value = TRUE)

Output

[[1]]
[1] "Alabamaa"
[[2]]
[1] "New Yorky"
[[3]]
[1] "New Yorky" "New Yok"
[[4]]
[1] "Alabamaa" "Alabma"
[[5]]
[1] "Florida"
[[6]]
[1] "Illinois" "Illinoise"
[[7]]
[1] "Texas"
[[8]]
[1] "Illinois" "Illinoise"
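The output of Example 3 is not symmetric. [[4]] pairs "Alabma" with "Alabamaa", but [[1]] lists "Alabamaa" alone, and [[2]] and [[3]] show the same pattern for "New Yorky" and "New Yok". This happens because agrep() looks for the pattern inside each element (substring matching) and scales its edit allowance with the pattern length. The sketch below demonstrates this. It then swaps in a different technique, whole-string distances from adist() compared against a fixed threshold, which gives symmetric groups. The threshold max_edits = 2 and the names d and sym_groups are illustrative assumptions, not part of the original method.

```r
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")

# "Alabma" (6 characters, 1 edit allowed) matches: one insertion turns it
# into "Alabama", which is a substring of "Alabamaa"
agrep("Alabma", "Alabamaa")

# "Alabamaa" (8 characters, still only 1 edit allowed) does not match "Alabma"
agrep("Alabamaa", "Alabma")

# The whole-string Levenshtein distance between the two is 2
adist("Alabamaa", "Alabma")

# Symmetric alternative: full pairwise distance matrix plus a fixed threshold
max_edits <- 2                      # hypothetical threshold; tune it for your data
d <- adist(x3)                      # with default costs, d[i, j] == d[j, i]
sym_groups <- lapply(seq_along(x3), function(i) x3[d[i, ] <= max_edits])
sym_groups
```

A fixed threshold treats short and long words alike. That keeps the groups symmetric, but it is more forgiving for short words such as "Texas" than the proportional default of agrep().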
raja
Published on 09-Sep-2020 07:43:16