Find frequency of each word in a string in Python


As part of text analytics, we frequently need to count words and assign weights to them before passing them to various algorithms. This article shows how to find the frequency of each word in a given sentence, using the three approaches below.

Using Counter

We can use the Counter class from the collections module to get the frequency of the words. We first call split() to break the line into words and pass them to Counter, then call most_common() to list the words from most to least frequent.

Example

from collections import Counter
line_text = "Learn and practice and learn to practice"
freq = Counter(line_text.split()).most_common()
print(freq)

Running the above code gives us the following result:

[('and', 2), ('practice', 2), ('Learn', 1), ('learn', 1), ('to', 1)]
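Note that 'Learn' and 'learn' are counted separately, because the comparison is case-sensitive. If that is not what you want, one option is to lowercase the text before splitting. The following is a minimal sketch of that idea; the lower() step is an addition for illustration and is not part of the example above.

```python
from collections import Counter

line_text = "Learn and practice and learn to practice"

# Lowercase the text first so that "Learn" and "learn" share one key
# (an optional normalization step, added here for illustration)
freq = Counter(line_text.lower().split()).most_common()
print(freq)
# [('learn', 2), ('and', 2), ('practice', 2), ('to', 1)]
```

Words with equal counts appear in the order in which they were first seen.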

Using FreqDist()

The Natural Language Toolkit (NLTK) provides the FreqDist class. Printing a FreqDist shows the number of distinct words (samples) and the total number of words (outcomes) in the string. Calling most_common() on it gives us the frequency of each word.

Example

from nltk import FreqDist
text = "Learn and practice and learn to practice"
words = text.split()
fdist1 = FreqDist(words)
print(fdist1)
print(fdist1.most_common())

Running the above code gives us the following result:

<FreqDist with 5 samples and 7 outcomes>
[('and', 2), ('practice', 2), ('Learn', 1), ('learn', 1), ('to', 1)]
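In the summary line, "samples" is the number of distinct words and "outcomes" is the total number of words. As a rough check, the same two numbers can be reproduced with a plain Counter from the standard library. This is only a sketch to illustrate what the terms mean, not how NLTK computes them; the variable names samples and outcomes are chosen here for illustration.

```python
from collections import Counter

text = "Learn and practice and learn to practice"
counts = Counter(text.split())

samples = len(counts)            # number of distinct words
outcomes = sum(counts.values())  # total number of words
print(samples, outcomes)
# 5 7
```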

Using Dictionary

In this approach we first split the line into a list of words. Then we apply count() to the list to get the frequency of each word, and zip the words with their frequency values. Passing the zipped pairs to dict() gives the final result as a dictionary, in which repeated words collapse into a single key.

Example

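text = "Learn and practice and learn to practice"
# Split the sentence into a list of words
words = text.split()
# Count the occurrences of every word in the list
wfreq = [words.count(w) for w in words]
# Pair each word with its count; repeated words collapse into one key
print(dict(zip(words, wfreq)))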

Running the above code gives us the following result:

{'Learn': 1, 'and': 2, 'practice': 2, 'learn': 1, 'to': 1}
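Calling count() for every word rescans the whole list each time, which can become slow on long texts. A possible alternative, shown here as a minimal sketch and not as part of the example above, is to build the dictionary in a single pass using dict.get() with a default of 0.

```python
text = "Learn and practice and learn to practice"

wfreq = {}
for w in text.split():
    # get() returns 0 the first time a word is seen
    wfreq[w] = wfreq.get(w, 0) + 1
print(wfreq)
# {'Learn': 1, 'and': 2, 'practice': 2, 'learn': 1, 'to': 1}
```

The result matches the zip-based version, but each word in the text is visited only once.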

Updated on: 20-Dec-2019
