Find the k most frequent words from data set in Python

If there is a need to find 10 most frequent words in a data set, python can help us find it using the collections module. The collections module has a counter class which gives the count of the words after we supply a list of words to it. We also use the most_common method to find out the number of such words as needed by the program input.


In the below example we take a paragraph, and then first create a list of words applying split(). We will then apply the counter() to find the count of all the words. Finally the most_common function will give us the appropriate result of how many such words with highest frequency we want.

from collections import Counter
word_set = " This is a series of strings to count " \
   "many words . They sometime hurt and words sometime inspire "\
   "Also sometime fewer words convey more meaning than a bag of words "\
   "Be careful what you speak or what you write or even what you think of. "\
# Create list of all the words in the string
word_list = word_set.split()

# Get the count of each word.
word_count = Counter(word_list)

# Use most_common() method from Counter subclass


Running the above code gives us the following result −

[('words', 4), ('sometime', 3), ('what', 3)]