Python Program to Count Distinct Words and Their Frequencies

When working with text data, we often need to count how many distinct words appear and track their frequencies. Python provides several approaches to solve this problem efficiently.

In this example, we have a list of words where some words may appear multiple times. We need to return the count of distinct words and their frequencies in order of first appearance.

Problem Example

Given the input: words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]

The expected output is (4, '2 1 2 1') because:

  • There are 4 distinct words
  • "Book" appears 2 times
  • "Sound" appears 1 time
  • "Language" appears 2 times
  • "Computer" appears 1 time
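To make the expected output concrete, the brute-force sketch below recomputes it directly from the problem statement. The function name `expected_output` is illustrative and not part of the original examples. Because it rescans the list for every distinct word, it is intended only as a sanity check, not as a recommended solution.

```python
def expected_output(words):
    # Collect each word once, in order of first appearance.
    distinct = []
    for w in words:
        if w not in distinct:
            distinct.append(w)
    # list.count scans the whole list, so this is quadratic in the worst case.
    counts = [words.count(w) for w in distinct]
    return len(distinct), ' '.join(str(c) for c in counts)

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
print(expected_output(words))  # (4, '2 1 2 1')
```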

Using OrderedDict

OrderedDict maintains insertion order, which makes it a natural fit for tracking word frequencies in the order the words first appear:

from collections import OrderedDict

def solve(words):
    d = OrderedDict()
    for w in words:
        if w in d:
            d[w] += 1
        else:
            d[w] = 1
    # len(d) counts the distinct words; d.values() yields counts in insertion order
    return len(d), ' '.join(str(count) for count in d.values())

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
distinct_count, frequencies = solve(words)
print(f"Distinct words: {distinct_count}")
print(f"Frequencies: {frequencies}")
print(f"Result: {solve(words)}")

Output:

Distinct words: 4
Frequencies: 2 1 2 1
Result: (4, '2 1 2 1')

Using Counter (Alternative Approach)

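Counter counts all the words in a single call. On Python 3.7+ it preserves insertion order like any other dict, but on older versions the order is not guaranteed, so the version below records the order of first appearance explicitly: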

from collections import Counter

def solve_with_counter(words):
    counter = Counter(words)
    # Record each unique word once, in order of first appearance.
    # The set gives O(1) membership checks; the list keeps the order.
    seen = set()
    ordered = []
    for word in words:
        if word not in seen:
            seen.add(word)
            ordered.append(word)

    frequencies = [str(counter[word]) for word in ordered]
    return len(counter), ' '.join(frequencies)

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
result = solve_with_counter(words)
print(result)

Output:

(4, '2 1 2 1')
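
On Python 3.7 and later, Counter is a dict subclass and remembers insertion order itself, so the separate pass that records first-appearance order can be dropped. The following is a minimal sketch under that version assumption; the name `solve_with_counter_37` is made up for illustration.

```python
from collections import Counter

def solve_with_counter_37(words):
    # Assumes Python 3.7+, where Counter keeps keys in first-insertion order.
    counter = Counter(words)
    return len(counter), ' '.join(str(n) for n in counter.values())

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
print(solve_with_counter_37(words))  # (4, '2 1 2 1')
```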

Using Dictionary (Python 3.7+)

Since Python 3.7, regular dictionaries are guaranteed to preserve insertion order, so a plain dict is enough for this task:

def solve_with_dict(words):
    word_count = {}
    for word in words:
        word_count[word] = word_count.get(word, 0) + 1
    
    distinct_count = len(word_count)
    frequencies = ' '.join(str(count) for count in word_count.values())
    
    return distinct_count, frequencies

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
result = solve_with_dict(words)
print(result)

Output:

(4, '2 1 2 1')

Comparison

Method         Order Preserved                 Python Version    Best For
OrderedDict    Yes                             2.7 and 3.1+      Guaranteed order preservation
Counter        Yes on 3.7+, else extra work    2.7 and 3.1+      Complex counting operations
Regular Dict   Yes                             3.7+              Simple and clean syntax
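
Beyond version support, OrderedDict and a regular dict still differ in a few behaviors that can matter when order is part of the result. The sketch below illustrates two of them, order-sensitive equality and `move_to_end`; the sample data is invented for illustration.

```python
from collections import OrderedDict

a = OrderedDict([("Book", 2), ("Sound", 1)])
b = OrderedDict([("Sound", 1), ("Book", 2)])

# OrderedDict equality is order-sensitive; plain dict equality is not.
print(a == b)              # False
print(dict(a) == dict(b))  # True

# OrderedDict also offers reordering helpers that a plain dict lacks.
a.move_to_end("Book")
print(list(a))             # ['Sound', 'Book']
```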

Conclusion

Use OrderedDict when the code must preserve insertion order on Python versions older than 3.7. On Python 3.7+, regular dictionaries provide the same ordering guarantee with cleaner syntax. Counter is useful when you need additional counting operations beyond basic frequency tracking.
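
As an illustration of what "additional counting operations" can mean, the sketch below uses a few standard Counter features on the same word list. The second batch of words (`more`) is invented for the example.

```python
from collections import Counter

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
counter = Counter(words)

# Most frequent words first; on Python 3.7+ ties keep first-appearance order.
print(counter.most_common(2))  # [('Book', 2), ('Language', 2)]

# Counters support arithmetic, e.g. merging counts from a second batch.
more = Counter(["Sound", "Sound", "Book"])
total = counter + more
print(total["Sound"], total["Book"])  # 3 3

# Missing keys return 0 instead of raising KeyError.
print(counter["Missing"])  # 0
```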

Updated on: 2026-03-26T15:39:22+05:30
