Python program to count distinct words and count frequency of them
When working with text data, we often need to count how many distinct words appear and track their frequencies. Python provides several approaches to solve this problem efficiently.
In this example, we have a list of words where some words may appear multiple times. We need to return the count of distinct words and their frequencies in order of first appearance.
Problem Example
Given the input: words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
The expected output is (4, '2 1 2 1') because:
- There are 4 distinct words
- "Book" appears 2 times
- "Sound" appears 1 time
- "Language" appears 2 times
- "Computer" appears 1 time
Using OrderedDict
OrderedDict maintains insertion order, which makes it a good fit for tracking word frequencies in the order the words first appear:
from collections import OrderedDict

def solve(words):
    # Map each word to its count; keys stay in first-appearance order
    d = OrderedDict()
    for w in words:
        if w in d:
            d[w] += 1
        else:
            d[w] = 1
    # Number of distinct words, then the counts joined by spaces
    return len(d), ' '.join(str(d[k]) for k in d)

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
distinct_count, frequencies = solve(words)
print(f"Distinct words: {distinct_count}")
print(f"Frequencies: {frequencies}")
print(f"Result: {solve(words)}")
Distinct words: 4
Frequencies: 2 1 2 1
Result: (4, '2 1 2 1')
Using Counter (Alternative Approach)
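Counter provides a more concise way to count frequencies. Because it is a dict subclass, it only remembers insertion order on Python 3.7+, so the version below tracks first-appearance order explicitly and also works on older versions: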
from collections import Counter

def solve_with_counter(words):
    counter = Counter(words)
    # Maintain original order by collecting unique words in order of appearance
    seen = []
    for word in words:
        if word not in seen:
            seen.append(word)
    frequencies = [str(counter[word]) for word in seen]
    return len(counter), ' '.join(frequencies)

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
result = solve_with_counter(words)
print(result)
(4, '2 1 2 1')
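On Python 3.7 and later, Counter inherits insertion ordering from dict, so the explicit `seen` list may not be needed. The following is a minimal sketch under that version assumption; the function name `solve_with_counter_ordered` is hypothetical and used only for illustration:

```python
from collections import Counter

def solve_with_counter_ordered(words):
    # Assumes Python 3.7+, where Counter (a dict subclass) keeps keys
    # in the order they were first inserted.
    counter = Counter(words)
    frequencies = ' '.join(str(count) for count in counter.values())
    return len(counter), frequencies

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
print(solve_with_counter_ordered(words))  # (4, '2 1 2 1')
```

This also avoids the repeated `word not in seen` list lookups, which get slower as the number of distinct words grows.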
Using Dictionary (Python 3.7+)
Since Python 3.7, regular dictionaries maintain insertion order, making them suitable for this task:
def solve_with_dict(words):
    word_count = {}
    for word in words:
        # get() returns 0 the first time a word is seen
        word_count[word] = word_count.get(word, 0) + 1
    distinct_count = len(word_count)
    frequencies = ' '.join(str(count) for count in word_count.values())
    return distinct_count, frequencies

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
result = solve_with_dict(words)
print(result)
(4, '2 1 2 1')
Comparison
| Method | Order Preserved | Python Version | Best For |
|---|---|---|---|
| OrderedDict | Yes | Python 2.7+ / 3.1+ | Guaranteed order on versions before 3.7 |
| Counter | Yes on Python 3.7+; extra work before that | Python 2.7+ / 3.1+ | Complex counting operations |
| Regular Dict | Yes | Python 3.7+ | Simple and clean syntax |
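As a rough illustration of the counting operations that set Counter apart, the sketch below uses the same sample list to find the most frequent words and the total word count. The variable names `top_two` and `total` are illustrative only:

```python
from collections import Counter

words = ["Book", "Sound", "Language", "Computer", "Book", "Language"]
counter = Counter(words)

# most_common(n) returns the n highest counts as (word, count) pairs
top_two = counter.most_common(2)
print(top_two)  # [('Book', 2), ('Language', 2)]

# Summing the counts gives the total number of words, duplicates included
total = sum(counter.values())
print(total)  # 6
```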
Conclusion
Use OrderedDict when the code must preserve insertion order on Python versions older than 3.7. On Python 3.7+, regular dictionaries provide the same ordering with cleaner syntax. Counter is useful when you need additional counting operations beyond basic frequency tracking.
