Word Frequency - Problem

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters
  • Each word must consist of lowercase characters only
  • Words are separated by one or more whitespace characters

The output should be sorted by frequency in descending order (most frequent words first). If two words have the same frequency, sort them alphabetically.
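
To make the ordering rule concrete, the two-level comparison maps directly onto sort keys once each line carries a count followed by a word. A minimal sketch (the sample lines below are illustrative only, not part of any test input):

# Sort "count word" lines by count descending, then word ascending.
printf '1 fox\n2 the\n1 brown\n' | sort -k1,1nr -k2,2
# Prints: 2 the / 1 brown / 1 fox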

Input & Output

Example 1 — Basic Word Frequency
$ Input: content = "the quick brown fox jumps over the lazy dog"
Output: the 2\nbrown 1\ndog 1\nfox 1\njumps 1\nlazy 1\nover 1\nquick 1
💡 Note: The word 'the' appears 2 times (most frequent), all other words appear once each. Words with same frequency are sorted alphabetically.
Example 2 — Single Word
$ Input: content = "hello"
Output: hello 1
💡 Note: Only one word 'hello' appears once.
Example 3 — Multiple Spaces
$ Input: content = "a b a b a"
Output: a 3\nb 2
💡 Note: Word 'a' appears 3 times, 'b' appears 2 times. Multiple spaces are handled correctly.

Constraints

  • 1 ≤ content.length ≤ 10⁴
  • content contains only lowercase English letters and spaces
  • Words are separated by one or more spaces

Approach

The solution is a classic Unix pipeline. For the sample input "the quick brown fox jumps over the lazy dog", the goal is the same table a hash map would hold: word → count ("the" appears twice; every other word once).

Algorithm steps:

  1. Read & split words — tr -s ' ' '\n' squeezes runs of spaces and puts each word on its own line.
  2. Sort words — sort places identical words on adjacent lines.
  3. Count occurrences — uniq -c prefixes each distinct word with its count.
  4. Order the result — sort -k1,1nr -k2,2 sorts by count descending, then alphabetically for ties.

Putting the steps together:

#!/bin/bash
cat words.txt \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -k1,1nr -k2,2 \
  | awk '{print $2, $1}'

For the sample input this yields 8 unique words out of 9 total, with "the 2" first and the remaining words listed alphabetically. Running time is O(n log n), dominated by the two sorts.

Key insight: uniq -c counts consecutive identical lines, so the preceding sort is what groups every copy of a word together; the counting pass itself is linear. The two-level sort (frequency descending, then word ascending) keeps the output deterministic when counts tie.
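
Since the counting step is conceptually a hash-table lookup, a genuinely hash-based variant is possible as well: awk's associative array counts every word in a single pass, and only the final ordering still needs sort. This is an illustrative sketch, not the canonical pipeline above; it assumes a POSIX awk and GNU sort.

#!/bin/bash
# Count words with a hash table (awk associative array), then order the result.
awk '
{
    # Every whitespace-separated field is a word; count[] acts as the hash table.
    for (i = 1; i <= NF; i++) count[$i]++
}
END {
    # Emit "word count" pairs (unordered at this point).
    for (w in count) print w, count[w]
}
' words.txt | sort -k2,2nr -k1,1

The awk pass is O(n) over the input; the closing sort runs only over the unique words and supplies the required frequency-descending, alphabetical-tie-break ordering.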