Word Frequency - Problem

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters
  • Each word must consist of lowercase characters only
  • Words are separated by one or more whitespace characters

The output should be sorted by frequency in descending order (most frequent words first). If two words have the same frequency, sort them alphabetically.
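
To make the ordering rule concrete, the two-level comparison maps directly onto sort keys once each line carries a count followed by a word. A minimal sketch (the sample lines below are illustrative only, not part of any test input):

# Sort "count word" lines by count descending, then word ascending.
printf '1 fox\n2 the\n1 brown\n' | sort -k1,1nr -k2,2
# Prints: 2 the / 1 brown / 1 fox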

Input & Output

Example 1 — Basic Word Frequency
$ Input: content = "the quick brown fox jumps over the lazy dog"
Output: the 2\nbrown 1\ndog 1\nfox 1\njumps 1\nlazy 1\nover 1\nquick 1
💡 Note: The word 'the' appears 2 times (most frequent), all other words appear once each. Words with same frequency are sorted alphabetically.
Example 2 — Single Word
$ Input: content = "hello"
Output: hello 1
💡 Note: Only one word 'hello' appears once.
Example 3 — Multiple Spaces
$ Input: content = "a b a b a"
Output: a 3\nb 2
💡 Note: Word 'a' appears 3 times, 'b' appears 2 times. Multiple spaces are handled correctly.

Constraints

  • 1 ≤ content.length ≤ 10⁴
  • content contains only lowercase English letters and spaces
  • Words are separated by one or more spaces

Approach

The solution is a classic Unix pipeline. For the sample input "the quick brown fox jumps over the lazy dog", the goal is the same table a hash map would hold: word → count ("the" appears twice; every other word once).

Algorithm steps:

  1. Read & split words — tr -s ' ' '\n' squeezes runs of spaces and puts each word on its own line.
  2. Sort words — sort places identical words on adjacent lines.
  3. Count occurrences — uniq -c prefixes each distinct word with its count.
  4. Order the result — sort -k1,1nr -k2,2 sorts by count descending, then alphabetically for ties.

Putting the steps together:

#!/bin/bash
cat words.txt \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -k1,1nr -k2,2 \
  | awk '{print $2, $1}'

For the sample input this yields 8 unique words out of 9 total, with "the 2" first and the remaining words listed alphabetically. Running time is O(n log n), dominated by the two sorts.

Key insight: uniq -c counts consecutive identical lines, so the preceding sort is what groups every copy of a word together; the counting pass itself is linear. The two-level sort (frequency descending, then word ascending) keeps the output deterministic when counts tie.
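
Since the counting step is conceptually a hash-table lookup, a genuinely hash-based variant is possible as well: awk's associative array counts every word in a single pass, and only the final ordering still needs sort. This is an illustrative sketch, not the canonical pipeline above; it assumes a POSIX awk and GNU sort.

#!/bin/bash
# Count words with a hash table (awk associative array), then order the result.
awk '
{
    # Every whitespace-separated field is a word; count[] acts as the hash table.
    for (i = 1; i <= NF; i++) count[$i]++
}
END {
    # Emit "word count" pairs (unordered at this point).
    for (w in count) print w, count[w]
}
' words.txt | sort -k2,2nr -k1,1

The awk pass is O(n) over the input; the closing sort runs only over the unique words and supplies the required frequency-descending, alphabetical-tie-break ordering.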