Short Encoding of Words - Problem

A valid encoding of an array of words is any reference string s and array of indices indices such that:

words.length == indices.length
The reference string s ends with the '#' character
For each index indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i]

Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.

Input & Output

Example 1 — Basic Suffix Removal

$ Input: words = ["time", "me", "bell"]

› Output: 10

💡 Note: The word "me" is a suffix of "time", so we can encode it as part of "time". The encoding becomes "time#bell#" with length 10.
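The three conditions in the definition can be checked mechanically. The following is a minimal sketch, not part of the original solution; the function name isValidEncoding and its signature are assumptions made for illustration. For Example 1, the indices [0, 2, 5] follow from the positions of "time", "me", and "bell" inside "time#bell#".

```c
#include <string.h>

/* Hypothetical helper: returns 1 if (s, indices) is a valid encoding
 * of words under the three conditions in the problem statement. */
int isValidEncoding(const char* s, const int* indices, int indicesSize,
                    char** words, int wordsSize) {
    size_t sLen = strlen(s);

    /* Condition 1: one index per word */
    if (indicesSize != wordsSize) return 0;

    /* Condition 2: the reference string ends with '#' */
    if (sLen == 0 || s[sLen - 1] != '#') return 0;

    /* Condition 3: each index decodes to the matching word */
    for (int i = 0; i < wordsSize; i++) {
        if (indices[i] < 0 || (size_t)indices[i] >= sLen) return 0;
        const char* start = s + indices[i];
        /* strchr cannot fail here because s ends with '#' */
        const char* hash = strchr(start, '#');
        size_t len = (size_t)(hash - start);
        if (len != strlen(words[i]) || strncmp(start, words[i], len) != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling it with s = "time#bell#" and indices {0, 2, 5} for the words of Example 1 should report a valid encoding, while shifting the index for "me" from 2 to 1 decodes "ime" and fails.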

Example 2 — No Suffixes

$ Input: words = ["t"]

› Output: 2

💡 Note: Only one word, so the encoding is "t#" with length 2.

Example 3 — Multiple Suffix Relationships

$ Input: words = ["time", "me", "e"]

› Output: 5

💡 Note: Both "me" and "e" are suffixes of "time", so we only need to encode "time" as "time#" with length 5.

Constraints

1 ≤ words.length ≤ 2000
1 ≤ words[i].length ≤ 7
words[i] consists of only lowercase English letters

Asked in

Google (15), Microsoft (8)

The key insight is that if word A is a suffix of word B, A needs no separate entry: its index can point into B's encoding at the position where the suffix begins, and the substring up to B's '#' is exactly A. The optimal approach uses a trie built from reversed words to identify suffix relationships efficiently. With n words of maximum length m, it runs in O(n×m) time and O(n×m) space.

Common Approaches

✓ Brute Force - Check All Suffixes

⏱️ Time: O(n²×m) Space: O(1)

Build the encoding from only those words that are not suffixes of other words, counting duplicate words once. For each word, check it against every other word to see whether it appears as a suffix.

Hash Set Optimization

⏱️ Time: O(n×m²) Space: O(n×m)

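Store all words in a hash set for O(1) average lookup, which also collapses duplicates. For each word, generate every proper suffix and remove it from the set if present. The words left in the set are the ones to encode; the answer is the sum of their lengths plus one '#' each.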
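The page only shows code for the brute-force approach, so here is a rough sketch of the hash set idea in C. Everything in it is an assumption made for illustration: the Slot type, the helper names, the fixed table size (chosen from the stated limits of 2000 words and 7 letters), and the simple multiplicative string hash with linear probing.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 7      /* from the stated constraints */
#define TABLE_SIZE   8192   /* power of two, well above 2000 words */

/* One slot of a fixed-size open-addressing hash set (hypothetical). */
typedef struct {
    char key[MAX_WORD_LEN + 1];
    int  state;             /* 0 = empty, 1 = live, 2 = removed */
} Slot;

static unsigned hashWord(const char* s) {
    unsigned h = 5381u;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h & (TABLE_SIZE - 1);
}

/* Returns the slot index holding s, or -1 if s is not in the set. */
static int findSlot(const Slot* table, const char* s) {
    unsigned i = hashWord(s);
    while (table[i].state != 0) {
        if (table[i].state == 1 && strcmp(table[i].key, s) == 0) return (int)i;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}

static void insertWord(Slot* table, const char* s) {
    if (findSlot(table, s) >= 0) return;            /* duplicates collapse */
    unsigned i = hashWord(s);
    while (table[i].state == 1) i = (i + 1) & (TABLE_SIZE - 1);
    strcpy(table[i].key, s);
    table[i].state = 1;
}

/* Returns the shortest encoding length, or -1 on bad input or
 * allocation failure. */
int minimumLengthEncodingSet(char** words, int wordsSize) {
    if (wordsSize >= TABLE_SIZE / 2) return -1;     /* keep the table sparse */
    Slot* table = calloc(TABLE_SIZE, sizeof(Slot));
    if (!table) return -1;

    for (int i = 0; i < wordsSize; i++) {
        if (strlen(words[i]) > MAX_WORD_LEN) { free(table); return -1; }
        insertWord(table, words[i]);
    }

    /* Remove every proper suffix of every word from the set */
    for (int i = 0; i < wordsSize; i++) {
        for (const char* suf = words[i] + 1; *suf; suf++) {
            int slot = findSlot(table, suf);
            if (slot >= 0) table[slot].state = 2;
        }
    }

    /* Each surviving word costs its length plus one '#' */
    int total = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i].state == 1) total += (int)strlen(table[i].key) + 1;
    }
    free(table);
    return total;
}
```

Removed slots are marked rather than cleared so that linear probing still finds words stored further along the same probe chain.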

Trie-Based Solution

⏱️ Time: O(n×m) Space: O(n×m)

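Build a trie from the reversed words, so that a suffix relationship becomes a prefix relationship. A reversed word that ends at an internal node is a prefix of a longer reversed word and can be dropped; only words that end at leaf nodes need to be kept in the encoding.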
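The trie idea might be sketched in C as follows. This is an illustrative sketch under the stated constraints (lowercase letters only), not code from the page; TrieNode, the helper names, and minimumLengthEncodingTrie are assumed names. Each leaf at depth d corresponds to a kept word of length d, which contributes d + 1 characters to the encoding.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical trie node over the lowercase alphabet. */
typedef struct TrieNode {
    struct TrieNode* child[26];
    int childCount;
} TrieNode;

static TrieNode* newNode(void) {
    return calloc(1, sizeof(TrieNode));
}

static void freeTrie(TrieNode* n) {
    if (!n) return;
    for (int c = 0; c < 26; c++) freeTrie(n->child[c]);
    free(n);
}

/* Sums (depth + 1) over all leaves below n. */
static int sumLeaves(const TrieNode* n, int depth) {
    if (n->childCount == 0) return depth + 1;
    int total = 0;
    for (int c = 0; c < 26; c++) {
        if (n->child[c]) total += sumLeaves(n->child[c], depth + 1);
    }
    return total;
}

/* Returns the shortest encoding length, or -1 on allocation failure. */
int minimumLengthEncodingTrie(char** words, int wordsSize) {
    if (wordsSize <= 0) return 0;
    TrieNode* root = newNode();
    if (!root) return -1;

    for (int i = 0; i < wordsSize; i++) {
        TrieNode* cur = root;
        /* Insert the word back to front, i.e. reversed */
        for (int k = (int)strlen(words[i]) - 1; k >= 0; k--) {
            int c = words[i][k] - 'a';
            if (!cur->child[c]) {
                cur->child[c] = newNode();
                if (!cur->child[c]) { freeTrie(root); return -1; }
                cur->childCount++;
            }
            cur = cur->child[c];
        }
    }

    int total = sumLeaves(root, 0);
    freeTrie(root);
    return total;
}
```

Duplicate words share one path in the trie, so they are counted once without any extra bookkeeping.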

Brute Force - Check All Suffixes — Algorithm Steps

Step 1: For each word, check if it's a suffix of any other word
Step 2: Include only non-suffix words in the encoding
Step 3: Calculate total length with '#' separators

Step-by-Step Walkthrough

Input Words

Start with array of words

Suffix Check

For each word, check if it's a suffix of any other word

Calculate Length

Sum lengths of remaining words plus '#' characters

Code -

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Check if word1 is a suffix of word2
int isSuffix(const char* word1, const char* word2) {
    int len1 = (int)strlen(word1);
    int len2 = (int)strlen(word2);
    
    if (len1 > len2) {
        return 0;
    }
    
    // Compare from the end
    for (int i = 0; i < len1; i++) {
        if (word1[len1 - 1 - i] != word2[len2 - 1 - i]) {
            return 0;
        }
    }
    
    return 1;
}

int solution(char** words, int wordsSize) {
    // Track which words are suffixes of other words
    int* isSuffixOfAnother = (int*)calloc(wordsSize, sizeof(int));
    
    // Check each word against all other words
    for (int i = 0; i < wordsSize; i++) {
        for (int j = 0; j < wordsSize; j++) {
            if (i != j) {
                // If words[i] is a suffix of words[j] and words[j] is longer
                if (strlen(words[i]) < strlen(words[j]) && isSuffix(words[i], words[j])) {
                    isSuffixOfAnother[i] = 1;
                    break;
                }
            }
        }
    }
    
    // Mark the words to include: skip suffixes and count each
    // duplicated word only once (its first occurrence)
    int* included = (int*)calloc(wordsSize, sizeof(int));
    
    for (int i = 0; i < wordsSize; i++) {
        if (isSuffixOfAnother[i]) {
            continue; // Skip words that are suffixes
        }
        
        // Check if this word was already included
        int alreadyIncluded = 0;
        for (int j = 0; j < i; j++) {
            if (included[j] && strcmp(words[i], words[j]) == 0) {
                alreadyIncluded = 1;
                break;
            }
        }
        
        if (!alreadyIncluded) {
            included[i] = 1;
        }
    }
    
    // Calculate total length
    int totalLength = 0;
    for (int i = 0; i < wordsSize; i++) {
        if (included[i]) {
            totalLength += strlen(words[i]) + 1; // +1 for '#'
        }
    }
    
    free(isSuffixOfAnother);
    free(included);
    return totalLength;
}

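int main(void) {
    // 2000 words of up to 7 letters, plus quotes and separators,
    // can exceed 20,000 characters, so size the buffer accordingly
    static char line[32768];
    
    // Read words array; stop if there is no input to parse
    if (fgets(line, sizeof(line), stdin) == NULL) {
        return 1;
    }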
    
    // Count words
    int wordsSize = 0;
    int inQuote = 0;
    for (int i = 0; line[i] != '\0'; i++) {
        if (line[i] == '"') {
            if (!inQuote) {
                wordsSize++;
            }
            inQuote = !inQuote;
        }
    }
    
    // Allocate words array
    char** words = (char**)malloc(wordsSize * sizeof(char*));
    for (int i = 0; i < wordsSize; i++) {
        words[i] = (char*)malloc(10 * sizeof(char));
    }
    
    // Parse words
    int wordIdx = 0;
    char* ptr = line;
    inQuote = 0;
    int charIdx = 0;
    
    while (*ptr) {
        if (*ptr == '"') {
            if (inQuote) {
                // End of word
                words[wordIdx][charIdx] = '\0';
                wordIdx++;
                charIdx = 0;
            }
            inQuote = !inQuote;
        } else if (inQuote && *ptr >= 'a' && *ptr <= 'z' && charIdx < 9) {
            // Each word buffer holds 10 bytes; keep one for the terminator
            words[wordIdx][charIdx++] = *ptr;
        }
        ptr++;
    }
    
    int result = solution(words, wordsSize);
    printf("%d\n", result);
    
    // Free memory
    for (int i = 0; i < wordsSize; i++) {
        free(words[i]);
    }
    free(words);
    
    return 0;
}

Time & Space Complexity

Time Complexity

O(n²×m)

Each of the n words is compared against the other n − 1 words, and each suffix comparison takes O(m) time, where m is the maximum word length.

⚠ Quadratic Growth

Space Complexity

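O(n) as implemented

The code above allocates two flag arrays of length n (isSuffixOfAnother and included). The O(1) bound listed for this approach holds only if the suffix and duplicate checks are done inline without those arrays.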

✓ Linear Space
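The brute-force check can also be written without any auxiliary arrays, which is what an O(1) space bound requires. The following is a minimal sketch of that variant, not the page's solution; endsWith and minimumLengthEncodingInPlace are names assumed for illustration. A word is kept unless it is a proper suffix of a longer word or repeats an earlier, identical word.

```c
#include <string.h>

/* Returns 1 if a is a suffix of b (a may equal b). */
static int endsWith(const char* b, const char* a) {
    size_t la = strlen(a), lb = strlen(b);
    return la <= lb && strcmp(b + (lb - la), a) == 0;
}

/* Brute force with constant extra space: the keep/drop decision for
 * each word is made inline instead of being stored in flag arrays. */
int minimumLengthEncodingInPlace(char** words, int wordsSize) {
    int total = 0;
    for (int i = 0; i < wordsSize; i++) {
        size_t li = strlen(words[i]);
        int keep = 1;
        for (int j = 0; j < wordsSize && keep; j++) {
            if (j == i) continue;
            size_t lj = strlen(words[j]);
            if (lj > li && endsWith(words[j], words[i])) {
                keep = 0;   /* proper suffix of a longer word */
            } else if (lj == li && j < i && strcmp(words[i], words[j]) == 0) {
                keep = 0;   /* duplicate of an earlier word */
            }
        }
        if (keep) total += (int)li + 1;   /* +1 for '#' */
    }
    return total;
}
```

The time complexity stays O(n²×m); only the two flag arrays are gone.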
