Short Encoding of Words - Problem
Short Encoding of Words is a fascinating string compression problem that challenges you to find the most efficient way to encode an array of words.

Imagine you have a list of words and you want to create a reference string that can represent all of them using the minimum amount of space. The encoding works like this:

  • Create a reference string s that ends with '#'
  • For each word, store an index pointing to where that word appears in the reference string
  • Each word must be followed by a '#' character in the reference string

Example: For words ["time", "me", "bell"], one valid encoding is the reference string "time#me#bell#" with indices [0, 5, 8], but a shorter encoding is "time#bell#" with indices [0, 2, 5].
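To make the index scheme concrete, here is a minimal decoding sketch. The helper name `decode` is hypothetical (it is not part of the problem statement); it assumes each word is read from its index up to the next '#'.

```python
def decode(reference: str, indices: list[int]) -> list[str]:
    """Recover each word by reading from its index up to the next '#'."""
    words = []
    for start in indices:
        end = reference.index("#", start)  # first '#' at or after start
        words.append(reference[start:end])
    return words


print(decode("time#bell#", [0, 2, 5]))  # ['time', 'me', 'bell']
```

Index 2 lands inside "time", so reading up to the next '#' yields "me" without storing it separately.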

Your goal is to find the minimum length of such a reference string. The key insight is that if one word is a suffix of another (like "me" is a suffix of "time"), you can save space by only storing the longer word!
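The suffix observation can be turned into code directly. The following is a sketch of one possible approach, not necessarily the intended solution: it keeps a set of words and discards every proper suffix of every word. The function name `minimum_length_encoding` is an assumption chosen for illustration.

```python
def minimum_length_encoding(words: list[str]) -> int:
    """Return the length of the shortest reference string for `words`."""
    keep = set(words)  # duplicates only need to be stored once
    for word in words:
        # Discard every proper suffix of `word`; it is covered by `word`.
        for k in range(1, len(word)):
            keep.discard(word[k:])
    # Each remaining word costs its length plus one '#'.
    return sum(len(w) + 1 for w in keep)


print(minimum_length_encoding(["time", "me", "bell"]))  # 10
```

Because each word has at most 7 characters under the stated constraints, generating all suffixes per word stays cheap.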

Input & Output

example_1.py: Basic Case
$ Input: ["time", "me", "bell"]
› Output: 10
💡 Note: We can encode as "time#bell#" since "me" is a suffix of "time". Total length = 4 + 1 + 4 + 1 = 10.
example_2.py: No Common Suffixes
$ Input: ["t"]
› Output: 2
💡 Note: Single word case results in "t#" with length 2.
example_3.py: Multiple Suffix Relationships
$ Input: ["time", "me", "e", "atime"]
› Output: 6
💡 Note: "e" is a suffix of "me", "me" is a suffix of "time", and "time" is a suffix of "atime". Only "atime" is needed: "atime#" has length 5 + 1 = 6.

Visualization

📚 Library Catalog Optimization

Original Catalog (Inefficient)
"time#" + "me#" + "bell#"
Length: 5 + 3 + 5 = 13
❌ Wastes space with redundancy

Optimized Catalog
"time#" + "bell#"
Length: 5 + 5 = 10
✅ "me" found within "time"

How the Trie Approach Works:
1. Reverse the words: ["emit", "em", "lleb"]
2. Build trie with reversed words
3. Words ending at leaf nodes → keep
4. Words ending at internal nodes → suffix of others
💡 "em" ends at internal node → it's a suffix!

Understanding the Visualization
1. Collect All Titles: Start with your list of book titles ("time", "me", "bell").
2. Find Overlaps: Notice that "me" appears at the end of "time", so it is a suffix of "time".
3. Optimize Storage: Store only "time#bell#", since "me" can be found within "time".
4. Calculate Savings: The final catalog length is 10 characters instead of 13 when each word is stored separately.
Key Takeaway
🎯 Key Insight: By using a reverse trie, we efficiently identify all suffix relationships in O(nm) time, allowing us to exclude redundant words and achieve the minimum encoding length.
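
The reverse-trie idea can be sketched as follows. This is an illustrative version under stated assumptions: the trie is built from nested dicts, and the function name `minimum_length_encoding_trie` is hypothetical. A word is kept only if the node where its reversed form ends has no children.

```python
def minimum_length_encoding_trie(words: list[str]) -> int:
    """Sum (length + 1) over words whose reversed form ends at a trie leaf."""
    root: dict = {}
    ends = []  # (terminal node, word length) for each distinct word
    for word in set(words):  # deduplicate so each word has one terminal node
        node = root
        for ch in reversed(word):
            node = node.setdefault(ch, {})  # create the child if missing
        ends.append((node, len(word)))
    # An empty dict means no longer word extends this one, so it must be stored.
    return sum(length + 1 for node, length in ends if not node)


print(minimum_length_encoding_trie(["time", "me", "bell"]))  # 10
```

The leaf check runs only after every word has been inserted, so the result does not depend on insertion order.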

Time & Space Complexity

Time Complexity
⏱️ O(nm)

Building the trie takes O(nm), where n is the number of words and m is the average word length; checking each word takes O(m).

Space Complexity
O(nm)

Trie can store up to nm characters in worst case when no words share common suffixes
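
To see the worst-case space bound in practice, the sketch below counts the nodes a reverse trie would create (excluding the root). The helper `count_trie_nodes` is a hypothetical name used only for illustration.

```python
def count_trie_nodes(words: list[str]) -> int:
    """Count non-root nodes in a trie built from the reversed words."""
    root: dict = {}
    count = 0
    for word in set(words):
        node = root
        for ch in reversed(word):
            if ch not in node:
                node[ch] = {}
                count += 1  # a new node is allocated only for unseen paths
            node = node[ch]
    return count


print(count_trie_nodes(["ab", "cd", "ef"]))   # 6: no shared suffixes, n * m nodes
print(count_trie_nodes(["time", "me", "e"]))  # 4: "me" and "e" reuse the path of "time"
```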


Constraints

  • 1 ≤ words.length ≤ 2000
  • 1 ≤ words[i].length ≤ 7
  • words[i] consists of only lowercase letters