Find the Shortest Superstring - Problem
Find the Shortest Superstring is a fascinating string optimization problem that challenges you to find the most efficient way to combine multiple strings into one compact superstring.
Given an array of strings words, your task is to construct the shortest possible string that contains each string in words as a substring. Think of it as creating a master string that encompasses all input strings with maximum overlap.
Key Points:
• If multiple valid strings exist with the same minimum length, return any of them
• No string in the input is a substring of another string
• The goal is to maximize overlaps between strings to minimize total length
Example: For ["catg", "ctaagt", "gcta"], the shortest superstring is "catgctaagt" (length 10), which contains all three strings with optimal overlapping.
Input & Output
example_1.py — Basic Case
Input:
["catg", "ctaagt", "gcta"]
Output:
"catgctaagt"
💡 Note:
The strings can be arranged as catg → gcta → ctaagt with overlaps: catg+gcta overlap by 'g' (1 char), gcta+ctaagt overlap by 'cta' (3 chars). Result: catg + cta + agt = catgctaagt, whose length is 10 (14 total characters minus 4 shared ones).
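The merge arithmetic in the note can be reproduced mechanically: walk the chosen order and append only the part of each word that does not overlap the end of the previous word. A minimal Python sketch; `overlap` and `merge_in_order` are illustrative names, not part of the problem's interface.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(words: list[str]) -> str:
    """Chain words in the given order, reusing each pairwise overlap."""
    result = words[0]
    for prev, nxt in zip(words, words[1:]):
        # Skip the first overlap(prev, nxt) characters of nxt: they are already present.
        result += nxt[overlap(prev, nxt):]
    return result


merged = merge_in_order(["catg", "gcta", "ctaagt"])
print(merged, len(merged))  # catgctaagt 10
# Every input word should appear as a substring of the result.
print(all(w in merged for w in ["catg", "ctaagt", "gcta"]))  # True
```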
example_2.py — Simple Chain
Input:
["ab", "bc", "cd"]
Output:
"abcd"
💡 Note:
Perfect chain with single character overlaps: ab → bc → cd becomes ab + c + d = abcd
example_3.py — No Overlaps
Input:
["abc", "def", "ghi"]
Output:
"abcdefghi"
💡 Note:
When no two strings overlap, we simply concatenate them in any order; the total length equals the sum of all string lengths.
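For inputs as small as these examples, the answer can be cross-checked by trying every ordering. This brute-force sketch (the function name is hypothetical) tries all n! permutations, so it is only a reference for testing: at the maximum n = 12 that would be 479,001,600 orderings.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def brute_force_superstring(words: list[str]) -> str:
    """Try every ordering and keep the shortest merged string found."""
    best = None
    for order in permutations(words):
        merged = order[0]
        for prev, nxt in zip(order, order[1:]):
            merged += nxt[overlap(prev, nxt):]
        # Strict '<' keeps the first ordering among equally short ones.
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(brute_force_superstring(["ab", "bc", "cd"]))     # abcd
print(brute_force_superstring(["abc", "def", "ghi"]))  # abcdefghi
```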
Visualization
Understanding the Visualization
1. Identify Fragments: You receive DNA fragments 'CATG', 'CTAAGT', 'GCTA' that came from a larger sequence.
2. Find Overlaps: The end of 'CATG' overlaps the start of 'GCTA' by 'G' (1 character), and the end of 'GCTA' overlaps the start of 'CTAAGT' by 'CTA' (3 characters).
3. Optimal Assembly: Arrange the fragments to maximize total overlap: CATG + CTA + AGT = CATGCTAAGT.
4. Verify Completeness: Confirm that all original fragments appear as substrings in the final sequence.
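The "Find Overlaps" step amounts to filling an n × n table where entry [i][j] is the length of the longest suffix of fragment i that is also a prefix of fragment j. A small sketch for the three fragments above; `overlap_matrix` is an illustrative helper name, not something the problem defines.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_matrix(words: list[str]) -> list[list[int]]:
    """matrix[i][j] = overlap(words[i], words[j]); the diagonal is left at 0."""
    n = len(words)
    return [[overlap(words[i], words[j]) if i != j else 0 for j in range(n)]
            for i in range(n)]


fragments = ["CATG", "CTAAGT", "GCTA"]
for name, row in zip(fragments, overlap_matrix(fragments)):
    print(f"{name:>6} -> {row}")
# Expected rows:
#   CATG -> [0, 0, 1]
# CTAAGT -> [0, 0, 0]
#   GCTA -> [0, 3, 0]
```

The only nonzero entries, CATG→GCTA = 1 and GCTA→CTAAGT = 3, are exactly the overlaps used in the assembly step.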
Key Takeaway
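🎯 Key Insight: This is essentially the Traveling Salesman Problem (more precisely a shortest Hamiltonian path, since there is no return to the start): each string is a 'city' visited exactly once, and the 'travel cost' of going from word i to word j is the number of new characters appended, len(words[j]) − overlap(i, j). The DP with bitmask approach (in the style of Held-Karp) stores the best result for every (subset of words, last word) pair, so each subset is solved once instead of re-exploring all n! orderings.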
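A sketch of the bitmask DP in Python, assuming the usual formulation: `dp[mask][j]` is the maximum total overlap achievable by an ordering of the words in `mask` that ends with word `j`, and a parent table records the predecessor so the string can be rebuilt. Maximizing overlap is equivalent to minimizing length, because length = sum of word lengths − total overlap. Function and variable names are illustrative.

```python
def shortest_superstring(words: list[str]) -> str:
    n = len(words)

    # overlap[i][j]: longest suffix of words[i] that is a prefix of words[j]
    overlap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a, b = words[i], words[j]
                for k in range(min(len(a), len(b)), 0, -1):
                    if a.endswith(b[:k]):
                        overlap[i][j] = k
                        break

    full = (1 << n) - 1
    # dp[mask][j]: max total overlap using the words in mask, ending with word j
    # (-1 marks an unreachable state); parent[mask][j]: the word placed before j
    dp = [[-1] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for j in range(n):
        dp[1 << j][j] = 0

    # Masks only grow, so increasing order finalizes each state before it is extended.
    for mask in range(1 << n):
        for j in range(n):
            if dp[mask][j] < 0:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue
                nxt = mask | (1 << k)
                cand = dp[mask][j] + overlap[j][k]
                if cand > dp[nxt][k]:
                    dp[nxt][k] = cand
                    parent[nxt][k] = j

    # Pick the best last word, then walk parents backwards to recover the order.
    last = max(range(n), key=lambda j: dp[full][j])
    order = []
    mask = full
    while last != -1:
        order.append(last)
        prev = parent[mask][last]
        mask ^= 1 << last
        last = prev
    order.reverse()

    # Rebuild the string, appending only the non-overlapping tail of each word.
    result = words[order[0]]
    for a, b in zip(order, order[1:]):
        result += words[b][overlap[a][b]:]
    return result


print(shortest_superstring(["catg", "ctaagt", "gcta"]))  # catgctaagt
```

Ties between equally short orderings are resolved by whichever is found first, which the problem statement permits.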
Time & Space Complexity
Time Complexity
O(n² × 2ⁿ)
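2ⁿ possible bitmask states, each with n possible ending words, and up to n transitions (candidate next words) from each state. Precomputing the pairwise overlaps adds O(n² × L²) for maximum word length L, which is small compared with the DP term under the given constraints. Growth is exponential in n, which is why n is capped at 12.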
Space Complexity
O(n × 2ⁿ)
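DP table storing the best value (minimum length, or equivalently maximum total overlap) for each (mask, ending word) state, plus a parent table of the same size for rebuilding the string. Like the time bound, this is exponential in n.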
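To put these bounds in perspective at the largest input the constraints allow (n = 12), a quick count of DP states and transitions, compared with the number of orderings a brute-force search would try:

```python
import math

n = 12                       # largest words.length allowed by the constraints
states = (1 << n) * n        # (mask, last word) pairs stored in the DP table
transitions = states * n     # each state tries up to n next words
orderings = math.factorial(n)  # permutations a brute-force search would enumerate
print(states, transitions, orderings)  # 49152 589824 479001600
```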
Constraints
- 1 ≤ words.length ≤ 12
- 1 ≤ words[i].length ≤ 20
- words[i] consists of lowercase English letters
- No string is a substring of another string
- All the strings in words are unique