Program to Find Out the Smallest Substring Containing a Specific String in Python

Finding the smallest substring that contains a given string as a subsequence is a classic string problem. Given two strings s and t, we need to find the shortest contiguous substring of s in which the characters of t appear in order, though not necessarily next to each other. If several substrings share the minimum length, we return the leftmost one. If no such substring exists, we return an empty string.

For example, if s = "abcbfbghfb" and t = "fg", the output is "fbg", because it is the shortest substring of s in which "f" is followed (not necessarily immediately) by "g".
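To make the problem statement concrete, here is a brute-force reference written as a minimal sketch. The helper names is_subsequence and smallest_window_brute_force are our own and do not come from the original. It checks every substring, shortest lengths first and leftmost start first, so it follows the same tie-breaking rule. However, it runs in roughly cubic time and is only suitable for small inputs or for testing.

```python
def is_subsequence(t: str, window: str) -> bool:
    """True if the characters of t appear in window in order (gaps allowed)."""
    it = iter(window)
    # `ch in it` consumes the iterator up to and including ch, so order is enforced
    return all(ch in it for ch in t)


def smallest_window_brute_force(s: str, t: str) -> str:
    """Try every substring, shortest first and leftmost first."""
    n = len(s)
    for length in range(len(t), n + 1):
        for start in range(n - length + 1):
            window = s[start:start + length]
            if is_subsequence(t, window):
                return window
    return ""  # t is not a subsequence of any substring of s


print(smallest_window_brute_force("abcbfbghfb", "fg"))  # fbg
```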

Algorithm Overview

We solve this problem with dynamic programming, processing t one character at a time:

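  • Create a DP array where dp[i] is the length of the shortest substring ending at position i that contains the characters of t processed so far as a subsequence, with S[i] matching the current character of t (infinity if no such substring exists)
  • For each subsequent character of t, build a new DP array from the previous one by linking every match to the nearest earlier occurrence of the preceding character of t
  • Use a dictionary to record the most recent index of each character seen so far while scanning s
  • Take the minimum length in the final DP array and slice out the corresponding substring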
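The steps above can be written as a recurrence. The notation here is our own: dp_j[i] is the length of the shortest substring ending at i that contains T_0 … T_j as a subsequence with S_i matched to T_j, using 0-based indices.

```latex
\mathrm{dp}_0[i] =
\begin{cases}
1 & \text{if } S_i = T_0,\\
\infty & \text{otherwise,}
\end{cases}
\qquad
\mathrm{dp}_j[i] =
\begin{cases}
\mathrm{dp}_{j-1}[p] + (i - p) & \text{if } S_i = T_j \text{ and } p = \max\{k < i : S_k = T_{j-1}\} \text{ exists},\\
\infty & \text{otherwise,}
\end{cases}
\\[6pt]
m = \min_i \mathrm{dp}_{|T|-1}[i], \qquad
i^{*} = \min\{\, i : \mathrm{dp}_{|T|-1}[i] = m \,\}, \qquad
\text{answer} = S[\, i^{*} - m + 1 \,..\, i^{*} \,].
```

Taking only the latest p is enough because a valid window ending at an earlier occurrence of T_{j-1} can be stretched to end at a later one. As a result, the shortest window ending at the later occurrence starts no further left.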

Implementation

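class Solution:
    def solve(self, S, T):
        INF = float("inf")
        N = len(S)
        if N == 0 or not T:
            return ""

        # dp[i] = length of the shortest substring ending at i that contains
        # T[0..j] as a subsequence with S[i] matched to T[j]; INF if none
        dp = [INF] * N

        # Phase 1: every occurrence of T[0] is a match of length 1
        for i in range(N):
            if S[i] == T[0]:
                dp[i] = 1

        # Phases 2..len(T): extend the matches by one character of T at a time
        for j in range(1, len(T)):
            last = {}          # character -> most recent index strictly before i
            dp2 = [INF] * N

            for i in range(N):
                if S[i] == T[j]:
                    # The latest T[j-1] before i gives the shortest window ending at i
                    prev_i = last.get(T[j - 1])
                    if prev_i is not None:
                        dp2[i] = dp[prev_i] + (i - prev_i)
                last[S[i]] = i  # update after the check so S[i] never links to itself

            dp = dp2

        # min() finds the shortest length; index() returns its first position,
        # which is the leftmost substring when several have the same length
        m = min(dp)
        if m == INF:
            return ""
        i = dp.index(m)
        return S[i - m + 1 : i + 1]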

# Test the solution
solution = Solution()
result = solution.solve("abcbfbghfb", "fg")
print(f"Input: s='abcbfbghfb', t='fg'")
print(f"Output: '{result}'")
Input: s='abcbfbghfb', t='fg'
Output: 'fbg'
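A single example does not exercise ties, repeated characters, or missing matches. A randomized cross-check can help with those cases. The sketch below restates the same DP as a plain function and compares it with an exhaustive check of every substring on small random strings. The function names are hypothetical, and the seed, sizes, and alphabet "abc" are arbitrary choices.

```python
import random


def smallest_window_dp(S: str, T: str) -> str:
    """Same phase-by-phase DP as Solution.solve, written as a plain function."""
    INF = float("inf")
    N = len(S)
    if N == 0 or not T:
        return ""
    dp = [1 if ch == T[0] else INF for ch in S]
    for j in range(1, len(T)):
        last = {}                      # character -> most recent index before i
        dp2 = [INF] * N
        for i, ch in enumerate(S):
            if ch == T[j] and T[j - 1] in last:
                prev_i = last[T[j - 1]]
                dp2[i] = dp[prev_i] + (i - prev_i)
            last[ch] = i
        dp = dp2
    m = min(dp)
    if m == INF:
        return ""
    i = dp.index(m)
    return S[i - m + 1 : i + 1]


def smallest_window_brute_force(s: str, t: str) -> str:
    """Exhaustive reference: shortest length first, then leftmost start."""
    for length in range(len(t), len(s) + 1):
        for start in range(len(s) - length + 1):
            it = iter(s[start:start + length])
            if all(ch in it for ch in t):  # in-order subsequence test
                return s[start:start + length]
    return ""


random.seed(0)
for _ in range(2000):
    s = "".join(random.choice("abc") for _ in range(random.randint(0, 10)))
    t = "".join(random.choice("abc") for _ in range(random.randint(1, 3)))
    assert smallest_window_dp(s, t) == smallest_window_brute_force(s, t), (s, t)
print("all random cases agree")
```

Because both functions break ties by taking the leftmost substring, they must return identical strings, not just strings of equal length.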

How It Works

The algorithm works in phases, one for each character in the target string:

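  1. Phase 1: Mark every position where the first character of t occurs with length 1, since a single character is the shortest possible match.
  2. Phase 2 onwards: For each position i where T[j] occurs, look up the most recent occurrence prev_i of T[j-1] before i and set the length to dp[prev_i] + (i - prev_i). The most recent occurrence is always the best choice, because the shortest match ending at a later occurrence of T[j-1] starts no further left than one ending at an earlier occurrence.
  3. Final step: Extract the substring with the minimum length. dp.index() returns the first minimum, which yields the leftmost substring on ties.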
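Step 2 can be illustrated with a hypothetical input, "fafbg", in which two 'f's come before the 'g'. Either 'f' could start a window ending at the 'g'. The one stored in the last dictionary, which is the most recent, gives the shorter window. Because T has only two characters here, every phase-1 length is 1, so the window length is simply i - p + 1.

```python
S, T = "fafbg", "fg"   # hypothetical input with two 'f's before the 'g'

last = {}              # character -> most recent index seen so far
for i, ch in enumerate(S):
    if ch == T[1]:
        # Every earlier 'f' could start a window ending at this 'g' ...
        for p in (k for k in range(i) if S[k] == T[0]):
            print(f"f at {p}: window {S[p:i + 1]!r} (length {i - p + 1})")
        # ... but `last` already points at the one giving the shortest window
        print(f"last[{T[0]!r}] = {last[T[0]]}")
    last[ch] = i

# f at 0: window 'fafbg' (length 5)
# f at 2: window 'fbg' (length 3)
# last['f'] = 2
```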

Step-by-Step Example

For s = "abcbfbghfb" and t = "fg":

def trace_algorithm(S, T):
    print(f"Finding smallest substring in '{S}' containing '{T}' as subsequence")
    print(f"String positions: {list(enumerate(S))}")
    
    INF = float("inf")
    N = len(S)
    dp = [INF] * N
    
    # Phase 1: Find 'f'
    for i in range(N):
        if S[i] == T[0]:  # 'f'
            dp[i] = 1
            print(f"Found '{T[0]}' at position {i}, dp[{i}] = 1")
    
    print(f"After phase 1: {dp}")
    
    # Phases 2..len(T): extend each match by the next character of T ('g' here)
    for j in range(1, len(T)):
        last = {}
        dp2 = [INF] * N
        
        print(f"\nPhase {j+1}: Looking for '{T[j]}'")
        
        for i in range(N):
            if S[i] == T[j]:  # 'g'
                prev_i = last.get(T[j - 1], None)  # last 'f'
                if prev_i is not None:
                    dp2[i] = dp[prev_i] + (i - prev_i)
                    print(f"Found '{T[j]}' at {i}, previous '{T[j-1]}' at {prev_i}")
                    print(f"Substring length: {dp2[i]} (from {prev_i} to {i})")
            last[S[i]] = i
        
        dp = dp2
        print(f"After phase {j+1}: {dp}")
    
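    # Find result (empty string if T never appears as a subsequence)
    m = min(dp)
    if m == INF:
        print("\nNo substring contains T as a subsequence")
        return ""
    idx = dp.index(m)
    result = S[idx - m + 1 : idx + 1]
    
    print(f"\nSmallest substring: '{result}' (length {m})")
    return result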

# Run the trace
trace_algorithm("abcbfbghfb", "fg")
Finding smallest substring in 'abcbfbghfb' containing 'fg' as subsequence
String positions: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'b'), (4, 'f'), (5, 'b'), (6, 'g'), (7, 'h'), (8, 'f'), (9, 'b')]
Found 'f' at position 4, dp[4] = 1
Found 'f' at position 8, dp[8] = 1
After phase 1: [inf, inf, inf, inf, 1, inf, inf, inf, 1, inf]

Phase 2: Looking for 'g'
Found 'g' at 6, previous 'f' at 4
Substring length: 3 (from 4 to 6)
After phase 2: [inf, inf, inf, inf, inf, inf, 3, inf, inf, inf]

Smallest substring: 'fbg' (length 3)
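With a two-character target there is only one transition between phases. To see phases chain together, the sketch below records the dp array after each phase for the same s with a three-character target. The helper dp_phases and the target "fbh" are our own choices for illustration.

```python
INF = float("inf")


def dp_phases(S: str, T: str) -> list[list[float]]:
    """Return the dp array after each phase (one phase per character of T)."""
    dp = [1 if ch == T[0] else INF for ch in S]
    phases = [dp]
    for j in range(1, len(T)):
        last, dp2 = {}, [INF] * len(S)
        for i, ch in enumerate(S):
            if ch == T[j] and T[j - 1] in last:
                prev_i = last[T[j - 1]]          # latest T[j-1] before i
                dp2[i] = dp[prev_i] + (i - prev_i)
            last[ch] = i
        dp = dp2
        phases.append(dp)
    return phases


for j, dp in enumerate(dp_phases("abcbfbghfb", "fbh"), start=1):
    print(f"After phase {j}: {dp}")

# After phase 1: [inf, inf, inf, inf, 1, inf, inf, inf, 1, inf]
# After phase 2: [inf, inf, inf, inf, inf, 2, inf, inf, inf, 2]
# After phase 3: [inf, inf, inf, inf, inf, inf, inf, 4, inf, inf]
```

Phase 3 leaves a single finite entry, 4 at index 7. The 'h' at 7 links to the 'b' at 5, which in turn links to the 'f' at 4, so the answer is s[4:8] = "fbgh".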

Conclusion

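This dynamic programming solution finds the smallest substring of s that contains t as a subsequence by processing t one character at a time. Each of the m phases scans s once, so the algorithm runs in O(n×m) time, where n is the length of s and m is the length of t. It uses O(n) extra space for the two DP arrays; the last dictionary holds at most one entry per distinct character.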
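A possible refinement, not part of the original: within phase j, the dictionary is only ever queried for T[j-1]. It can therefore be replaced by a single integer holding the most recent index of T[j-1]. The time complexity stays O(n×m). The function name smallest_window is our own.

```python
def smallest_window(S: str, T: str) -> str:
    """Same DP, with the per-phase `last` dict replaced by one integer."""
    INF = float("inf")
    if not S or not T:
        return ""
    dp = [1 if ch == T[0] else INF for ch in S]
    for j in range(1, len(T)):
        prev = -1                      # most recent index of T[j-1] before i
        dp2 = [INF] * len(S)
        for i, ch in enumerate(S):
            if ch == T[j] and prev != -1:
                dp2[i] = dp[prev] + (i - prev)
            if ch == T[j - 1]:
                prev = i               # updated after the check, as in the original
        dp = dp2
    m = min(dp)
    if m == INF:
        return ""
    end = dp.index(m)                  # first minimum -> leftmost on ties
    return S[end - m + 1 : end + 1]


print(smallest_window("abcbfbghfb", "fg"))  # fbg
```

Updating prev after the check matters when T contains the same character twice in a row, such as t = "aa". It prevents a position from being matched to both characters.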

Updated on: 2026-03-25T14:05:47+05:30
