Program to Find Out the Smallest Substring Containing a Specific String in Python
Given two strings s and t, we need to find the shortest substring of s in which t appears as a subsequence, meaning the characters of t occur in order but not necessarily next to each other. If multiple substrings exist with the same minimum length, we return the leftmost one. If no such substring exists, we return an empty string.
For example, if s = "abcbfbghfb" and t = "fg", the output will be "fbg" because it is the shortest substring in which "f" is followed by "g".
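To make the problem statement concrete, here is a minimal brute-force sketch that tries every substring, shortest and leftmost first. The helper names is_subsequence and smallest_window_brute_force are hypothetical and not part of the original solution; the approach is far too slow for long inputs and is meant only as a reference to check answers against.

```python
def is_subsequence(t, window):
    """Return True if t can be read left to right inside window, skipping characters."""
    it = iter(window)
    # `ch in it` advances the iterator until it finds ch, so order is enforced.
    return all(ch in it for ch in t)


def smallest_window_brute_force(s, t):
    """Hypothetical reference helper: try every substring, shortest first."""
    n = len(s)
    for length in range(len(t), n + 1):        # shortest candidates first
        for start in range(n - length + 1):    # leftmost candidates first
            window = s[start:start + length]
            if is_subsequence(t, window):
                return window
    return ""  # no substring of s contains t as a subsequence


print(smallest_window_brute_force("abcbfbghfb", "fg"))  # fbg
```

Because candidates are visited in order of increasing length and then increasing start position, the first match is automatically the shortest and leftmost one.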
Algorithm Overview
We use dynamic programming to solve this problem efficiently:
- Create a DP array where dp[i] represents the length of the smallest substring that ends at position i and contains the part of t processed so far as a subsequence
- For each character in the target string, update the DP array to track possible substring endings
- Use a dictionary to track the last occurrence of each character
- Find the minimum length and extract the corresponding substring
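The steps above can also be written as a recurrence. This is a sketch in notation the original does not use: dp_j is the DP array after processing T[j], and p(i) is an assumed shorthand for the index of the last occurrence of T[j-1] strictly before position i.

```latex
dp_0[i] =
\begin{cases}
1 & \text{if } S[i] = T[0] \\
\infty & \text{otherwise}
\end{cases}
\qquad
dp_j[i] =
\begin{cases}
dp_{j-1}[p(i)] + \bigl(i - p(i)\bigr) & \text{if } S[i] = T[j] \text{ and } p(i) \text{ exists} \\
\infty & \text{otherwise}
\end{cases}
```

With m = len(T), the answer ends at the first index i that minimizes dp_{m-1}[i] and spans S[i - dp_{m-1}[i] + 1 .. i], provided the minimum is finite.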
Implementation
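class Solution:
    def solve(self, S, T):
        # Guard against empty inputs, which would otherwise fail on T[0] or min()
        if not S or not T:
            return ""
        INF = float("inf")
        N = len(S)
        # dp[i]: length of the shortest window ending at i that matches T[:j+1]
        dp = [INF] * N
        # Initialize for first character of T
        for i in range(N):
            if S[i] == T[0]:
                dp[i] = 1
        # Process remaining characters of T
        for j in range(1, len(T)):
            last = {}  # most recent index of each character seen so far
            dp2 = [INF] * N
            for i in range(N):
                if S[i] == T[j]:
                    prev_i = last.get(T[j - 1], None)
                    if prev_i is not None:
                        # Extend the window that ended at prev_i up to i
                        dp2[i] = dp[prev_i] + (i - prev_i)
                # Record i after the lookup so that prev_i is always before i
                last[S[i]] = i
            dp = dp2
        # Find minimum length substring
        m = min(dp)
        if m == INF:
            return ""
        i = dp.index(m)  # first index with the minimum gives the leftmost window
        return S[i - dp[i] + 1 : i + 1]

# Test the solution
solution = Solution()
result = solution.solve("abcbfbghfb", "fg")
print("Input: s='abcbfbghfb', t='fg'")
print(f"Output: '{result}'")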
Input: s='abcbfbghfb', t='fg'
Output: 'fbg'
How It Works
The algorithm works in phases, one for each character in the target string:
- Phase 1: Mark all positions where the first character of t appears with length 1
- Phase 2 onwards: For each position where T[j] appears, find the most recent occurrence of T[j-1] before it and extend that window's length by the distance between the two positions
- Final step: Extract the substring with minimum length
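The same phases can be expressed by storing where each window starts instead of how long it is, which some readers find easier to follow. The sketch below is an equivalent reformulation under that assumption, not the article's own code; the function name smallest_window_by_start and the array name start are hypothetical.

```python
def smallest_window_by_start(s, t):
    """Sketch: start[i] is the latest start index of a window ending at i
    that contains the processed prefix of t as a subsequence, or None."""
    if not s or not t:
        return ""
    n = len(s)
    # Phase 1: a window matching t[0] and ending at i starts at i itself.
    start = [i if s[i] == t[0] else None for i in range(n)]
    # Phase 2 onwards: inherit the start from the latest occurrence of t[j-1].
    for j in range(1, len(t)):
        new_start = [None] * n
        prev = None  # index of the most recent occurrence of t[j-1]
        for i in range(n):
            if s[i] == t[j] and prev is not None:
                new_start[i] = start[prev]  # may still be None (no match)
            if s[i] == t[j - 1]:
                prev = i  # updated after the lookup, so prev < i next time
        start = new_start
    # Final step: pick the shortest window; strict '<' keeps the leftmost tie.
    best = ""
    for i, b in enumerate(start):
        if b is not None and (not best or i - b + 1 < len(best)):
            best = s[b:i + 1]
    return best


print(smallest_window_by_start("abcbfbghfb", "fg"))  # fbg
```

Using only the most recent occurrence of T[j-1] is enough because a later occurrence can never have an earlier best start than an earlier occurrence, so it always yields the shortest window ending at the current position.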
Step-by-Step Example
For s = "abcbfbghfb" and t = "fg":
def trace_algorithm(S, T):
    print(f"Finding smallest substring in '{S}' containing '{T}' as subsequence")
    print(f"String positions: {list(enumerate(S))}")
    INF = float("inf")
    N = len(S)
    dp = [INF] * N
    # Phase 1: Find 'f'
    for i in range(N):
        if S[i] == T[0]:  # 'f'
            dp[i] = 1
            print(f"Found '{T[0]}' at position {i}, dp[{i}] = 1")
    print(f"After phase 1: {dp}")
    # Phase 2: Find 'g' after 'f'
    for j in range(1, len(T)):
        last = {}
        dp2 = [INF] * N
        print(f"\nPhase {j+1}: Looking for '{T[j]}'")
        for i in range(N):
            if S[i] == T[j]:  # 'g'
                prev_i = last.get(T[j - 1], None)  # last 'f'
                if prev_i is not None:
                    dp2[i] = dp[prev_i] + (i - prev_i)
                    print(f"Found '{T[j]}' at {i}, previous '{T[j-1]}' at {prev_i}")
                    print(f"Substring length: {dp2[i]} (from {prev_i} to {i})")
            last[S[i]] = i
        dp = dp2
        print(f"After phase {j+1}: {dp}")
    # Find result
    m = min(dp)
    idx = dp.index(m)
    result = S[idx - dp[idx] + 1 : idx + 1]
    print(f"\nSmallest substring: '{result}' (length {m})")
    return result

# Run the trace
trace_algorithm("abcbfbghfb", "fg")
Finding smallest substring in 'abcbfbghfb' containing 'fg' as subsequence
String positions: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'b'), (4, 'f'), (5, 'b'), (6, 'g'), (7, 'h'), (8, 'f'), (9, 'b')]
Found 'f' at position 4, dp[4] = 1
Found 'f' at position 8, dp[8] = 1
After phase 1: [inf, inf, inf, inf, 1, inf, inf, inf, 1, inf]

Phase 2: Looking for 'g'
Found 'g' at 6, previous 'f' at 4
Substring length: 3 (from 4 to 6)
After phase 2: [inf, inf, inf, inf, inf, inf, 3, inf, inf, inf]

Smallest substring: 'fbg' (length 3)
Conclusion
This dynamic programming solution finds the smallest substring containing a target string as a subsequence in O(n×m) time, where n is the length of s and m is the length of t. It needs O(n) extra space, because only the current and previous DP arrays are kept at any time.
