Program to find size of common special substrings of two given strings in Python
Suppose we have two strings s1 and s2. We have to find the length of the longest string s3 that is a special substring of both s1 and s2.
We say a string x is a special substring of another string y if x can be generated by removing 0 or more characters from y without reordering the remaining ones. Such a string is usually called a subsequence, so this is the Longest Common Subsequence (LCS) problem.
So, if the input is like s1 = 'pineapple' and s2 = 'people', then the output will be 5 as the special substring is 'peple', of size 5.
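The definition above can be checked mechanically. The following is a minimal sketch; the helper name is_special_substring is an assumption made for illustration and does not come from the original solution.

```python
def is_special_substring(x, y):
    """Return True if x can be obtained by deleting 0 or more characters from y."""
    it = iter(y)
    # `ch in it` consumes the iterator up to the first match,
    # so the characters of x must appear in y in the same order.
    return all(ch in it for ch in x)

print(is_special_substring('peple', 'pineapple'))  # True
print(is_special_substring('peple', 'people'))     # True
print(is_special_substring('elp', 'people'))       # False
```

This only verifies a candidate string; it does not find the longest one, which is what the algorithm below is for.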
Algorithm
To solve this, we will follow these steps:
- prev := a new dictionary, where if some key is not present, return 0
- for i in range 0 to size of s1 - 1, do
  - cur := a new dictionary, where if some key is not present, return 0
  - for j in range 0 to size of s2 - 1, do
    - cur[j] := prev[j - 1] + 1 when s1[i] is same as s2[j], otherwise maximum of cur[j - 1] and prev[j]
  - prev := cur
- return prev[size of s2 - 1]
Example
Let us see the following implementation to get a better understanding:
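from collections import defaultdict

def solve(s1, s2):
    # prev holds the previous DP row; missing keys (such as -1) read as 0
    prev = defaultdict(int)
    for i in range(len(s1)):
        cur = defaultdict(int)
        for j in range(len(s2)):
            if s1[i] == s2[j]:
                # match: extend the best result for the shorter prefixes
                cur[j] = prev[j - 1] + 1
            else:
                # no match: carry over the better of left and top
                cur[j] = max(cur[j - 1], prev[j])
        prev = cur
    return prev[len(s2) - 1]

s1 = 'pineapple'
s2 = 'people'
print(solve(s1, s2))

Output
5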
How It Works
The algorithm uses dynamic programming with space optimization. Instead of maintaining a 2D table, we use two dictionaries (prev and cur) to store the previous and current rows of the DP table.
For each pair of characters s1[i] and s2[j]:
- If characters match, we add 1 to the diagonal value
- If they don't match, we take the maximum of left and top values
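To make these two rules concrete, the sketch below prints the DP row produced after each character of s1. The generator name lcs_rows is an assumption for illustration, and it uses plain lists with a leading zero column instead of the dictionaries used above.

```python
def lcs_rows(s1, s2):
    """Yield (character of s1, DP row) pairs; illustrative helper only."""
    prev = [0] * (len(s2) + 1)  # index 0 is a sentinel column of zeros
    for ch1 in s1:
        cur = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                cur.append(prev[j - 1] + 1)           # diagonal value + 1
            else:
                cur.append(max(cur[j - 1], prev[j]))  # max of left and top
        yield ch1, cur[1:]  # drop the sentinel for display
        prev = cur

for ch, row in lcs_rows('pineapple', 'people'):
    print(ch, row)
# The last line printed is: e [1, 2, 2, 3, 4, 5]
```

The final entry of the last row is the answer, 5.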
Alternative Implementation
Here's a cleaner version using a 2D table approach:
def longest_common_subsequence(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]

s1 = 'pineapple'
s2 = 'people'
print(longest_common_subsequence(s1, s2))

Output
5
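Keeping the full 2D table has one practical benefit: the subsequence itself can be recovered by walking back from the bottom-right cell. The following is a sketch of that idea; the function name lcs_string and the tie-breaking rule (prefer moving up) are assumptions, and when several longest subsequences exist it returns just one of them.

```python
def lcs_string(s1, s2):
    """Return one longest common subsequence of s1 and s2 (illustrative sketch)."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[m][n], collecting matched characters in reverse order.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # the value came from the cell above
        else:
            j -= 1  # the value came from the cell to the left
    return ''.join(reversed(chars))

print(lcs_string('pineapple', 'people'))  # peple
```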
Conclusion
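The longest common subsequence problem can be solved efficiently using dynamic programming. Both versions run in O(m*n) time, where m and n are the lengths of s1 and s2. The standard 2D table uses O(m*n) space, while the two-row version keeps only O(n) entries at a time; this becomes O(min(m, n)) if the shorter string is placed in the inner loop.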
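One way to reach the O(min(m, n)) space bound is to swap the arguments so that the shorter string always drives the inner loop. This is a minimal sketch under that assumption; the name lcs_length is illustrative and not part of the original code.

```python
def lcs_length(s1, s2):
    """Length of the LCS using two rows of min(len(s1), len(s2)) + 1 entries."""
    # LCS is symmetric, so swapping the arguments does not change the result.
    if len(s2) > len(s1):
        s1, s2 = s2, s1
    prev = [0] * (len(s2) + 1)  # row for the empty prefix of s1
    for ch1 in s1:
        cur = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(cur[j - 1], prev[j]))
        prev = cur
    return prev[-1]

print(lcs_length('pineapple', 'people'))  # 5
```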
