Program to find number of distinct subsequences in Python
Given a string s, we need to count the number of distinct non-empty subsequences of the string. A subsequence is formed by deleting some (possibly zero) characters from the original string while keeping the relative order of the remaining characters. Because the answer can be large, the result is returned modulo 10^9 + 7.
For example, if the input is s = "bab", the output will be 6 because there are 6 distinct non-empty subsequences: "b", "a", "ba", "ab", "bb", and "bab". The subsequence "b" can be formed from either occurrence of the letter, but it is counted only once.
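For short strings, the count can be checked by listing every subsequence directly. The following is a minimal brute-force sketch for verification only; the helper name distinct_subsequences_bruteforce is hypothetical, and its running time grows exponentially with the string length.

```python
from itertools import combinations


def distinct_subsequences_bruteforce(s):
    """Return the set of distinct non-empty subsequences of s (exponential time)."""
    found = set()
    for length in range(1, len(s) + 1):
        # Each increasing tuple of positions selects one subsequence
        for positions in combinations(range(len(s)), length):
            found.add("".join(s[p] for p in positions))
    return found


subs = distinct_subsequences_bruteforce("bab")
print(sorted(subs, key=lambda t: (len(t), t)))  # ['a', 'b', 'ab', 'ba', 'bb', 'bab']
print(len(subs))  # 6
```

Storing the results in a set is what removes duplicates such as the two ways of forming "b".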
Algorithm
We use dynamic programming to solve this problem efficiently. Here dp[i] counts the distinct subsequences ending at index i that were not already counted at an earlier index:
- Create a dp array of size equal to the string length, filled with 0.
- For each character at index i, find its last occurrence before position i.
- If the character appears for the first time, set dp[i] to 1 plus the sum of all previous dp values.
- If the character appeared before at index ind, set dp[i] to the sum of the dp values from ind to i-1.
- Return the total sum of all dp values modulo 10^9 + 7.
Example
Let's implement the solution to count distinct subsequences:
def solve(s):
    dp = [0] * len(s)
    m = 10**9 + 7
    for i, char in enumerate(s):
        # Find last occurrence of current character before position i
        ind = s.rfind(char, 0, i)
        if ind == -1:
            # Character appears for first time
            dp[i] = (1 + sum(dp[:i])) % m
        else:
            # Character appeared before at index ind
            dp[i] = sum(dp[ind:i]) % m
    return sum(dp) % m

# Test with example
s = "bab"
print(f"Number of distinct subsequences in '{s}': {solve(s)}")

Number of distinct subsequences in 'bab': 6
Step-by-Step Execution
Let's trace through the algorithm with s = "bab":
def solve_with_trace(s):
    dp = [0] * len(s)
    m = 10**9 + 7
    print(f"Processing string: '{s}'")
    for i, char in enumerate(s):
        ind = s.rfind(char, 0, i)
        if ind == -1:
            dp[i] = (1 + sum(dp[:i])) % m
            print(f"i={i}, char='{char}', first occurrence, dp[{i}] = 1 + {sum(dp[:i])} = {dp[i]}")
        else:
            dp[i] = sum(dp[ind:i]) % m
            print(f"i={i}, char='{char}', last seen at {ind}, dp[{i}] = sum(dp[{ind}:{i}]) = {dp[i]}")
    total = sum(dp) % m
    print(f"Final dp array: {dp}")
    print(f"Total distinct subsequences: {total}")
    return total

solve_with_trace("bab")

Processing string: 'bab'
i=0, char='b', first occurrence, dp[0] = 1 + 0 = 1
i=1, char='a', first occurrence, dp[1] = 1 + 1 = 2
i=2, char='b', last seen at 0, dp[2] = sum(dp[0:2]) = 3
Final dp array: [1, 2, 3]
Total distinct subsequences: 6
Testing with Multiple Examples
def solve(s):
    if not s:
        return 0  # Empty string has no non-empty subsequences
    dp = [0] * len(s)
    m = 10**9 + 7
    for i, char in enumerate(s):
        ind = s.rfind(char, 0, i)
        if ind == -1:
            dp[i] = (1 + sum(dp[:i])) % m
        else:
            dp[i] = sum(dp[ind:i]) % m
    return sum(dp) % m

# Test cases
test_cases = ["bab", "abc", "aab", "aaaa"]
for s in test_cases:
    result = solve(s)
    print(f"String: '{s}' → Distinct subsequences: {result}")

String: 'bab' → Distinct subsequences: 6
String: 'abc' → Distinct subsequences: 7
String: 'aab' → Distinct subsequences: 5
String: 'aaaa' → Distinct subsequences: 4
Conclusion
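This dynamic programming solution counts distinct non-empty subsequences by tracking the last occurrence of each character, so that a subsequence reachable in more than one way is counted only once. The time complexity is O(n²), because each of the n iterations performs a linear-time rfind search and a linear-time sum over a slice of dp. The space complexity is O(n) for the dp array.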
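The quadratic cost comes from repeated searching and summing, and both can be avoided. The following is a minimal sketch of a linear-time variant, not the approach used in the examples above: it assumes a running prefix-sum array in place of the sum calls and a dictionary of last-seen indices in place of rfind. The names count_distinct_subsequences_linear, prefix, and last are hypothetical.

```python
MOD = 10**9 + 7


def count_distinct_subsequences_linear(s):
    """Count distinct non-empty subsequences of s in O(n) time."""
    # prefix[i] holds (dp[0] + ... + dp[i-1]) % MOD, so any range sum
    # sum(dp[a:b]) becomes prefix[b] - prefix[a]
    prefix = [0] * (len(s) + 1)
    last = {}  # most recent index at which each character was seen
    for i, char in enumerate(s):
        if char in last:
            # Same as sum(dp[ind:i]) with ind = last[char]
            new = (prefix[i] - prefix[last[char]]) % MOD
        else:
            # Same as 1 + sum(dp[:i]) for a first occurrence
            new = (1 + prefix[i]) % MOD
        prefix[i + 1] = (prefix[i] + new) % MOD
        last[char] = i
    return prefix[len(s)]


for s in ["bab", "abc", "aaaa"]:
    print(s, count_distinct_subsequences_linear(s))
```

Python's % operator returns a non-negative result for a positive modulus, so the subtraction needs no extra adjustment after the values have been reduced modulo 10^9 + 7.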
