Program to find number of distinct subsequences in Python

Given a string s, we need to count its distinct non-empty subsequences. A subsequence is formed by deleting some (possibly zero) characters from the original string while keeping the relative order of the remaining characters. Because the count can be very large, the result is returned modulo 10^9 + 7.

For example, if the input is s = "bab", the output will be 6 because there are 6 distinct non-empty subsequences: "b", "a", "ba", "ab", "bb", and "bab". The subsequence "b" can be formed from index 0 or from index 2, but it is counted only once.
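For small inputs, the expected count can be checked by brute force. The following is a minimal sketch, not part of the solution itself: the helper name brute_force_count is a hypothetical choice, and it assumes that only non-empty subsequences are counted. It enumerates every non-empty set of indices and collects the resulting strings in a set, so it takes exponential time and is only suitable for short strings.

```python
from itertools import combinations


def brute_force_count(s):
    """Count distinct non-empty subsequences by enumerating index subsets."""
    seen = set()
    for length in range(1, len(s) + 1):
        # combinations() yields indices in increasing order,
        # so the relative order of characters is preserved.
        for idxs in combinations(range(len(s)), length):
            seen.add("".join(s[i] for i in idxs))
    return len(seen)


print(brute_force_count("bab"))  # 6
```

A check like this is useful for validating a faster solution on short test strings before trusting it on long ones.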

Algorithm

We use dynamic programming to solve this problem efficiently:

  • Create a dp array with one entry per character, filled with 0; dp[i] will hold the number of new distinct subsequences that end at index i

  • For each character at index i, find its last occurrence before position i

  • If the character appears for the first time, set dp[i] to 1 plus the sum of all previous dp values

  • If the character appeared before at index ind, set dp[i] to the sum of dp values from ind to i-1, since extensions of anything earlier were already counted at ind

  • Return the total sum of all dp values modulo 10^9 + 7
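Written as a recurrence, the steps above can be summarized as follows. This is only a restatement of the list; n (the length of s) and s_i (the character at index i) are notation introduced here for illustration.

```latex
\[
dp[i] =
\begin{cases}
1 + \sum_{j=0}^{i-1} dp[j], & \text{if } s_i \text{ does not occur in } s_0 \dots s_{i-1},\\[4pt]
\sum_{j=\mathit{ind}}^{i-1} dp[j], & \text{if the last earlier occurrence of } s_i \text{ is at index } \mathit{ind},
\end{cases}
\qquad
\text{answer} = \Bigl(\sum_{i=0}^{n-1} dp[i]\Bigr) \bmod (10^9 + 7).
\]
```

For s = "bab" this gives dp[0] = 1, dp[1] = 1 + 1 = 2, and dp[2] = dp[0] + dp[1] = 3, for a total of 6.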

Example

Let's implement the solution to count distinct subsequences:

def solve(s):
    dp = [0] * len(s)
    m = 10**9 + 7
    
    for i, char in enumerate(s):
        # Find last occurrence of current character before position i
        ind = s.rfind(char, 0, i)
        
        if ind == -1:
            # Character appears for first time
            dp[i] = (1 + sum(dp[:i])) % m
        else:
            # Character appeared before at index ind
            dp[i] = sum(dp[ind:i]) % m
    
    return sum(dp) % m

# Test with example
s = "bab"
print(f"Number of distinct subsequences in '{s}': {solve(s)}")

The output of the above code is:

Number of distinct subsequences in 'bab': 6

Step-by-Step Execution

Let's trace through the algorithm with s = "bab":

def solve_with_trace(s):
    dp = [0] * len(s)
    m = 10**9 + 7
    print(f"Processing string: '{s}'")
    
    for i, char in enumerate(s):
        ind = s.rfind(char, 0, i)
        
        if ind == -1:
            dp[i] = (1 + sum(dp[:i])) % m
            print(f"i={i}, char='{char}', first occurrence, dp[{i}] = 1 + {sum(dp[:i])} = {dp[i]}")
        else:
            dp[i] = sum(dp[ind:i]) % m
            print(f"i={i}, char='{char}', last seen at {ind}, dp[{i}] = sum(dp[{ind}:{i}]) = {dp[i]}")
    
    total = sum(dp) % m
    print(f"Final dp array: {dp}")
    print(f"Total distinct subsequences: {total}")
    return total

solve_with_trace("bab")

The output of the above code is:

Processing string: 'bab'
i=0, char='b', first occurrence, dp[0] = 1 + 0 = 1
i=1, char='a', first occurrence, dp[1] = 1 + 1 = 2
i=2, char='b', last seen at 0, dp[2] = sum(dp[0:2]) = 3
Final dp array: [1, 2, 3]
Total distinct subsequences: 6

Testing with Multiple Examples

def solve(s):
    if not s:
        return 0  # Empty string has no non-empty subsequences
    
    dp = [0] * len(s)
    m = 10**9 + 7
    
    for i, char in enumerate(s):
        ind = s.rfind(char, 0, i)
        if ind == -1:
            dp[i] = (1 + sum(dp[:i])) % m
        else:
            dp[i] = sum(dp[ind:i]) % m
    
    return sum(dp) % m

# Test cases
test_cases = ["bab", "abc", "aab", "aaaa"]

for s in test_cases:
    result = solve(s)
    print(f"String: '{s}' -> Distinct subsequences: {result}")

The output of the above code is:

String: 'bab' -> Distinct subsequences: 6
String: 'abc' -> Distinct subsequences: 7
String: 'aab' -> Distinct subsequences: 5
String: 'aaaa' -> Distinct subsequences: 4

Conclusion

This dynamic programming solution counts distinct non-empty subsequences by tracking the last occurrence of each character, which avoids counting duplicates. The time complexity is O(n²) because every iteration calls rfind and sums a slice of dp, each of which takes O(n) in the worst case. The space complexity is O(n) for the dp array.
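The quadratic cost comes from re-summing parts of dp (and rescanning the string with rfind), so one possible refinement is to keep a running total and remember, for each character, the total just before its latest occurrence. The sketch below is not part of the original solution: the function name count_distinct_subsequences and the dictionary before_last are illustrative choices, and it assumes the same convention of counting only non-empty subsequences.

```python
def count_distinct_subsequences(s):
    """O(n) variant of the dp above, using a running total instead of slices."""
    MOD = 10**9 + 7
    total = 0         # sum of all dp values computed so far (mod MOD)
    before_last = {}  # char -> running total just before its latest occurrence

    for ch in s:
        if ch in before_last:
            # Same as sum(dp[ind:i]): total so far minus total before ind.
            new = (total - before_last[ch]) % MOD
        else:
            # First occurrence: the single character plus every extension.
            new = (1 + total) % MOD
        before_last[ch] = total
        total = (total + new) % MOD

    return total


for s in ["bab", "abc", "aab", "aaaa"]:
    print(s, count_distinct_subsequences(s))
```

Each character is handled with a constant number of dictionary and arithmetic operations, so the loop runs in O(n) time, and the dictionary holds at most one entry per distinct character.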

Updated on: 2026-03-26T14:30:09+05:30
