Program to count number of homogenous substrings in Python

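A homogenous substring is a substring where all characters are the same. Given a string s, we need to count the total number of homogenous substrings and return the result modulo 10^9+7. Substrings are counted by position, so the same text occurring at two different positions counts twice.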

Understanding the Problem

For the string "xyyzzzxx", the homogenous substrings are:

  • "x" appears 3 times (at positions 0, 6, 7)

  • "xx" appears 1 time (positions 6-7)

  • "y" appears 2 times (at positions 1, 2)

  • "yy" appears 1 time (positions 1-2)

  • "z" appears 3 times (at positions 3, 4, 5)

  • "zz" appears 2 times (positions 3-4, 4-5)

  • "zzz" appears 1 time (positions 3-5)

Total count: 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13
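
The breakdown above can be checked by brute force. The following is a minimal sketch, not part of the solutions in this article: the helper name enumerate_homogenous is made up for illustration, and it enumerates substrings directly, which is only practical for short strings.

```python
from collections import Counter

def enumerate_homogenous(s):
    """Brute-force tally of every homogenous substring of s, counted by position."""
    tally = Counter()
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            piece = s[start:end]
            # Everything before the last character already matched s[start],
            # so only the newly added character needs checking.
            if piece[-1] != s[start]:
                break
            tally[piece] += 1
    return tally

tally = enumerate_homogenous("xyyzzzxx")
for piece, times in sorted(tally.items()):
    print(f"{piece!r}: {times}")
print("Total:", sum(tally.values()))
```

Each printed line should match one bullet of the list above, and the total should agree with the sum.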

Algorithm Approach

The key insight is that for a consecutive group of n identical characters, the number of homogenous substrings is n(n+1)/2. We can group consecutive identical characters and apply this formula.
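
The formula follows from counting by length: in a run of n identical characters, a substring of length k can start at n - k + 1 positions, and summing that over k = 1..n gives n(n+1)/2. A small sketch that checks this numerically (substrings_in_run is an illustrative name, not part of the solution):

```python
def substrings_in_run(n):
    """Count substrings of a run of n identical characters, length by length."""
    # A substring of length k can start at n - k + 1 different positions.
    return sum(n - k + 1 for k in range(1, n + 1))

# Compare the length-by-length count with the closed form for small runs.
for n in range(1, 8):
    assert substrings_in_run(n) == n * (n + 1) // 2

print(substrings_in_run(3))  # the "zzz" run: 3 + 2 + 1 = 6
```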

Solution Implementation

def count_homogenous_substrings(s):
    MOD = 1000000007
    if not s:
        return 0

    # Map each run of identical characters (e.g. "zzz") to how often it occurs
    groups = {}
    prev_char = s[0]
    count = 1

    # Group consecutive identical characters
    for ch in s[1:]:
        if ch == prev_char:
            count += 1
        else:
            # Store the finished run (character repeated 'count' times)
            group_key = prev_char * count
            groups[group_key] = groups.get(group_key, 0) + 1
            prev_char = ch
            count = 1

    # Store the last run explicitly; a sentinel character such as "@"
    # would miscount inputs that themselves end in that character
    group_key = prev_char * count
    groups[group_key] = groups.get(group_key, 0) + 1

    # Calculate total homogenous substrings
    total = 0
    for group, frequency in groups.items():
        group_length = len(group)
        # For n consecutive identical chars: n(n+1)/2 substrings
        substrings_count = (group_length * (group_length + 1)) // 2
        total += substrings_count * frequency

    return total % MOD

# Test the function
s = "xyyzzzxx"
result = count_homogenous_substrings(s)
print(f"Number of homogenous substrings: {result}")

Output:

Number of homogenous substrings: 13

How It Works

The algorithm works in three main steps:

  1. Group consecutive characters: We iterate through the string and group consecutive identical characters together

  2. Count substrings per group: For each group of n identical characters, we can form n(n+1)/2 homogenous substrings

  3. Sum all counts: We multiply each group's substring count by its frequency and sum everything up
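
The three steps above can also be sketched with itertools.groupby from the standard library, which yields each maximal run of equal characters in turn. This is an alternative illustration rather than the article's implementation: it sums n(n+1)/2 once per run instead of storing frequencies in a dictionary, and count_by_runs is a made-up name.

```python
from itertools import groupby

MOD = 10**9 + 7

def count_by_runs(s):
    """Sum n(n+1)/2 over each maximal run of identical characters."""
    total = 0
    for _char, run in groupby(s):
        n = sum(1 for _ in run)  # length of this run
        total += n * (n + 1) // 2
    return total % MOD

# Runs have lengths 1, 2, 3, 2, giving 1 + 3 + 6 + 3 = 13.
print(count_by_runs("xyyzzzxx"))
```

An empty string has no runs, so this sketch returns 0 for it without a special case.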

Optimized Solution

Here's a more direct approach without using a dictionary:

def count_homogenous_optimized(s):
    MOD = 1000000007
    total = 0
    count = 1
    
    for i in range(len(s)):
        # If current char same as previous, increment count
        if i > 0 and s[i] == s[i-1]:
            count += 1
        else:
            count = 1
        
        # Add current count to total (represents all substrings ending at i)
        total = (total + count) % MOD
    
    return total

# Test both approaches
s = "xyyzzzxx"
result1 = count_homogenous_substrings(s)
result2 = count_homogenous_optimized(s)
print(f"Method 1 result: {result1}")
print(f"Method 2 result: {result2}")

Output:

Method 1 result: 13
Method 2 result: 13
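
To see why adding the running count at each index works, it can help to look at the per-index values. The sketch below (trace_run_lengths is an illustrative helper, not part of the solution) records, for each index, the length of the run of equal characters ending there. That length is also the number of homogenous substrings ending at that index.

```python
def trace_run_lengths(s):
    """Return, for each index, the length of the run of equal characters ending there."""
    lengths = []
    count = 0
    for i, ch in enumerate(s):
        # Extend the run if this character repeats the previous one, else restart at 1.
        count = count + 1 if i > 0 and ch == s[i - 1] else 1
        lengths.append(count)
    return lengths

lengths = trace_run_lengths("xyyzzzxx")
print(lengths)       # [1, 1, 2, 1, 2, 3, 1, 2]
print(sum(lengths))  # 13
```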

Comparison

Method                | Time Complexity | Space Complexity | Best For
Dictionary grouping   | O(n)            | O(n)             | Understanding the pattern
Optimized single pass | O(n)            | O(1)             | Efficient implementation

Conclusion

Both approaches run in O(n) time, but the optimized single-pass solution needs only O(1) extra space because it tracks just the length of the current run of identical characters. For each position, we add that run length to our total, since it equals the number of homogenous substrings ending at that position.

Updated on: 2026-03-26T14:05:19+05:30
