Program to count number of homogenous substrings in Python

A homogenous substring is a substring in which all characters are the same. Given a string s, we need to count its homogenous substrings and return the result modulo 10^9 + 7, since the count can grow large.

Understanding the Problem

For the string "xyyzzzxx", the homogenous substrings are:

"x" appears 3 times (at positions 0, 6, 7)
"xx" appears 1 time (positions 6-7)
"y" appears 2 times (at positions 1, 2)
"yy" appears 1 time (positions 1-2)
"z" appears 3 times (at positions 3, 4, 5)
"zz" appears 2 times (positions 3-4, 4-5)
"zzz" appears 1 time (positions 3-5)

Total count: 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13
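The breakdown above can be reproduced with a brute-force check. This is only a verification sketch, not the approach used in the rest of the article; the function name list_homogenous_substrings is made up for illustration, and its quadratic running time makes it suitable for short strings only.

```python
from collections import Counter

def list_homogenous_substrings(s):
    """Brute force: tally every substring whose characters are all identical."""
    found = Counter()
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            piece = s[start:end]
            if len(set(piece)) == 1:
                found[piece] += 1
            else:
                # Once a different character appears, longer substrings
                # from this start cannot be homogenous either.
                break
    return found

breakdown = list_homogenous_substrings("xyyzzzxx")
for piece, times in sorted(breakdown.items()):
    print(f"{piece!r}: {times}")
print("Total:", sum(breakdown.values()))  # Total: 13
```

Each occurrence is counted by position, which is why "x" contributes 3 and "zz" contributes 2.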

Algorithm Approach

The key insight is that for a consecutive group of n identical characters, the number of homogenous substrings is n(n+1)/2. We can group consecutive identical characters and apply this formula.
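One way to see where n(n+1)/2 comes from: in a run of n identical characters, a substring of length k can start at n - k + 1 positions, and summing that over k = 1..n gives the formula. The small sketch below checks this for a few values of n; the helper name substrings_in_run is hypothetical.

```python
def substrings_in_run(n):
    """Count substrings of a run of n identical characters, length by length."""
    # A substring of length k can start at n - k + 1 different positions.
    return sum(n - k + 1 for k in range(1, n + 1))

for n in range(1, 6):
    # The length-by-length count should agree with the closed form.
    assert substrings_in_run(n) == n * (n + 1) // 2
    print(n, substrings_in_run(n))
```

For n = 3 (the "zzz" run) this gives 3 + 2 + 1 = 6, matching the three "z", two "zz" and one "zzz" counted earlier.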

Solution Implementation

def count_homogenous_substrings(s):
    MOD = 1000000007
    if not s:
        return 0

    groups = {}
    prev_char = s[0]
    count = 1

    # Group consecutive identical characters
    for ch in s[1:]:
        if ch == prev_char:
            count += 1
        else:
            # Store the finished group (character repeated 'count' times)
            group_key = prev_char * count
            groups[group_key] = groups.get(group_key, 0) + 1
            prev_char = ch
            count = 1

    # Store the last group, which the loop never closes
    group_key = prev_char * count
    groups[group_key] = groups.get(group_key, 0) + 1

    # Calculate total homogenous substrings
    total = 0
    for group, frequency in groups.items():
        group_length = len(group)
        # For n consecutive identical chars: n(n+1)/2 substrings
        substrings_count = (group_length * (group_length + 1)) // 2
        total += substrings_count * frequency

    return total % MOD

# Test the function
s = "xyyzzzxx"
result = count_homogenous_substrings(s)
print(f"Number of homogenous substrings: {result}")

Number of homogenous substrings: 13

How It Works

The algorithm works in three main steps:

Group consecutive characters: We iterate through the string and group consecutive identical characters together
Count substrings per group: For each group of n identical characters, we can form n(n+1)/2 homogenous substrings
Sum all counts: We multiply each group's substring count by its frequency and sum everything up
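The same three steps can also be sketched with itertools.groupby from the standard library, which yields each run of consecutive equal characters. This is an alternative illustration rather than the article's implementation: the name count_by_runs is hypothetical, and the sketch adds each run's count directly instead of first tallying group frequencies in a dictionary.

```python
from itertools import groupby

def count_by_runs(s):
    MOD = 1_000_000_007
    total = 0
    # groupby yields one (character, iterator) pair per run of equal characters
    for _char, run in groupby(s):
        n = sum(1 for _ in run)   # length of this run
        total += n * (n + 1) // 2  # homogenous substrings inside the run
    return total % MOD

print(count_by_runs("xyyzzzxx"))  # 13
```

For "xyyzzzxx" the runs have lengths 1, 2, 3 and 2, contributing 1 + 3 + 6 + 3 = 13.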

Optimized Solution

Here's a more direct approach without using a dictionary:

def count_homogenous_optimized(s):
    MOD = 1000000007
    total = 0
    count = 1
    
    for i in range(len(s)):
        # If current char same as previous, increment count
        if i > 0 and s[i] == s[i-1]:
            count += 1
        else:
            count = 1
        
        # Add current count to total (the number of homogenous substrings ending at i)
        total = (total + count) % MOD
    
    return total

# Test both approaches
s = "xyyzzzxx"
result1 = count_homogenous_substrings(s)
result2 = count_homogenous_optimized(s)
print(f"Method 1 result: {result1}")
print(f"Method 2 result: {result2}")

Method 1 result: 13
Method 2 result: 13
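To see why the single-pass version gives the same answer, it can help to look at the value added at each index. The sketch below records the length of the run ending at every position; the helper name run_lengths_ending_at_each_index is made up for this illustration.

```python
def run_lengths_ending_at_each_index(s):
    """Return, for each index, the length of the run of equal characters ending there."""
    lengths = []
    count = 0
    for i, ch in enumerate(s):
        # Extend the run if the character repeats, otherwise start a new run.
        count = count + 1 if i > 0 and ch == s[i - 1] else 1
        lengths.append(count)
    return lengths

trace = run_lengths_ending_at_each_index("xyyzzzxx")
print(trace)       # [1, 1, 2, 1, 2, 3, 1, 2]
print(sum(trace))  # 13
```

A run of length k ending at index i means exactly k homogenous substrings end at i (of lengths 1 through k), so summing these values counts every homogenous substring once.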

Comparison

Method	Time Complexity	Space Complexity	Best For
Dictionary grouping	O(n)	O(n)	Understanding the pattern
Optimized single pass	O(n)	O(1)	Efficient implementation

Conclusion

Both approaches run in O(n) time, but the single-pass solution needs only O(1) extra space because it never stores the groups. At each position it adds the length of the current run of identical characters to the total, which equals the number of homogenous substrings ending at that position.

Arnab Chakraborty

Updated on: 2026-03-26T14:05:19+05:30
