Program to count number of homogenous substrings in Python
A homogenous substring is a substring where all characters are the same. Given a string s, we need to count the total number of homogenous substrings and return the result modulo 10^9+7.
Understanding the Problem
For the string "xyyzzzxx", the homogenous substrings are:
- "x" appears 3 times (at positions 0, 6, 7)
- "xx" appears 1 time (positions 6-7)
- "y" appears 2 times (at positions 1, 2)
- "yy" appears 1 time (positions 1-2)
- "z" appears 3 times (at positions 3, 4, 5)
- "zz" appears 2 times (positions 3-4, 4-5)
- "zzz" appears 1 time (positions 3-5)
Total count: 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13
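The hand count above can be cross-checked with a brute-force sketch that tests every substring. The function name `brute_force_homogenous` is made up for this illustration, and the approach is far too slow for long inputs; it is only meant to confirm the expected answer on small strings.

```python
def brute_force_homogenous(s):
    """Count homogenous substrings by checking every substring (slow, for checking only)."""
    total = 0
    n = len(s)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # A substring is homogenous when it has exactly one distinct character
            if len(set(s[start:end])) == 1:
                total += 1
    return total


print(brute_force_homogenous("xyyzzzxx"))  # 13
```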
Algorithm Approach
The key insight is that for a consecutive group of n identical characters, the number of homogenous substrings is n(n+1)/2. We can group consecutive identical characters and apply this formula.
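The formula follows because a run of n identical characters contains n substrings of length 1, n-1 of length 2, and so on down to 1 substring of length n, which sums to n(n+1)/2. As a rough sketch, the run lengths can be extracted with `itertools.groupby` from the standard library; the helper name `run_lengths` is an assumption made for this example.

```python
from itertools import groupby


def run_lengths(s):
    """Return the length of each maximal run of identical characters."""
    return [len(list(run)) for _, run in groupby(s)]


lengths = run_lengths("xyyzzzxx")
per_run = [n * (n + 1) // 2 for n in lengths]

print(lengths)       # [1, 2, 3, 2]
print(per_run)       # [1, 3, 6, 3]
print(sum(per_run))  # 13
```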
Solution Implementation
def count_homogenous_substrings(s):
    MOD = 1000000007
    # Add sentinel character to handle last group
    # (assumes s itself does not end with "@")
    s += "@"
    groups = {}
    prev_char = s[0]
    count = 1
    # Group consecutive identical characters
    for i in range(1, len(s)):
        if prev_char != s[i]:
            # Store the group (character repeated 'count' times)
            group_key = prev_char * count
            if group_key in groups:
                groups[group_key] += 1
            else:
                groups[group_key] = 1
            count = 1
        else:
            count += 1
        prev_char = s[i]
    # Calculate total homogenous substrings
    total = 0
    for group in groups:
        group_length = len(group)
        # For n consecutive identical chars: n(n+1)/2 substrings
        substrings_count = (group_length * (group_length + 1)) // 2
        total += substrings_count * groups[group]
    return total % MOD

# Test the function
s = "xyyzzzxx"
result = count_homogenous_substrings(s)
print(f"Number of homogenous substrings: {result}")
Number of homogenous substrings: 13
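The final `% MOD` has no visible effect on a short string like this one, but the count grows quadratically with run length. A small sketch, assuming a hypothetical run of 100,000 identical characters (a length chosen purely for illustration), shows where the reduction starts to matter:

```python
MOD = 10**9 + 7

n = 100_000               # hypothetical run length, chosen for illustration
exact = n * (n + 1) // 2  # homogenous substrings in a run of n identical characters

print(exact)        # 5000050000
print(exact % MOD)  # 49965
```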
How It Works
The algorithm works in three main steps:
1. Group consecutive characters: We iterate through the string and group consecutive identical characters together
2. Count substrings per group: For each group of n identical characters, we can form n(n+1)/2 homogenous substrings
3. Sum all counts: We multiply each group's substring count by its frequency and sum everything up
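To make the role of the frequency concrete, here is a compact sketch of the same three steps built on `itertools.groupby` and `collections.Counter`. It is an alternative formulation rather than the function shown above, and the names `group_frequencies` and `count_from_groups` are invented for this example. The input "aabaa" is used because the group "aa" occurs twice.

```python
from collections import Counter
from itertools import groupby


def group_frequencies(s):
    """Map each maximal run (e.g. 'aa') to how many times it occurs in s."""
    return Counter("".join(run) for _, run in groupby(s))


def count_from_groups(s):
    """Apply n(n+1)/2 to each distinct run and weight it by its frequency."""
    groups = group_frequencies(s)
    return sum(len(g) * (len(g) + 1) // 2 * freq for g, freq in groups.items())


print(dict(group_frequencies("aabaa")))  # {'aa': 2, 'b': 1}
print(count_from_groups("aabaa"))        # 7
```

Each "aa" contributes 3 substrings and "b" contributes 1, giving 3 * 2 + 1 = 7.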
Optimized Solution
Here's a more direct approach without using a dictionary:
def count_homogenous_optimized(s):
    MOD = 1000000007
    total = 0
    count = 1
    for i in range(len(s)):
        # If current char same as previous, increment count
        if i > 0 and s[i] == s[i-1]:
            count += 1
        else:
            count = 1
        # Add current count to total (represents all substrings ending at i)
        total = (total + count) % MOD
    return total

# Test both approaches
s = "xyyzzzxx"
result1 = count_homogenous_substrings(s)
result2 = count_homogenous_optimized(s)
print(f"Method 1 result: {result1}")
print(f"Method 2 result: {result2}")
Method 1 result: 13
Method 2 result: 13
Comparison
| Method | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| Dictionary grouping | O(n) | O(n) | Understanding the pattern |
| Optimized single pass | O(n) | O(1) | Efficient implementation |
Conclusion
Both approaches run in O(n) time, but the optimized single-pass solution needs only O(1) extra space because it tracks just the length of the current run of identical characters. At each position, it adds that run length to the total, which equals the number of homogenous substrings ending at that position.
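A short trace can make the "substrings ending at each position" idea visible. The helper `trace_counts` below is a hypothetical name for this sketch; it records the run length at every index instead of only accumulating the total.

```python
def trace_counts(s):
    """Return the length of the run of identical characters ending at each index."""
    counts = []
    for i, ch in enumerate(s):
        if i > 0 and ch == s[i - 1]:
            counts.append(counts[-1] + 1)  # run continues
        else:
            counts.append(1)               # new run starts here
    return counts


counts = trace_counts("xyyzzzxx")
print(counts)       # [1, 1, 2, 1, 2, 3, 1, 2]
print(sum(counts))  # 13
```

For example, the value 3 at index 5 stands for the three substrings "z", "zz", and "zzz" that end there.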
