Program to find maximum number of non-overlapping substrings in Python
Given a string with lowercase letters, we need to find the maximum number of non-overlapping substrings where each substring contains all occurrences of every character it includes.
Problem Understanding
The selected substrings must satisfy two conditions:
- The substrings do not overlap
- A substring that contains a character ch must include every occurrence of ch in the string
For example, if we have "pqstpqqprrr", the valid substrings are ["pqstpqqprrr", "pqstpqqp", "st", "s", "t", "rrr"]. We want the maximum count, so we choose ["s", "t", "rrr"].
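The list of valid substrings can be checked by brute force. The following is a minimal sketch for small inputs only; the helper name valid_substrings is hypothetical and is not part of the solution presented in this article.

```python
def valid_substrings(s):
    """Return every substring of s that holds all occurrences of each of its characters."""
    first = {ch: s.index(ch) for ch in set(s)}   # leftmost index per character
    last = {ch: s.rindex(ch) for ch in set(s)}   # rightmost index per character
    found = []
    for l in range(len(s)):
        for r in range(l, len(s)):
            # s[l..r] is valid when no character inside it also occurs outside it
            if all(first[c] >= l and last[c] <= r for c in set(s[l:r + 1])):
                found.append(s[l:r + 1])
    return found


print(valid_substrings("pqstpqqprrr"))
# ['pqstpqqp', 'pqstpqqprrr', 's', 'st', 't', 'rrr']
```

The check tries every (l, r) pair, so it is far slower than the greedy solution and is meant only for verifying small examples.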
Algorithm Approach
The solution uses a greedy approach:
- Find the rightmost position of each unique character
- Group characters that must appear together in the same substring
- Build non-overlapping substrings from these groups
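One way to see which characters must share a substring (the second step above) is to grow each character's span until no character inside it also occurs outside it. This is an illustrative sketch under that reading, not the implementation used in this article; the name closed_span is hypothetical.

```python
def closed_span(s, ch):
    """Smallest (lo, hi) span around ch that contains all occurrences of every character in it."""
    lo, hi = s.index(ch), s.rindex(ch)
    changed = True
    while changed:
        changed = False
        for c in set(s[lo:hi + 1]):
            first, last = s.index(c), s.rindex(c)
            if first < lo or last > hi:
                # c also occurs outside the span, so the span has to grow to cover it
                lo, hi = min(lo, first), max(hi, last)
                changed = True
    return lo, hi


s = "pqstpqqprrr"
for ch in "pqstr":
    print(ch, closed_span(s, ch))
```

For this input, p and q both end up with the span (0, 7), which is why they can only ever appear in the same substring, while s, t, and r keep their own spans.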
Implementation
def solve(s):
    # Get rightmost index of each unique character
    right = sorted([s.rindex(ch) for ch in set(s)])
    # Get leftmost index for characters at those positions
    left = [s.index(s[i]) for i in right]
    has, gen = [], []
    # Build initial character groups
    for i in range(len(right)):
        gen.append(set(s[right[i]]))
        has.append(set(s[left[i] + 1:right[i]]) - gen[-1])
        # Merge overlapping groups
        for j in range(len(has) - 2, -1, -1):
            if (has[-1] & gen[j]) and (has[j] & gen[-1]):
                gen[-1] = gen[-1] | gen[j]
                has[-1] = (has[-1] | has[j]) - gen[-1]
                del has[j], gen[j]
    # Build result substrings
    res, p_right = [], -1
    for ind in range(len(has)):
        l = min([i for i in left if s[i] in gen[ind]])
        r = max([i for i in right if s[i] in gen[ind]])
        if p_right < l:
            res.append(s[l : r + 1])
            p_right = r
    return res

# Test the function
s = "pqstpqqprrr"
result = solve(s)
print(f"Input: {s}")
print(f"Output: {result}")
Input: pqstpqqprrr
Output: ['s', 't', 'rrr']
How It Works
Let's trace through "pqstpqqprrr":
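- Character positions: p appears at indices 0, 4, 7; q at 1, 5, 6; s at 2; t at 3; r at 8, 9, 10
- Rightmost indices (sorted): [2, 3, 6, 7, 10] for characters [s, t, q, p, r], with matching leftmost indices [2, 3, 1, 0, 8]
- Group formation: p and q interleave (each occurs inside the other's span), so they merge into one group {p, q} spanning indices 0-7; s, t, and r each form their own group
- Final substrings: "s" (index 2), "t" (index 3), "rrr" (indices 8-10); the {p, q} group is skipped because its span starts at index 0, before the already chosen "t" ends at index 3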
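The positions used in this trace can be confirmed directly. The snippet below is a small sketch that lists every index of each character, ordered by rightmost index in the same way as the sorted right list; occurrence_table is a hypothetical helper name.

```python
def occurrence_table(s):
    """Map each character to the ascending list of indices where it occurs."""
    table = {}
    for i, ch in enumerate(s):
        table.setdefault(ch, []).append(i)
    return table


s = "pqstpqqprrr"
table = occurrence_table(s)
# Visit characters in order of their rightmost index
for ch in sorted(table, key=lambda c: table[c][-1]):
    print(ch, table[ch], "leftmost:", table[ch][0], "rightmost:", table[ch][-1])
```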
Key Points
- The algorithm ensures no character is split across multiple substrings
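- The greedy pass visits groups in order of their rightmost index and keeps a group only if it starts after the previously kept substring ends, which maximizes the number of substrings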
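- With only lowercase letters there are at most 26 groups, so the merging step needs a bounded number of set operations; the running time is dominated by the per-character scans (rindex, index, and slicing), roughly O(26 · n), which is linear in the string length for a fixed alphabet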
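The idea behind the greedy selection can be illustrated with the classic earliest-end-first interval scheduling rule, applied to the valid substrings of the example. This is a sketch of that general technique, not the solve function above; max_disjoint is a hypothetical name, and the (start, end) index pairs are written out by hand from the example.

```python
def max_disjoint(intervals):
    """Pick a largest set of pairwise non-overlapping (start, end) intervals."""
    chosen, last_end = [], -1
    # Earliest end first; on equal ends, try the shorter interval first
    for start, end in sorted(intervals, key=lambda iv: (iv[1], -iv[0])):
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


s = "pqstpqqprrr"
# Inclusive index ranges of the valid substrings listed in the example
candidates = [(0, 10), (0, 7), (2, 3), (2, 2), (3, 3), (8, 10)]
picked = max_disjoint(candidates)
print([s[a:b + 1] for a, b in picked])  # ['s', 't', 'rrr']
```

Taking the interval that ends earliest leaves the most room for later intervals, which is why the short substrings "s" and "t" are preferred over "st" or the span covering p and q.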
Conclusion
This greedy algorithm efficiently finds the maximum number of non-overlapping substrings by grouping characters that must appear together and then extracting valid substrings from left to right.
