Program to find maximum number of non-overlapping substrings in Python

Given a string with lowercase letters, we need to find the maximum number of non-overlapping substrings where each substring contains all occurrences of every character it includes.

Problem Understanding

The algorithm must satisfy two conditions:

  • Substrings are non-overlapping

  • A substring containing character 'ch' must include all occurrences of 'ch' in the string

For example, if we have "pqstpqqprrr", the valid substrings are ["pqstpqqprrr", "pqstpqqp", "st", "s", "t", "rrr"]. We want the maximum count, so we choose ["s", "t", "rrr"].
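
The two conditions can be checked directly. The sketch below is a brute-force enumerator, not the greedy method this article describes; the helper names `is_valid` and `valid_substrings` are made up for illustration. It tries every start and end index, so it is only suitable for short strings.

```python
def is_valid(s, l, r):
    """Return True if s[l:r+1] holds every occurrence of each character in it."""
    piece = s[l:r + 1]
    # A character is fully contained when its count inside equals its count overall
    return all(piece.count(ch) == s.count(ch) for ch in set(piece))


def valid_substrings(s):
    """Enumerate every valid substring by trying all (l, r) index pairs."""
    return [s[l:r + 1]
            for l in range(len(s))
            for r in range(l, len(s))
            if is_valid(s, l, r)]


print(valid_substrings("pqstpqqprrr"))
```

For "pqstpqqprrr" this lists the same six substrings given above, ordered by start index.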

Algorithm Approach

The solution uses a greedy approach:

  1. Find the leftmost and rightmost position of each unique character
  2. Group characters that must appear together in the same substring
  3. Build non-overlapping substrings from these groups, taking the ones that end earliest first
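
As a rough illustration of step 1, the positions of every character can be collected in a single pass over the string. The function name `char_spans` is hypothetical and is not part of the implementation in this article.

```python
def char_spans(s):
    """Map each character to (first_index, last_index) in one pass over s."""
    spans = {}
    for i, ch in enumerate(s):
        if ch in spans:
            spans[ch] = (spans[ch][0], i)  # keep first index, update last index
        else:
            spans[ch] = (i, i)  # first time we see this character
    return spans


print(char_spans("pqstpqqprrr"))
# {'p': (0, 7), 'q': (1, 6), 's': (2, 2), 't': (3, 3), 'r': (8, 10)}
```

Any valid substring that contains a character must cover that character's whole span, which is why these pairs drive the grouping step.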

Implementation

def solve(s):
    # First and last index of every distinct character
    first = {ch: s.index(ch) for ch in set(s)}
    last = {ch: s.rindex(ch) for ch in set(s)}

    # For each character, grow the smallest valid substring
    # that starts at its first occurrence
    intervals = []
    for ch in first:
        l, r = first[ch], last[ch]
        i, valid = l, True
        while i <= r:
            c = s[i]
            if first[c] < l:
                # c also occurs before l, so no valid substring starts at l
                valid = False
                break
            # Stretch the right edge so all occurrences of c are included
            r = max(r, last[c])
            i += 1
        if valid:
            intervals.append((r, l))

    # Greedy selection: earliest right end first, skipping overlaps
    res, p_right = [], -1
    for r, l in sorted(intervals):
        if p_right < l:
            res.append(s[l : r + 1])
            p_right = r

    return res

# Test the function
s = "pqstpqqprrr"
result = solve(s)
print(f"Input: {s}")
print(f"Output: {result}")

Running this program prints:

Input: pqstpqqprrr
Output: ['s', 't', 'rrr']

How It Works

Let's trace through "pqstpqqprrr":

  1. Character positions: p appears at indices 0,5,7; q at 1,4,6; s at 2; t at 3; r at 8,9,10
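  2. Rightmost indices in increasing order: [2,3,6,7,10] for characters [s,t,q,p,r]
  3. Group formation: p and q interleave, so they must share one substring, "pqstpqqp" (indices 0-7), which also encloses s and t
  4. Final substrings: "s" (index 2), "t" (index 3), "rrr" (indices 8-10); the p/q group is skipped because it overlaps "s" and "t"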
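
The grouping in this trace can be reproduced with a small helper. This is an illustrative sketch only; the name `enclosed_chars` is an assumption and does not appear in the implementation above. It reports which other characters fall inside each character's first-to-last span, then lists the pairs that enclose each other.

```python
def enclosed_chars(s):
    """For each character, collect the other characters inside its span."""
    enclosed = {}
    for ch in sorted(set(s), key=s.rindex):  # order by rightmost index
        l, r = s.index(ch), s.rindex(ch)
        enclosed[ch] = set(s[l:r + 1]) - {ch}
    return enclosed


info = enclosed_chars("pqstpqqprrr")
for ch, others in info.items():
    print(ch, sorted(others))
# s []
# t []
# q ['p', 's', 't']
# p ['q', 's', 't']
# r []

# Pairs that enclose each other must end up in the same substring
mutual = sorted((a, b) for a in info for b in info[a] if a < b and a in info[b])
print(mutual)  # [('p', 'q')]
```

Here s and t sit inside the span of p and q but do not enclose them, so they can still stand alone as their own substrings.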

Key Points

  • The algorithm ensures no character is split across multiple substrings
  • Greedy approach maximizes the number of substrings
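  • Time complexity is O(k·n), where n is the string length and k is the number of distinct characters (at most 26), since the work is dominated by one scan of the string per distinct character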
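
To gain some confidence that a greedy result is really the maximum, it can be compared against an exhaustive search on short inputs. The sketch below is such a cross-check under the problem's two conditions; `max_valid_count` is a hypothetical name, and the function is far too slow for long strings.

```python
from functools import lru_cache


def max_valid_count(s):
    """Exhaustively compute the largest number of non-overlapping valid substrings."""
    n = len(s)

    def is_valid(l, r):
        piece = s[l:r + 1]
        # Valid when every character in the piece has all its occurrences inside it
        return all(piece.count(ch) == s.count(ch) for ch in set(piece))

    @lru_cache(maxsize=None)
    def best(i):
        if i >= n:
            return 0
        # Option 1: no chosen substring starts at index i
        result = best(i + 1)
        # Option 2: choose a valid substring s[i:r+1] and continue after it
        for r in range(i, n):
            if is_valid(i, r):
                result = max(result, 1 + best(r + 1))
        return result

    return best(0)


print(max_valid_count("pqstpqqprrr"))  # 3
```

The count of 3 matches the length of the answer ["s", "t", "rrr"] from the example.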

Conclusion

This greedy algorithm efficiently finds the maximum number of non-overlapping substrings by grouping characters that must appear together and then extracting valid substrings from left to right.

Updated on: 2026-03-26T14:07:06+05:30
