Find the longest substring with k unique characters in a given string in Python

Finding the longest substring with exactly k unique characters is a classic sliding window problem. We use two pointers to maintain a window and expand or shrink it based on the number of unique characters.

Problem Understanding

Given a string and a number k, we need to find the longest substring that contains exactly k unique characters. For example, in "ppqprqtqtqt" with k=3, the answer is "rqtqtqt", which has length 7 and contains exactly 3 unique characters: 'r', 'q', and 't'. The prefix "ppqprq" also has exactly 3 unique characters, but it is only 6 characters long.
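To make the problem statement concrete, here is a minimal brute-force sketch that simply tries every substring. The function name `brute_force_longest_k_unique` is hypothetical and not part of the original article; it runs in quadratic time and is only meant as a reference to check answers against, not as the recommended solution.

```python
def brute_force_longest_k_unique(s, k):
    """Try every start index and extend the substring one character at a time.

    Returns the first longest substring with exactly k distinct characters,
    or "" if no such substring exists. (Hypothetical helper for illustration.)
    """
    best = ""
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > k:
                # Extending further can only keep or raise the distinct count
                break
            if len(seen) == k and j - i + 1 > len(best):
                best = s[i:j + 1]
    return best


print(brute_force_longest_k_unique("ppqprqtqtqt", 3))  # rqtqtqt
```

Because it examines every substring, this version is easy to trust on small inputs, which makes it useful for cross-checking a faster implementation.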

Algorithm Steps

The solution uses a sliding window approach:

  • First, check if the string has at least k unique characters
  • Use two pointers (start and end) to maintain a window
  • Expand the window by moving the end pointer
  • Shrink the window from the start when we have more than k unique characters
  • Track the longest valid window found

Implementation

def longest_k_unique_substring(s, k):
    """Find the longest substring with exactly k unique characters"""
    n = len(s)
    if n == 0 or k <= 0:
        return ""
    
    # Count total unique characters in the string
    unique_chars = len(set(s))
    if unique_chars < k:
        return "Not sufficient unique characters"
    
    # Sliding window approach
    char_count = {}
    start = 0
    max_length = 0
    result_start = 0
    
    for end in range(n):
        # Add current character to window
        char_count[s[end]] = char_count.get(s[end], 0) + 1
        
        # Shrink window if we have more than k unique characters
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1
        
        # Update result if current window has exactly k unique characters
        if len(char_count) == k:
            current_length = end - start + 1
            if current_length > max_length:
                max_length = current_length
                result_start = start
    
    return s[result_start:result_start + max_length] if max_length > 0 else ""

# Test the function
s = "ppqprqtqtqt"
k = 3
result = longest_k_unique_substring(s, k)
print(f"Input string: {s}")
print(f"k = {k}")
print(f"Longest substring with {k} unique characters: {result}")
print(f"Length: {len(result)}")

The program prints:

Input string: ppqprqtqtqt
k = 3
Longest substring with 3 unique characters: rqtqtqt
Length: 7
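
The demo above covers only one input. As a rough sketch of how the same idea behaves on edge cases, the condensed variant below uses `collections.Counter` and returns an empty string whenever no window has exactly k unique characters. The name `longest_exactly_k` and the choice to return "" instead of a message are assumptions made for this illustration, not the article's own function.

```python
from collections import Counter


def longest_exactly_k(s, k):
    """Hypothetical condensed variant of the sliding window.

    Returns "" when no window has exactly k distinct characters.
    """
    if k <= 0:
        return ""
    counts = Counter()
    start = 0
    best_start, best_len = 0, 0
    for end, ch in enumerate(s):
        counts[ch] += 1
        # Shrink from the left while the window has too many distinct characters
        while len(counts) > k:
            left = s[start]
            counts[left] -= 1
            if counts[left] == 0:
                del counts[left]
            start += 1
        # Record the window only when it has exactly k distinct characters
        if len(counts) == k and end - start + 1 > best_len:
            best_start, best_len = start, end - start + 1
    return s[best_start:best_start + best_len]


print(repr(longest_exactly_k("aaaa", 1)))  # 'aaaa'
print(repr(longest_exactly_k("abc", 3)))   # 'abc'
print(repr(longest_exactly_k("abc", 5)))   # ''
print(repr(longest_exactly_k("", 2)))      # ''
```

Returning an empty string keeps the return type uniform, so callers can test the result with a plain truthiness check rather than comparing against a message.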

How It Works

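The algorithm maintains a character frequency map for the current window. As we expand the window by moving the end pointer, we add characters to our map. When the number of unique characters exceeds k, we shrink the window from the left, decrementing counts and deleting characters whose count reaches zero, until we have exactly k unique characters again. After each step, if the window holds exactly k unique characters, we compare its length with the best length found so far.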

Step-by-Step Trace

def trace_algorithm(s, k):
    """Trace through the algorithm step by step"""
    char_count = {}
    start = 0
    max_length = 0
    
    print(f"Finding longest substring with {k} unique characters in '{s}'")
    print("-" * 60)
    
    for end in range(len(s)):
        # Add current character
        char_count[s[end]] = char_count.get(s[end], 0) + 1
        
        # Shrink if needed
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1
        
        current_window = s[start:end+1]
        unique_count = len(char_count)
        
        print(f"Window: '{current_window}' | Unique chars: {unique_count} | Length: {len(current_window)}")
        
        if unique_count == k and len(current_window) > max_length:
            max_length = len(current_window)
            print(f"  -> New best: length {max_length}")

trace_algorithm("ppqprqtqtqt", 3)

Running the trace prints:

Finding longest substring with 3 unique characters in 'ppqprqtqtqt'
------------------------------------------------------------
Window: 'p' | Unique chars: 1 | Length: 1
Window: 'pp' | Unique chars: 1 | Length: 2
Window: 'ppq' | Unique chars: 2 | Length: 3
Window: 'ppqp' | Unique chars: 2 | Length: 4
Window: 'ppqpr' | Unique chars: 3 | Length: 5
  -> New best: length 5
Window: 'ppqprq' | Unique chars: 3 | Length: 6
  -> New best: length 6
Window: 'rqt' | Unique chars: 3 | Length: 3
Window: 'rqtq' | Unique chars: 3 | Length: 4
Window: 'rqtqt' | Unique chars: 3 | Length: 5
Window: 'rqtqtq' | Unique chars: 3 | Length: 6
Window: 'rqtqtqt' | Unique chars: 3 | Length: 7
  -> New best: length 7

Time and Space Complexity

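Aspect   Complexity   Explanation
Time     O(n)         Each character is added to the window once and removed at most once
Space    O(k)         The window's hash map holds at most k + 1 distinct characters (the up-front set(s) check separately stores every distinct character of s)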
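
The linear-time claim can be checked empirically by counting how often each pointer advances. The sketch below reuses the same window logic; the function name `count_pointer_moves` and the three counters it returns are hypothetical additions for this illustration, and it assumes k is at least 1.

```python
def count_pointer_moves(s, k):
    """Run the sliding window and count pointer advances.

    Returns (end_moves, start_moves, max_map_size).
    Hypothetical instrumentation helper; assumes k >= 1.
    """
    char_count = {}
    start = 0
    end_moves = 0
    start_moves = 0
    max_map_size = 0
    for end in range(len(s)):
        end_moves += 1
        char_count[s[end]] = char_count.get(s[end], 0) + 1
        # The map briefly holds k + 1 keys just before shrinking
        max_map_size = max(max_map_size, len(char_count))
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1
            start_moves += 1
    return end_moves, start_moves, max_map_size


print(count_pointer_moves("ppqprqtqtqt", 3))  # (11, 4, 4)
```

For the example string, the end pointer advances 11 times and the start pointer only 4 times, so the total work stays below 2n. The map peaks at 4 keys, which is k + 1, right before the window shrinks.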

Conclusion

The sliding window technique solves this problem in O(n) time. After each shrink step the window holds at most k unique characters, and we record the longest window that holds exactly k of them during the traversal.

Updated on: 2026-03-25T09:30:37+05:30
