Find the longest substring with k unique characters in a given string in Python
Finding the longest substring with exactly k unique characters is a classic sliding window problem. We use two pointers to maintain a window and expand or shrink it based on the number of unique characters.
Problem Understanding
Given a string and a number k, we need to find the longest substring that contains exactly k unique characters. For example, in "ppqprqtqtqt" with k=3, the answer is "rqtqtqt" which has length 7 and contains exactly 3 unique characters: 'r', 'q', and 't'.
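To make the definition concrete, here is a brute-force reference that checks every starting position and extends the substring one character at a time. It is only a sketch for validating small inputs, not the approach this article develops; the function name `longest_k_unique_bruteforce` is illustrative, and returning the earliest substring on ties is an assumption.

```python
def longest_k_unique_bruteforce(s, k):
    """Return the longest substring of s with exactly k distinct characters.

    Brute-force reference, O(n^2) time. Returns "" when no such substring
    exists. On ties, the earliest longest substring wins (an assumption).
    """
    best = ""
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > k:
                break  # extending further can never reduce the distinct count
            if len(seen) == k and j - i + 1 > len(best):
                best = s[i:j + 1]
    return best


print(longest_k_unique_bruteforce("ppqprqtqtqt", 3))  # rqtqtqt
```

Because it tries every window, this version is handy as a cross-check for a faster solution on short test strings.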
Algorithm Steps
The solution uses a sliding window approach:
- First, check if the string has at least k unique characters
- Use two pointers (start and end) to maintain a window
- Expand the window by moving the end pointer
- Shrink the window from the start when we have more than k unique characters
- Track the longest valid window found
Implementation
def longest_k_unique_substring(s, k):
    """Find the longest substring with exactly k unique characters"""
    n = len(s)
    # Guard against empty input and non-positive k (a negative k would
    # otherwise make the shrink loop run past the end of the window)
    if n == 0 or k <= 0:
        return ""

    # Count total unique characters in the string
    unique_chars = len(set(s))
    if unique_chars < k:
        # Note: this is a message, not a substring of s
        return "Not sufficient unique characters"

    # Sliding window approach
    char_count = {}
    start = 0
    max_length = 0
    result_start = 0

    for end in range(n):
        # Add current character to window
        char_count[s[end]] = char_count.get(s[end], 0) + 1

        # Shrink window if we have more than k unique characters
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1

        # Update result if current window has exactly k unique characters
        if len(char_count) == k:
            current_length = end - start + 1
            if current_length > max_length:
                max_length = current_length
                result_start = start

    return s[result_start:result_start + max_length] if max_length > 0 else ""

# Test the function
s = "ppqprqtqtqt"
k = 3
result = longest_k_unique_substring(s, k)
print(f"Input string: {s}")
print(f"k = {k}")
print(f"Longest substring with {k} unique characters: {result}")
print(f"Length: {len(result)}")
Output:
Input string: ppqprqtqtqt
k = 3
Longest substring with 3 unique characters: rqtqtqt
Length: 7
How It Works
The algorithm maintains a character frequency map for the current window. As we expand the window by moving the end pointer, we add characters to our map. When the number of unique characters exceeds k, we shrink the window from the left until we have exactly k unique characters again.
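The shrink step can be isolated to see how the frequency map changes. The helper below is a hypothetical illustration (the name `shrink_window` does not appear in the implementation above); it replays the moment 't' enters the window 'ppqprq' with k = 3, assuming the map already holds the counts for 'ppqprqt'.

```python
def shrink_window(s, start, char_count, k):
    """Advance start until the window holds at most k distinct characters.

    Mutates char_count in place and returns the new start index.
    """
    while len(char_count) > k:
        left = s[start]
        char_count[left] -= 1
        if char_count[left] == 0:
            # Remove zero counts so len(char_count) stays the distinct count
            del char_count[left]
        start += 1
    return start


s = "ppqprqt"
# Frequency map for the window s[0:7] = 'ppqprqt' (4 distinct characters)
char_count = {"p": 3, "q": 2, "r": 1, "t": 1}
start = shrink_window(s, 0, char_count, 3)
print(start, char_count)  # 4 {'q': 1, 'r': 1, 't': 1}
print(s[start:])          # rqt
```

All three occurrences of 'p' have to leave before the distinct count drops to 3, which is why the window jumps from 'ppqprq' straight to 'rqt'.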
Step-by-Step Trace
def trace_algorithm(s, k):
    """Trace through the algorithm step by step"""
    char_count = {}
    start = 0
    max_length = 0

    print(f"Finding longest substring with {k} unique characters in '{s}'")
    print("-" * 60)

    for end in range(len(s)):
        # Add current character
        char_count[s[end]] = char_count.get(s[end], 0) + 1

        # Shrink if needed
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1

        current_window = s[start:end+1]
        unique_count = len(char_count)
        print(f"Window: '{current_window}' | Unique chars: {unique_count} | Length: {len(current_window)}")

        if unique_count == k and len(current_window) > max_length:
            max_length = len(current_window)
            print(f" -> New best: length {max_length}")

trace_algorithm("ppqprqtqtqt", 3)
Output:
Finding longest substring with 3 unique characters in 'ppqprqtqtqt'
------------------------------------------------------------
Window: 'p' | Unique chars: 1 | Length: 1
Window: 'pp' | Unique chars: 1 | Length: 2
Window: 'ppq' | Unique chars: 2 | Length: 3
Window: 'ppqp' | Unique chars: 2 | Length: 4
Window: 'ppqpr' | Unique chars: 3 | Length: 5
 -> New best: length 5
Window: 'ppqprq' | Unique chars: 3 | Length: 6
 -> New best: length 6
Window: 'rqt' | Unique chars: 3 | Length: 3
Window: 'rqtq' | Unique chars: 3 | Length: 4
Window: 'rqtqt' | Unique chars: 3 | Length: 5
Window: 'rqtqtq' | Unique chars: 3 | Length: 6
Window: 'rqtqtqt' | Unique chars: 3 | Length: 7
 -> New best: length 7
Time and Space Complexity
| Aspect | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is visited at most twice |
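| Space | O(k) for the window map | The hash map holds at most k + 1 distinct characters at any moment; the initial set(s) check additionally uses space proportional to the number of distinct characters in s |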
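The linear time bound can be checked empirically by counting how often each pointer advances. The following is a rough sketch with a hypothetical helper name, `count_pointer_moves`, and it assumes k is a positive integer; it runs the same loop as the implementation but records pointer moves instead of the best window.

```python
def count_pointer_moves(s, k):
    """Run the sliding window and count how often each pointer advances."""
    char_count = {}
    start = 0
    end_moves = 0
    start_moves = 0
    for end in range(len(s)):
        end_moves += 1  # the end pointer advances exactly once per character
        char_count[s[end]] = char_count.get(s[end], 0) + 1
        while len(char_count) > k:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                del char_count[s[start]]
            start += 1
            start_moves += 1  # the start pointer never moves backwards
    return end_moves, start_moves


s = "ppqprqtqtqt"
end_moves, start_moves = count_pointer_moves(s, 3)
print(end_moves, start_moves)  # 11 4
```

Since the start pointer can never pass the end pointer, the total number of moves is bounded by 2n, which is where the "at most twice" in the table comes from.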
Conclusion
The sliding window technique efficiently solves this problem in linear time. We maintain a window with at most k unique characters and track the longest valid window encountered during the traversal.
