Program to find length of longest substring which contains k distinct characters in Python
Finding the length of the longest substring with at most k distinct characters is a classic sliding window problem. We use two pointers to maintain a valid window and a hash map to track character frequencies.
Problem Understanding
Given a string s and a non-negative integer k, we need to find the maximum length of any substring that contains at most k distinct characters.
For example, if k = 3 and s = "kolkata", the longest substrings with at most 3 distinct characters are "kolk" and "kata", both having length 4.
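The example can be checked with a small brute-force sketch that tries every substring. The function name `longest_substrings_bruteforce` is a hypothetical helper introduced here for illustration only; it is far slower than the sliding window approach and is meant purely as a reference.

```python
def longest_substrings_bruteforce(s, k):
    """Return (best_length, substrings) by checking every substring of s."""
    best_len = 0
    best = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            sub = s[i:j]
            if len(set(sub)) > k:
                # Extending the substring can never reduce the distinct count
                break
            if len(sub) > best_len:
                best_len, best = len(sub), [sub]
            elif len(sub) == best_len and sub not in best:
                best.append(sub)
    return best_len, best


print(longest_substrings_bruteforce("kolkata", 3))
# (4, ['kolk', 'kata'])
```

Every candidate of length 5 ("kolka", "olkat", "lkata") has at least 4 distinct characters, which is why the answer stops at 4.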
Algorithm Steps
We use the sliding window technique with these steps −
Initialize ans = 0, left = 0, and an empty hash map
For each right pointer position, add the character to our window
If distinct characters ≤ k, update the maximum length
If distinct characters > k, shrink window from left until valid
Return the maximum length found
Implementation
def longest_substring_k_distinct(s, k):
    if k == 0 or not s:
        return 0

    ans = 0
    left = 0
    char_count = {}

    for right in range(len(s)):
        # Add current character to window
        char_count[s[right]] = char_count.get(s[right], 0) + 1

        # If window is valid, update answer
        if len(char_count) <= k:
            ans = max(ans, right - left + 1)
        else:
            # Shrink window from left
            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1
                if char_count[left_char] == 0:
                    del char_count[left_char]
                left += 1

    return ans

# Test the function
k = 3
s = "kolkata"
result = longest_substring_k_distinct(s, k)
print(f"Longest substring with at most {k} distinct characters: {result}")
Longest substring with at most 3 distinct characters: 4
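The function above returns only the length. If the substring itself is needed, one possible variation is to remember where the best window starts. The sketch below is an assumption-laden variant, not part of the original program: the name `longest_window_k_distinct` is hypothetical, and it measures the window once after shrinking instead of using an if/else.

```python
def longest_window_k_distinct(s, k):
    """Return (start, length) of the first longest window with at most k distinct characters."""
    if k <= 0 or not s:
        return 0, 0

    best_start, best_len = 0, 0
    left = 0
    char_count = {}

    for right, ch in enumerate(s):
        char_count[ch] = char_count.get(ch, 0) + 1

        # Restore validity first, then measure the window once
        while len(char_count) > k:
            left_char = s[left]
            char_count[left_char] -= 1
            if char_count[left_char] == 0:
                del char_count[left_char]
            left += 1

        # Strict comparison keeps the earliest window among ties
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1

    return best_start, best_len


text = "kolkata"
start, length = longest_window_k_distinct(text, 3)
print(text[start:start + length], length)
# kolk 4
```

Because ties are broken in favour of the earliest window, "kolk" is reported and not "kata".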
Step-by-Step Trace
Let's trace through s = "kolkata" and k = 3 −
def trace_algorithm(s, k):
    ans = 0
    left = 0
    char_count = {}

    print(f"Finding longest substring with at most {k} distinct characters in '{s}'")
    print("-" * 60)

    for right in range(len(s)):
        char_count[s[right]] = char_count.get(s[right], 0) + 1
        print(f"Right={right}, char='{s[right]}', window='{s[left:right+1]}'")
        print(f" Char count: {char_count}")
        print(f" Distinct chars: {len(char_count)}")

        if len(char_count) <= k:
            ans = max(ans, right - left + 1)
            print(f" Valid window, length={right - left + 1}, max_so_far={ans}")
        else:
            print(f" Too many distinct chars, shrinking window...")
            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1
                if char_count[left_char] == 0:
                    del char_count[left_char]
                left += 1
                print(f" Removed '{left_char}', new window='{s[left:right+1]}'")
        print()

    return ans

result = trace_algorithm("kolkata", 3)
print(f"Final answer: {result}")
Finding longest substring with at most 3 distinct characters in 'kolkata'
------------------------------------------------------------
Right=0, char='k', window='k'
Char count: {'k': 1}
Distinct chars: 1
Valid window, length=1, max_so_far=1
Right=1, char='o', window='ko'
Char count: {'k': 1, 'o': 1}
Distinct chars: 2
Valid window, length=2, max_so_far=2
Right=2, char='l', window='kol'
Char count: {'k': 1, 'o': 1, 'l': 1}
Distinct chars: 3
Valid window, length=3, max_so_far=3
Right=3, char='k', window='kolk'
Char count: {'k': 2, 'o': 1, 'l': 1}
Distinct chars: 3
Valid window, length=4, max_so_far=4
Right=4, char='a', window='kolka'
Char count: {'k': 2, 'o': 1, 'l': 1, 'a': 1}
Distinct chars: 4
Too many distinct chars, shrinking window...
Removed 'k', new window='olka'
Removed 'o', new window='lka'
Right=5, char='t', window='lkat'
Char count: {'k': 1, 'l': 1, 'a': 1, 't': 1}
Distinct chars: 4
Too many distinct chars, shrinking window...
Removed 'l', new window='kat'
Right=6, char='a', window='kata'
Char count: {'k': 1, 'a': 2, 't': 1}
Distinct chars: 3
Valid window, length=4, max_so_far=4
Final answer: 4
Time and Space Complexity
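Time Complexity: O(n), where n is the length of the string. Each character enters the window once through the right pointer and leaves it at most once through the left pointer, and each hash map update takes O(1) time on average.
Space Complexity: O(min(n, k)). The hash map briefly holds at most k + 1 distinct characters before the window is shrunk, and never more than the number of distinct characters in s.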
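To make the linear bound concrete, the following sketch counts how often each pointer advances. The function `count_pointer_moves` is a hypothetical helper written for this illustration; it follows the same sliding window logic but measures the window after shrinking, which yields the same answer.

```python
def count_pointer_moves(s, k):
    """Return (answer, right_moves, left_moves) for the sliding window scan."""
    if k <= 0 or not s:
        return 0, 0, 0

    ans = 0
    left = 0
    right_moves = 0
    left_moves = 0
    char_count = {}

    for right in range(len(s)):
        right_moves += 1  # the right pointer advances exactly once per character
        char_count[s[right]] = char_count.get(s[right], 0) + 1

        while len(char_count) > k:
            left_char = s[left]
            char_count[left_char] -= 1
            if char_count[left_char] == 0:
                del char_count[left_char]
            left += 1
            left_moves += 1  # the left pointer never moves backwards

        ans = max(ans, right - left + 1)

    return ans, right_moves, left_moves


print(count_pointer_moves("kolkata", 3))
# (4, 7, 3)
```

For "kolkata" the right pointer moves 7 times and the left pointer 3 times, so the total work stays below 2n = 14 pointer moves.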
Conclusion
The sliding window technique efficiently solves this problem in O(n) time. We maintain a valid window with at most k distinct characters using two pointers and a frequency map.
