Program to find the length of the longest substring with at most k distinct characters in Python

Finding the length of the longest substring with at most k distinct characters is a classic sliding window problem. We use two pointers to maintain a valid window and a hash map to track character frequencies.

Problem Understanding

Given a string s and a non-negative integer k, we need to find the maximum length of any substring (a contiguous run of characters) that contains at most k distinct characters.

For example, if k = 3 and s = "kolkata", the longest substrings with at most 3 distinct characters are "kolk" and "kata", both having length 4.
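
As a sanity check on this example, a brute-force sketch can enumerate every substring and keep the longest ones that qualify. The helper name brute_force_longest is hypothetical and is not part of the solution that follows; it is far too slow for long strings and is meant only to confirm the expected answer.

```python
def brute_force_longest(s, k):
    """Check every substring of s.

    Returns (best_length, sorted list of the distinct substrings of that
    length that have at most k distinct characters).
    """
    best_len = 0
    best = set()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = s[i:j]
            # set(sub) holds the distinct characters of this substring
            if len(set(sub)) <= k:
                if len(sub) > best_len:
                    best_len, best = len(sub), {sub}
                elif len(sub) == best_len:
                    best.add(sub)
    return best_len, sorted(best)


print(brute_force_longest("kolkata", 3))  # (4, ['kata', 'kolk'])
```

The other two substrings of length 4, "olka" and "lkat", each contain 4 distinct characters, so they are rejected.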

Algorithm Steps

We use the sliding window technique with these steps:

  • Initialize ans = 0, left = 0, and an empty hash map

  • For each right pointer position, add the character to our window

  • If distinct characters ≤ k, update the maximum length

  • If distinct characters > k, shrink window from left until valid

  • Return the maximum length found

Implementation

def longest_substring_k_distinct(s, k):
    # No non-empty window is valid when k <= 0 or the string is empty
    if k <= 0 or not s:
        return 0
    
    ans = 0
    left = 0
    char_count = {}
    
    for right in range(len(s)):
        # Add current character to window
        char_count[s[right]] = char_count.get(s[right], 0) + 1
        
        # If window is valid, update answer
        if len(char_count) <= k:
            ans = max(ans, right - left + 1)
        else:
            # Shrink window from left
            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1
                if char_count[left_char] == 0:
                    del char_count[left_char]
                left += 1
    
    return ans

# Test the function
k = 3
s = "kolkata"
result = longest_substring_k_distinct(s, k)
print(f"Longest substring with at most {k} distinct characters: {result}")

Output:

Longest substring with at most 3 distinct characters: 4
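
The function above returns only the length. If the substring itself is needed, one possible variant records where the best window starts. This is a sketch under assumptions: the name longest_substring_k_distinct_text is hypothetical, and returning the first of several equally long substrings is an arbitrary choice made here.

```python
def longest_substring_k_distinct_text(s, k):
    """Return the first longest substring of s with at most k distinct characters."""
    if k <= 0 or not s:
        return ""

    best_start, best_len = 0, 0
    left = 0
    char_count = {}

    for right, ch in enumerate(s):
        # Extend the window to include s[right]
        char_count[ch] = char_count.get(ch, 0) + 1

        # Shrink from the left until at most k distinct characters remain
        while len(char_count) > k:
            left_char = s[left]
            char_count[left_char] -= 1
            if char_count[left_char] == 0:
                del char_count[left_char]
            left += 1

        # The window is valid here; keep it only if strictly longer
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1

    return s[best_start:best_start + best_len]


print(longest_substring_k_distinct_text("kolkata", 3))  # kolk
```

Because the comparison is strict, "kolk" is kept and the equally long "kata" found later does not replace it.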

Step-by-Step Trace

Let's trace the algorithm on s = "kolkata" with k = 3. The function below is the same algorithm with print statements added:

def trace_algorithm(s, k):
    ans = 0
    left = 0
    char_count = {}
    
    print(f"Finding longest substring with at most {k} distinct characters in '{s}'")
    print("-" * 60)
    
    for right in range(len(s)):
        char_count[s[right]] = char_count.get(s[right], 0) + 1
        
        print(f"Right={right}, char='{s[right]}', window='{s[left:right+1]}'")
        print(f"  Char count: {char_count}")
        print(f"  Distinct chars: {len(char_count)}")
        
        if len(char_count) <= k:
            ans = max(ans, right - left + 1)
            print(f"  Valid window, length={right - left + 1}, max_so_far={ans}")
        else:
            print("  Too many distinct chars, shrinking window...")
            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1
                if char_count[left_char] == 0:
                    del char_count[left_char]
                left += 1
                print(f"    Removed '{left_char}', new window='{s[left:right+1]}'")
        print()
    
    return ans

result = trace_algorithm("kolkata", 3)
print(f"Final answer: {result}")

Output:

Finding longest substring with at most 3 distinct characters in 'kolkata'
------------------------------------------------------------
Right=0, char='k', window='k'
  Char count: {'k': 1}
  Distinct chars: 1
  Valid window, length=1, max_so_far=1

Right=1, char='o', window='ko'
  Char count: {'k': 1, 'o': 1}
  Distinct chars: 2
  Valid window, length=2, max_so_far=2

Right=2, char='l', window='kol'
  Char count: {'k': 1, 'o': 1, 'l': 1}
  Distinct chars: 3
  Valid window, length=3, max_so_far=3

Right=3, char='k', window='kolk'
  Char count: {'k': 2, 'o': 1, 'l': 1}
  Distinct chars: 3
  Valid window, length=4, max_so_far=4

Right=4, char='a', window='kolka'
  Char count: {'k': 2, 'o': 1, 'l': 1, 'a': 1}
  Distinct chars: 4
  Too many distinct chars, shrinking window...
    Removed 'k', new window='olka'
    Removed 'o', new window='lka'

Right=5, char='t', window='lkat'
  Char count: {'k': 1, 'l': 1, 'a': 1, 't': 1}
  Distinct chars: 4
  Too many distinct chars, shrinking window...
    Removed 'l', new window='kat'

Right=6, char='a', window='kata'
  Char count: {'k': 1, 'a': 2, 't': 1}
  Distinct chars: 3
  Valid window, length=4, max_so_far=4

Final answer: 4

Time and Space Complexity

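  • Time Complexity: O(n), where n is the length of the string. Each character is added to the window once by the right pointer and removed at most once by the left pointer, so it is visited at most twice.

  • Space Complexity: O(min(n, k)) for k ≥ 1. The hash map briefly holds up to k + 1 distinct characters before the window is shrunk, and never more than the number of distinct characters in s.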
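
These bounds can be observed empirically with an instrumented version of the same loop. The sketch below assumes k ≥ 0; the function name count_pointer_moves and its counters are hypothetical additions for illustration. It counts how often each pointer advances and records the largest size the hash map reaches.

```python
def count_pointer_moves(s, k):
    """Run the sliding window and report (answer, right_moves, left_moves, peak_map_size).

    Assumes k >= 0.
    """
    left = 0
    right_moves = 0
    left_moves = 0
    peak_map_size = 0
    ans = 0
    char_count = {}

    for right, ch in enumerate(s):
        right_moves += 1
        char_count[ch] = char_count.get(ch, 0) + 1
        # The map is largest right after adding and before shrinking
        peak_map_size = max(peak_map_size, len(char_count))

        while len(char_count) > k:
            left_char = s[left]
            char_count[left_char] -= 1
            if char_count[left_char] == 0:
                del char_count[left_char]
            left += 1
            left_moves += 1

        ans = max(ans, right - left + 1)

    return ans, right_moves, left_moves, peak_map_size


print(count_pointer_moves("kolkata", 3))  # (4, 7, 3, 4)
```

For "kolkata" the right pointer advances 7 times and the left pointer 3 times, 10 steps in total against the bound of 2n = 14, and the map never holds more than k + 1 = 4 keys.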

Conclusion

The sliding window technique efficiently solves this problem in O(n) time. We maintain a valid window with at most k distinct characters using two pointers and a frequency map.

Updated on: 2026-03-25T13:37:03+05:30
