Find the longest substring with k unique characters in a given string in Python


Suppose we have a string s and a number k. We have to return the longest substring of s that contains exactly k unique characters. If more than one substring has the longest possible length, we can return any of them.

So, if the input is like s = "ppqprqtqtqt", k = 3, then the output will be "rqtqtqt", as it has length 7 and contains exactly three unique characters (r, q and t).
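To make the problem statement concrete, the following is a minimal brute-force sketch that tries every starting index and extends the substring until it holds more than k unique characters. The function name longest_k_unique_bruteforce is hypothetical and not part of the original solution. It returns an empty string when no substring qualifies, which is an assumption made here for illustration. It examines O(n^2) windows, so it is only meant for checking short inputs.

```python
def longest_k_unique_bruteforce(s, k):
    """Return the first longest substring of s with exactly k unique characters.

    Hypothetical reference helper; returns "" when no such substring exists.
    """
    best = ""
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > k:
                break  # extending further can only keep or add unique characters
            if len(seen) == k and j - i + 1 > len(best):
                best = s[i:j + 1]
    return best


print(longest_k_unique_bruteforce("ppqprqtqtqt", 3))  # rqtqtqt
```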

To solve this, we will follow these steps −

  • N := 26

  • Define a function is_ok() . This will take count, k

    • val := 0

    • for i in range 0 to N, do

      • if count[i] > 0, then

        • val := val + 1

    • return true when val <= k, otherwise false

  • From the main method, do the following −

  • unique := 0, size := size of s

  • count := An array of size N, fill with 0

  • for i in range 0 to size, do

    • if count of s[i] is 0, then

      • unique := unique + 1

    • increase count of s[i] by 1

  • if unique < k, then

    • no such substring exists; return "Not sufficient characters"

  • start := 0, end := 0

  • window_length := 1, window_start := 0

  • count := An array of size N, fill with 0

  • increase count of s[0] by 1

  • for i in range 1 to size, do

    • increase count of s[i] by 1

    • end := end + 1

    • while is_ok(count, k) is false, do

      • decrease count of s[start] by 1

      • start := start + 1

    • if end-start+1 > window_length, then

      • window_length := end-start+1

      • window_start := start

  • return the substring of s from index window_start (inclusive) to window_start + window_length (exclusive)
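The steps above assume lowercase letters and rescan all N counters on every check. As a variation, and not the method used in this article, the same sliding window can be sketched with a dictionary, so that the number of unique characters in the window is simply the number of keys. The function name longest_k_unique and the choice to return an empty string when s has fewer than k unique characters (or when k is less than 1) are assumptions made for this sketch.

```python
def longest_k_unique(s, k):
    """Sliding-window sketch that accepts any characters, not only 'a' to 'z'."""
    if k < 1 or len(set(s)) < k:
        return ""  # assumption: signal "no such substring" with an empty string
    count = {}  # character -> occurrences inside the current window
    start = 0
    best_start, best_len = 0, 0
    for end, ch in enumerate(s):
        count[ch] = count.get(ch, 0) + 1
        # Shrink from the left until at most k unique characters remain.
        while len(count) > k:
            left = s[start]
            count[left] -= 1
            if count[left] == 0:
                del count[left]
            start += 1
        # Record the longest valid window seen so far.
        if end - start + 1 > best_len:
            best_start, best_len = start, end - start + 1
    return s[best_start:best_start + best_len]


print(longest_k_unique("ppqprqtqtqt", 3))  # rqtqtqt
```

The window only tracks "at most k" unique characters. Because the string as a whole has at least k, the longest such window always ends up with exactly k: a window with fewer could be extended by one more character without exceeding the limit.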

Example 

Let us see the following implementation to get a better understanding −

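N = 26  # size of the alphabet: lowercase letters 'a' to 'z' only


def is_ok(count, k):
    # The window is valid while it holds at most k unique characters.
    val = 0
    for i in range(N):
        if count[i] > 0:
            val += 1
    return k >= val


def k_unique_chars(s, k):
    size = len(s)
    # First pass: count the unique characters in the whole string.
    unique = 0
    count = [0] * N
    for ch in s:
        if count[ord(ch) - ord('a')] == 0:
            unique += 1
        count[ord(ch) - ord('a')] += 1
    # Also guards the empty string, for which s[0] below would fail.
    if size == 0 or unique < k:
        return "Not sufficient characters"

    # Second pass: slide a window [start, end] over s.
    start = 0
    end = 0
    window_length = 1
    window_start = 0
    count = [0] * N
    count[ord(s[0]) - ord('a')] += 1
    for i in range(1, size):
        count[ord(s[i]) - ord('a')] += 1
        end += 1
        # Shrink from the left until at most k unique characters remain.
        while not is_ok(count, k):
            count[ord(s[start]) - ord('a')] -= 1
            start += 1
        # Record the longest valid window seen so far.
        if end - start + 1 > window_length:
            window_length = end - start + 1
            window_start = start
    return s[window_start:window_start + window_length]


s = "ppqprqtqtqt"
k = 3
print(k_unique_chars(s, k))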

Input

"ppqprqtqtqt", 3

Output

rqtqtqt
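As a quick sanity check on this result, the short sketch below confirms that the reported string is a substring of the input, has exactly 3 unique characters, and has length 7. The variable names are chosen here for illustration only.

```python
s, k, result = "ppqprqtqtqt", 3, "rqtqtqt"

is_substring = result in s   # the answer must occur contiguously in s
distinct = len(set(result))  # number of unique characters in the answer

print(is_substring, distinct, len(result))  # True 3 7
```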

Updated on: 20-Aug-2020
