Python – N sized substrings with K distinct characters

When working with strings, you might need to find all substrings of a specific length N that contain exactly K distinct characters. This can be done by visiting every starting position in the string, slicing out a substring of length N, and passing it to Python's built-in set() to count its unique characters.

Syntax

The general approach involves:

result = []
for i in range(len(string) - n + 1):
    substring = string[i:i+n]
    if len(set(substring)) == k:
        result.append(substring)  # keep the matching substring
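
The loop above can be wrapped in a small reusable function. This is only a minimal sketch: the name substrings_with_k_distinct and its parameters are illustrative assumptions, not something the examples in this article define.

```python
def substrings_with_k_distinct(text, n, k):
    """Return every length-n substring of text with exactly k distinct characters.

    Hypothetical helper written for illustration only.
    """
    if n <= 0:
        return []  # guard: a non-positive length has no meaningful substrings
    # range() is empty when n > len(text), so the result is [] in that case.
    return [
        text[i:i + n]
        for i in range(len(text) - n + 1)
        if len(set(text[i:i + n])) == k
    ]


print(substrings_with_k_distinct("programming", 3, 3))
print(substrings_with_k_distinct("abc", 5, 2))
```

The second call asks for a length greater than the string itself, so no window fits and the function returns an empty list.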

Example

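Below is a demonstration that finds all 2-character substrings with exactly 2 distinct characters. Because 'Pythonisfun' never repeats a character in two adjacent positions, every 2-character substring qualifies: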

my_string = 'Pythonisfun'
print("The string is :")
print(my_string)

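my_substring = 2   # N: length of each substring
my_chars = 2       # K: required number of distinct characters
my_result = []

for idx in range(len(my_string) - my_substring + 1):
    candidate = my_string[idx: idx + my_substring]
    # Keep the substring only if it has exactly K distinct characters
    if len(set(candidate)) == my_chars:
        my_result.append(candidate)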

print("The resultant string is :")
print(my_result)

Output

The string is :
Pythonisfun
The resultant string is :
['Py', 'yt', 'th', 'ho', 'on', 'ni', 'is', 'sf', 'fu', 'un']

How It Works

  • The algorithm iterates through each possible starting position in the string

  • For each position, it extracts a substring of length N

  • The set() function removes duplicate characters, so len(set(substring)) gives the count of distinct characters

  • If this count equals K, the substring is added to the result list
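
To make these steps concrete, the short trace below records each window together with its distinct-character count. The input "banana" and the variable names are assumptions chosen for illustration because the word contains repeated characters.

```python
text = "banana"  # hypothetical input with repeated characters
n = 3            # window length N

trace = []
for i in range(len(text) - n + 1):
    window = text[i:i + n]
    # set(window) drops duplicates, so its length is the distinct count
    trace.append((i, window, len(set(window))))

for row in trace:
    print(row)
# (0, 'ban', 3)
# (1, 'ana', 2)
# (2, 'nan', 2)
# (3, 'ana', 2)
```

With K = 2, the windows starting at positions 1, 2 and 3 would be kept. Note that 'ana' appears twice, since every starting position is checked independently and duplicates are not filtered out.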

Another Example

Finding 3-character substrings with exactly 3 distinct characters:

text = "programming"
n = 3  # substring length
k = 3  # distinct characters required

result = []
for i in range(len(text) - n + 1):
    substring = text[i:i+n]
    if len(set(substring)) == k:
        result.append(substring)

print(f"String: {text}")
print(f"3-character substrings with 3 distinct characters: {result}")

Output

String: programming
3-character substrings with 3 distinct characters: ['pro', 'rog', 'ogr', 'gra', 'ram', 'min', 'ing']

The substrings 'amm' and 'mmi' are skipped because each contains only two distinct characters.

Key Points

  • The set() function automatically handles duplicate character removal

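  • The range bound len(string) - n + 1 stops at the last position where a full N-character substring still fits. Slicing past that point would not raise an error in Python, but it would produce substrings shorter than N

  • Time complexity is O(L*N), where L is the string length and N is the substring length: there are L - N + 1 windows, and slicing and building a set for each one takes O(N)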
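
Rebuilding a set for every window repeats work, because neighbouring windows share all but one character. For long strings, one possible alternative is to keep a running character count and update it as the window slides. The sketch below is a swapped-in technique, not the method used in the examples above. It uses collections.Counter from the standard library, and the function name sliding_k_distinct is a hypothetical one.

```python
from collections import Counter


def sliding_k_distinct(text, n, k):
    """Hypothetical helper: same result as the slice-and-set loop, using running counts."""
    if n <= 0 or n > len(text):
        return []
    counts = Counter(text[:n])  # character counts for the first window
    result = [text[:n]] if len(counts) == k else []
    for i in range(n, len(text)):
        counts[text[i]] += 1        # character entering the window
        leaving = text[i - n]
        counts[leaving] -= 1        # character leaving the window
        if counts[leaving] == 0:
            del counts[leaving]     # keep len(counts) equal to the distinct count
        if len(counts) == k:
            result.append(text[i - n + 1:i + 1])
    return result


print(sliding_k_distinct("programming", 3, 3))
```

Updating the distinct count takes constant time per step here. Slicing out a substring still costs O(N), but it only happens for windows that actually match.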

Conclusion

This approach finds all N-sized substrings containing exactly K distinct characters by combining string slicing with sets. Passing each substring to set() is a concise way to count its unique characters, and the resulting O(L*N) cost is acceptable for short strings and small values of N.

Updated on: 2026-03-26T02:43:17+05:30
