Program to find the number of sublists that contain exactly k different words in Python

Finding sublists that contain exactly k different words is a common problem in data analysis and text processing. We can solve this using the sliding window technique with a helper function approach.

Problem Understanding

Given a list of words and a value k, we need to count all sublists (contiguous subarrays) that contain exactly k distinct words. For example, with words = ["Kolkata", "Delhi", "Delhi", "Kolkata"] and k = 2, there are 5 sublists with exactly 2 unique words:

  • ["Kolkata", "Delhi"]
  • ["Delhi", "Kolkata"]
  • ["Kolkata", "Delhi", "Delhi"]
  • ["Delhi", "Delhi", "Kolkata"]
  • ["Kolkata", "Delhi", "Delhi", "Kolkata"]

Algorithm Approach

We use a helper function that counts sublists with at most k distinct words. Then we calculate: sublists with exactly k = sublists with at most k − sublists with at most (k−1).
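The subtraction identity can be checked by brute force before any sliding window is involved. The sketch below is only an illustration: `count_by_distinct` and `at_most` are hypothetical helper names, not part of the solution. It tallies every sublist by its number of distinct words and then compares the "at most" totals.

```python
def count_by_distinct(words):
    """Brute force: map d -> number of sublists with exactly d distinct words."""
    tally = {}
    n = len(words)
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(words[j])  # distinct words in words[i..j]
            tally[len(seen)] = tally.get(len(seen), 0) + 1
    return tally


words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
tally = count_by_distinct(words)


def at_most(k):
    """Number of sublists with at most k distinct words, from the tally."""
    return sum(count for distinct, count in tally.items() if distinct <= k)


# at most 2, at most 1, and their difference (exactly 2)
print(at_most(2), at_most(1), at_most(2) - at_most(1))  # prints: 10 5 5
```

Every sublist with at most k distinct words has either exactly k or at most k − 1, so subtracting the two totals leaves only the "exactly k" group.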

Helper Function Logic

  • Use sliding window with two pointers (left and right)
  • Expand window by moving right pointer
  • Contract window when distinct words exceed k
  • Count all valid sublists ending at current position

Implementation

class Solution:
    def solve(self, words, k):
        return self.count_at_most_k(words, k) - self.count_at_most_k(words, k - 1)
    
    def count_at_most_k(self, words, k):
        n = len(words)
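        # No non-empty sublist has zero (or fewer) distinct words; this
        # also covers the k - 1 call made when solve() receives k = 0
        if k <= 0:
            return 0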
        
        word_count = {}
        result = 0
        left = 0
        
        for right in range(n):
            word = words[right]
            
            # Add current word to window
            if word not in word_count:
                word_count[word] = 0
            word_count[word] += 1
            
            # Shrink window if too many distinct words
            while len(word_count) > k:
                word_count[words[left]] -= 1
                if word_count[words[left]] == 0:
                    del word_count[words[left]]
                left += 1
            
            # Count sublists ending at current position
            result += right - left + 1
        
        return result

# Test the solution
ob = Solution()
words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
k = 2
print(ob.solve(words, k))

Output:

5

How It Works

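The sliding window always holds at most k distinct words. After the window has been shrunk for the current right position, every sublist that ends at right and starts anywhere from left to right is valid, because removing words from the front of a valid window cannot add distinct words. There are (right − left + 1) such starting points, so that many sublists are added to the result at each step.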
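To make the counting rule concrete, here is a small tracing sketch. The function name `trace_at_most_k` is an assumption introduced for illustration. It mirrors the helper's loop but records, for each right position, where left ends up and how many sublists that step contributes.

```python
def trace_at_most_k(words, k):
    """Return (right, left, added) for every step of the at-most-k window."""
    counts = {}
    left = 0
    steps = []
    for right, word in enumerate(words):
        counts[word] = counts.get(word, 0) + 1
        # Shrink from the left until the window has at most k distinct words
        while len(counts) > k:
            counts[words[left]] -= 1
            if counts[words[left]] == 0:
                del counts[words[left]]
            left += 1
        steps.append((right, left, right - left + 1))
    return steps


words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
for right, left, added in trace_at_most_k(words, 1):
    print(f"right={right} left={left} window={words[left:right + 1]} adds {added}")
```

With k = 1 the four steps add 1, 1, 2 and 1, for a total of 5 sublists containing a single distinct word.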

Example Walkthrough

# Step-by-step execution for words = ["Kolkata", "Delhi", "Delhi", "Kolkata"], k = 2
words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]

# At most 2 distinct words: 10 sublists (every sublist of the 4-word list)
# At most 1 distinct word: 5 sublists
# Exactly 2 distinct words: 10 - 5 = 5 sublists

def demonstrate_counting():
    words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
    
    print("All sublists with exactly 2 distinct words:")
    count = 0
    n = len(words)
    
    for i in range(n):
        for j in range(i, n):
            sublist = words[i:j+1]
            unique_words = set(sublist)
            if len(unique_words) == 2:
                print(f"{sublist} - {unique_words}")
                count += 1
    
    print(f"\nTotal count: {count}")

demonstrate_counting()

Output (the order of words inside each printed set may vary):

All sublists with exactly 2 distinct words:
['Kolkata', 'Delhi'] - {'Delhi', 'Kolkata'}
['Kolkata', 'Delhi', 'Delhi'] - {'Delhi', 'Kolkata'}
['Kolkata', 'Delhi', 'Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}
['Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}
['Delhi', 'Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}

Total count: 5
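The same brute-force idea can serve as a randomized cross-check of the sliding window. The following is a sketch under stated assumptions: `at_most_k`, `exactly_k` and `brute_exactly_k` are standalone function names chosen here for illustration (function-style equivalents of the class methods), and the alphabet, list lengths and seed are arbitrary.

```python
import random


def at_most_k(words, k):
    """Sliding window: count sublists with at most k distinct words."""
    if k <= 0:
        return 0
    counts, left, total = {}, 0, 0
    for right, word in enumerate(words):
        counts[word] = counts.get(word, 0) + 1
        while len(counts) > k:
            first = words[left]
            counts[first] -= 1
            if counts[first] == 0:
                del counts[first]
            left += 1
        total += right - left + 1
    return total


def exactly_k(words, k):
    return at_most_k(words, k) - at_most_k(words, k - 1)


def brute_exactly_k(words, k):
    """O(n^3) reference: test every sublist directly."""
    n = len(words)
    return sum(
        1
        for i in range(n)
        for j in range(i, n)
        if len(set(words[i:j + 1])) == k
    )


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(200):
    words = [rng.choice("abcd") for _ in range(rng.randint(0, 8))]
    k = rng.randint(1, 5)
    assert exactly_k(words, k) == brute_exactly_k(words, k)
print("sliding window matches brute force on 200 random inputs")
```

Short lists over a small alphabet are used deliberately: they produce many repeated words, which is where the shrinking logic is most likely to go wrong.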

Time and Space Complexity

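Aspect    Complexity    Explanation
Time      O(n)          Each word enters and leaves the window at most once in each of the two helper calls (assuming average O(1) dictionary operations)
Space     O(k)          The dictionary holds at most k + 1 distinct words at any moment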

Conclusion

The sliding window approach efficiently counts sublists with exactly k distinct words by using the difference between "at most k" and "at most k−1" counts. This technique has O(n) time complexity and is optimal for this problem.

Updated on: 2026-03-25T13:28:56+05:30
