Program to find number of sublists that contains exactly k different words in Python
Finding sublists that contain exactly k different words is a common problem in data analysis and text processing. We can solve this using the sliding window technique with a helper function approach.
Problem Understanding
Given a list of words and a value k, we need to count all sublists (contiguous subarrays) that contain exactly k distinct words. For example, with words = ["Kolkata", "Delhi", "Delhi", "Kolkata"] and k = 2, there are 5 sublists with exactly 2 unique words:
- ["Kolkata", "Delhi"]
- ["Delhi", "Kolkata"]
- ["Kolkata", "Delhi", "Delhi"]
- ["Delhi", "Delhi", "Kolkata"]
- ["Kolkata", "Delhi", "Delhi", "Kolkata"]
Algorithm Approach
We use a helper function that counts sublists with at most k distinct words. Every sublist with at most k distinct words has either exactly k of them or at most k - 1, so: sublists with exactly k = sublists with at most k - sublists with at most (k - 1).
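To see the subtraction on the example input, the following brute-force sketch tallies every contiguous sublist by its number of distinct words. The function name `count_by_distinct` is hypothetical and is not part of the article's solution; it is only meant as a sanity check on small inputs.

```python
from collections import Counter

def count_by_distinct(words):
    """Tally contiguous sublists by how many distinct words each contains."""
    tally = Counter()
    n = len(words)
    for i in range(n):
        seen = set()
        for j in range(i, n):
            # Extend the sublist words[i..j] by one word and record its distinct count
            seen.add(words[j])
            tally[len(seen)] += 1
    return tally

words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
tally = count_by_distinct(words)
at_most_2 = sum(c for d, c in tally.items() if d <= 2)
at_most_1 = sum(c for d, c in tally.items() if d <= 1)
print(at_most_2, at_most_1, at_most_2 - at_most_1)  # prints: 10 5 5
```

This check is quadratic in the input length, so it is useful for verification only, not as a replacement for the sliding window.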
Helper Function Logic
- Use sliding window with two pointers (left and right)
- Expand window by moving right pointer
- Contract window when distinct words exceed k
- Count all valid sublists ending at current position
Implementation
class Solution:
    def solve(self, words, k):
        return self.count_at_most_k(words, k) - self.count_at_most_k(words, k - 1)

    def count_at_most_k(self, words, k):
        n = len(words)
        if k == 0:
            return 0
        word_count = {}
        result = 0
        left = 0
        for right in range(n):
            word = words[right]
            # Add current word to window
            if word not in word_count:
                word_count[word] = 0
            word_count[word] += 1
            # Shrink window if too many distinct words
            while len(word_count) > k:
                word_count[words[left]] -= 1
                if word_count[words[left]] == 0:
                    del word_count[words[left]]
                left += 1
            # Count sublists ending at current position
            result += right - left + 1
        return result

# Test the solution
ob = Solution()
words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
k = 2
print(ob.solve(words, k))
5
How It Works
The sliding window from left to right always holds at most k distinct words. For each right position, every sublist that ends at right and starts anywhere from left to right is valid, so the formula (right - left + 1) gives the number of new sublists counted at that position.
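The window movement can be made visible with a small tracing sketch. The function `trace_at_most_k` is a hypothetical helper written for illustration; it mirrors the helper logic described in this article but records the window bounds and the amount added at each step instead of only returning the total.

```python
def trace_at_most_k(words, k):
    """Return one (right, left, added) tuple per step of the sliding window."""
    steps = []
    if k <= 0:
        return steps
    counts = {}
    left = 0
    for right, word in enumerate(words):
        counts[word] = counts.get(word, 0) + 1
        # Shrink from the left until the window holds at most k distinct words
        while len(counts) > k:
            counts[words[left]] -= 1
            if counts[words[left]] == 0:
                del counts[words[left]]
            left += 1
        steps.append((right, left, right - left + 1))
    return steps

words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
for right, left, added in trace_at_most_k(words, 1):
    print(f"right={right} left={left} added={added}")
```

With k = 1 the amounts added are 1, 1, 2, and 1, which sum to the 5 sublists that contain at most one distinct word.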
Example Walkthrough
# Step-by-step execution for words = ["Kolkata", "Delhi", "Delhi", "Kolkata"], k = 2
words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
# At most 2 distinct words: 10 sublists (all of them)
# At most 1 distinct word: 5 sublists
# Exactly 2 distinct words: 10 - 5 = 5 sublists

def demonstrate_counting():
    words = ["Kolkata", "Delhi", "Delhi", "Kolkata"]
    print("All sublists with exactly 2 distinct words:")
    count = 0
    n = len(words)
    # Brute force: check every contiguous sublist words[i..j]
    for i in range(n):
        for j in range(i, n):
            sublist = words[i:j+1]
            unique_words = set(sublist)
            if len(unique_words) == 2:
                print(f"{sublist} - {unique_words}")
                count += 1
    print(f"\nTotal count: {count}")

demonstrate_counting()
All sublists with exactly 2 distinct words:
['Kolkata', 'Delhi'] - {'Delhi', 'Kolkata'}
['Kolkata', 'Delhi', 'Delhi'] - {'Delhi', 'Kolkata'}
['Kolkata', 'Delhi', 'Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}
['Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}
['Delhi', 'Delhi', 'Kolkata'] - {'Delhi', 'Kolkata'}
Total count: 5
Time and Space Complexity
| Aspect | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each word enters and leaves the window at most once per helper call, and the helper runs twice |
| Space | O(k) | Dictionary stores at most k + 1 distinct words at any moment |
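One way to gain confidence that the linear-time version returns the same counts as a slower exhaustive check is to compare the two on small random inputs. The sketch below is a standalone restatement with hypothetical function names (`at_most_k`, `exactly_k`, `brute_force`); it assumes words are hashable and uses a fixed seed so the run is repeatable.

```python
import random

def at_most_k(words, k):
    """Count contiguous sublists with at most k distinct words (sliding window)."""
    if k <= 0:
        return 0
    counts, left, total = {}, 0, 0
    for right, word in enumerate(words):
        counts[word] = counts.get(word, 0) + 1
        while len(counts) > k:
            first = words[left]
            counts[first] -= 1
            if counts[first] == 0:
                del counts[first]
            left += 1
        total += right - left + 1
    return total

def exactly_k(words, k):
    return at_most_k(words, k) - at_most_k(words, k - 1)

def brute_force(words, k):
    """Quadratic reference: grow each sublist one word at a time."""
    total = 0
    for i in range(len(words)):
        seen = set()
        for j in range(i, len(words)):
            seen.add(words[j])
            total += len(seen) == k
    return total

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for _ in range(200):
    sample = [rng.choice("abcd") for _ in range(rng.randint(0, 12))]
    k = rng.randint(1, 4)
    assert exactly_k(sample, k) == brute_force(sample, k)
print("all checks passed")
```

The random inputs are deliberately short and drawn from a four-letter alphabet so that repeated words and window shrinking occur often.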
Conclusion
The sliding window approach efficiently counts sublists with exactly k distinct words by using the difference between the "at most k" and "at most k - 1" counts. This technique has O(n) time complexity, which is optimal for this problem because every word must be read at least once.
