Smallest Subsequence of Distinct Characters in Python
Finding the lexicographically smallest subsequence of distinct characters is a classic problem that can be solved with a greedy approach and a stack. Given a string, the task is to find the lexicographically smallest subsequence that contains every distinct character of the string exactly once.
For example, if the input is "cdadabcc", the output should be "adbc".
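To make the problem statement concrete, here is a minimal brute-force sketch that can serve as a reference for small inputs. It is not the approach this article develops, and the names is_subsequence and brute_force_smallest are illustrative assumptions. It tries every ordering of the distinct characters in lexicographic order and returns the first one that is a subsequence of the input, so it is only practical when the number of distinct characters is small.

```python
from itertools import permutations


def is_subsequence(candidate, text):
    """Return True if candidate can be formed by deleting characters from text."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator, so character order is respected
    return all(ch in remaining for ch in candidate)


def brute_force_smallest(text):
    """Hypothetical reference: try orderings of the distinct characters, smallest first."""
    distinct = sorted(set(text))
    # permutations() of a sorted input yields tuples in lexicographic order
    for perm in permutations(distinct):
        candidate = "".join(perm)
        if is_subsequence(candidate, text):
            return candidate
    return ""


print(brute_force_smallest("cdadabcc"))  # prints adbc
```

The cost grows factorially with the number of distinct characters, which is why a linear-time method is preferable in practice.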
Algorithm Overview
The approach uses a stack to build the result and two dictionaries to track each character's last position and whether it is currently on the stack:
- last_occurrence: Stores the last position of each character
- in_stack: Tracks whether a character is already in the result stack
- stack: Builds the lexicographically smallest subsequence
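As a small sketch of how these structures might be set up, the snippet below builds last_occurrence with a dictionary comprehension and, as an assumption that differs from the dictionary described above, uses a set for in_stack. Either representation supports the same membership check.

```python
text = "cdadabcc"

# Later indices overwrite earlier ones, so each key ends up holding
# the index of the character's last occurrence.
last_occurrence = {char: index for index, char in enumerate(text)}

# Assumption: a set stands in for the in_stack dictionary.
in_stack = set()

# The result is built here, one character at a time.
stack = []

print(last_occurrence)  # prints {'c': 7, 'd': 3, 'a': 4, 'b': 5}
```

The dictionary keys appear in order of first occurrence, while the stored values are the last indices.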
Implementation
def smallest_subsequence(text):
    """
    Find lexicographically smallest subsequence with all distinct characters
    """
    stack = []
    last_occurrence = {}
    in_stack = {}

    # Find last occurrence of each character
    for i in range(len(text) - 1, -1, -1):
        if text[i] not in last_occurrence:
            last_occurrence[text[i]] = i
            in_stack[text[i]] = False

    print("Last occurrences:", last_occurrence)

    # Process each character
    for i, char in enumerate(text):
        print(f"Processing: stack={stack}, index={i}, char='{char}'")

        # Skip if character is already in stack
        if in_stack[char]:
            continue

        # Remove larger characters that can appear later
        while (stack and
               stack[-1] > char and
               last_occurrence[stack[-1]] > i):
            removed_char = stack.pop()
            in_stack[removed_char] = False

        # Add current character to stack
        stack.append(char)
        in_stack[char] = True

    return "".join(stack)


# Test the function
text = "cdadabcc"
result = smallest_subsequence(text)
print(f"\nInput: '{text}'")
print(f"Output: '{result}'")

Running this code prints:
Last occurrences: {'c': 7, 'b': 5, 'a': 4, 'd': 3}
Processing: stack=[], index=0, char='c'
Processing: stack=['c'], index=1, char='d'
Processing: stack=['c', 'd'], index=2, char='a'
Processing: stack=['a'], index=3, char='d'
Processing: stack=['a', 'd'], index=4, char='a'
Processing: stack=['a', 'd'], index=5, char='b'
Processing: stack=['a', 'd', 'b'], index=6, char='c'
Processing: stack=['a', 'd', 'b', 'c'], index=7, char='c'
Input: 'cdadabcc'
Output: 'adbc'
How It Works
The algorithm works in two phases:
- Preprocessing: Record the index of the last occurrence of each character.
- Stack Building: For each character, in order:
  - Skip it if it is already on the stack.
  - Pop the top of the stack while it is larger than the current character and occurs again later in the string.
  - Push the current character onto the stack.
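The two phases can also be condensed into a version without the debugging output. The following is a minimal sketch, not a replacement for the implementation above. The name smallest_subsequence_compact is an illustrative assumption, and a set is used in place of the in_stack dictionary.

```python
def smallest_subsequence_compact(text):
    """Hypothetical compact variant: same greedy logic, no tracing."""
    # Phase 1: last index of every character
    last_occurrence = {char: index for index, char in enumerate(text)}

    # Phase 2: build the result on a stack
    stack = []
    in_stack = set()  # assumption: a set instead of a dict of booleans
    for index, char in enumerate(text):
        if char in in_stack:
            continue  # already part of the result
        # Pop larger characters that still occur later in the string
        while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
            in_stack.remove(stack.pop())
        stack.append(char)
        in_stack.add(char)
    return "".join(stack)


print(smallest_subsequence_compact("cdadabcc"))  # prints adbc
```

Popping is safe only because last_occurrence guarantees the removed character can be pushed again later.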
Example with Different Inputs
def test_multiple_cases():
    test_cases = ["bcabc", "cbacdcbc", "ecbacba"]
    for text in test_cases:
        result = smallest_subsequence(text)
        print(f"Input: '{text}' -> Output: '{result}'")


test_multiple_cases()

Output, with the per-character "Processing" lines omitted:
Last occurrences: {'c': 4, 'b': 3, 'a': 2}
Input: 'bcabc' -> Output: 'abc'
Last occurrences: {'c': 7, 'b': 6, 'd': 4, 'a': 2}
Input: 'cbacdcbc' -> Output: 'acdb'
Last occurrences: {'a': 6, 'b': 5, 'c': 4, 'e': 0}
Input: 'ecbacba' -> Output: 'eacb'
Key Points
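- Time Complexity: O(n), where n is the length of the string, because each position in the string causes at most one push and one pop
- Space Complexity: O(k), where k is the number of unique characters
- The stack keeps the result lexicographically smallest: a character is popped only when a smaller one arrives and the popped character still occurs later
- A character with no later occurrence is never popped, so every distinct character ends up in the result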
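One way to check the linear-time claim informally is to count stack operations. The sketch below is an instrumented variant written for illustration only. The function name count_stack_operations and its return shape are assumptions. It repeats the same greedy loop while counting pushes and pops, both of which are bounded by the length of the input.

```python
def count_stack_operations(text):
    """Hypothetical helper: return (result, pushes, pops) for the greedy loop."""
    last_occurrence = {char: index for index, char in enumerate(text)}
    stack = []
    in_stack = set()
    pushes = 0
    pops = 0
    for index, char in enumerate(text):
        if char in in_stack:
            continue
        while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
            in_stack.remove(stack.pop())
            pops += 1
        stack.append(char)
        in_stack.add(char)
        pushes += 1
    return "".join(stack), pushes, pops


result, pushes, pops = count_stack_operations("cdadabcc")
print(result, pushes, pops)  # prints adbc 6 2
```

Each index triggers at most one push, and a pop can only undo an earlier push, so the total work stays proportional to the length of the string.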
Conclusion
This greedy stack-based approach efficiently finds the lexicographically smallest subsequence containing all distinct characters. The key insight is removing larger characters from the stack when they have future occurrences, ensuring the smallest possible result.
