Smallest Subsequence of Distinct Characters in Python

Finding the lexicographically smallest subsequence of distinct characters is a classic problem that can be solved with a greedy approach and a stack. Given a string, we need to find the lexicographically smallest subsequence that contains every distinct character of the string exactly once.

For example, if the input is "cdadabcc", the output should be "adbc".
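
To make the problem statement concrete, here is a minimal brute-force sketch. It is only an illustration for short strings, and the helper names `is_subsequence` and `brute_force_smallest` are made up for this example. It tries every ordering of the distinct characters and returns the first one that is a subsequence of the input.

```python
from itertools import permutations


def is_subsequence(candidate, text):
    """Return True if candidate can be formed by deleting characters from text."""
    remaining = iter(text)
    # 'ch in remaining' advances the iterator, so character order is respected.
    return all(ch in remaining for ch in candidate)


def brute_force_smallest(text):
    """Try every ordering of the distinct characters; return the smallest valid one."""
    distinct = sorted(set(text))
    # permutations() of a sorted input yields orderings in lexicographic order,
    # so the first ordering that is a subsequence is the answer.
    for ordering in permutations(distinct):
        candidate = "".join(ordering)
        if is_subsequence(candidate, text):
            return candidate
    return ""


print(brute_force_smallest("cdadabcc"))  # adbc
```

This takes factorial time in the number of distinct characters, so it is useful only as a reference for checking a faster solution on small inputs.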

Algorithm Overview

The approach uses a stack to build the result and two dictionaries to track character positions and inclusion status:

last_occurrence: Stores the last position of each character
in_stack: Tracks whether a character is already in the result stack
stack: Builds the lexicographically smallest subsequence
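
As a rough sketch of how these three structures might be initialised, the snippet below builds `last_occurrence` with a dictionary comprehension. Using a set for `in_stack` is an alternative assumed here for illustration, not the form used in the implementation that follows.

```python
text = "cdadabcc"

# Later indices overwrite earlier ones, so each key ends up with its last index.
last_occurrence = {char: index for index, char in enumerate(text)}

# A set is one alternative to a dict of booleans for membership tracking.
in_stack = set()

# The stack starts empty and is filled while scanning the string.
stack = []

print(last_occurrence)  # {'c': 7, 'd': 3, 'a': 4, 'b': 5}
```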

Implementation

def smallest_subsequence(text):
    """
    Find lexicographically smallest subsequence with all distinct characters
    """
    stack = []
    last_occurrence = {}
    in_stack = {}
    
    # Find last occurrence of each character
    for i in range(len(text) - 1, -1, -1):
        if text[i] not in last_occurrence:
            last_occurrence[text[i]] = i
            in_stack[text[i]] = False
    
    print("Last occurrences:", last_occurrence)
    
    # Process each character
    for i, char in enumerate(text):
        print(f"Processing: stack={stack}, index={i}, char='{char}'")
        
        # Skip if character is already in stack
        if in_stack[char]:
            continue
            
        # Remove larger characters that can appear later
        while (stack and 
               stack[-1] > char and 
               last_occurrence[stack[-1]] > i):
            removed_char = stack.pop()
            in_stack[removed_char] = False
        
        # Add current character to stack
        stack.append(char)
        in_stack[char] = True
    
    return "".join(stack)

# Test the function
text = "cdadabcc"
result = smallest_subsequence(text)
print(f"\nInput: '{text}'")
print(f"Output: '{result}'")

Last occurrences: {'c': 7, 'b': 5, 'a': 4, 'd': 3}
Processing: stack=[], index=0, char='c'
Processing: stack=['c'], index=1, char='d'
Processing: stack=['c', 'd'], index=2, char='a'
Processing: stack=['a'], index=3, char='d'
Processing: stack=['a', 'd'], index=4, char='a'
Processing: stack=['a', 'd'], index=5, char='b'
Processing: stack=['a', 'd', 'b'], index=6, char='c'
Processing: stack=['a', 'd', 'b', 'c'], index=7, char='c'

Input: 'cdadabcc'
Output: 'adbc'

How It Works

The algorithm works in two phases:

Preprocessing: Record the last occurrence of each character
Stack Building: For each character:
- Skip if already in the result
- Pop larger characters from the top of the stack as long as each of them occurs again later in the string
- Add current character to stack
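
The removal rule is the heart of the algorithm, so it may help to look at it in isolation. The helper `should_pop` below is a hypothetical name introduced only for this illustration. It evaluates the same condition as the `while` loop at two moments of the "cdadabcc" run.

```python
def should_pop(top, char, index, last_occurrence):
    """True when `top` is larger than `char` and still occurs after `index`."""
    return top > char and last_occurrence[top] > index


# Last index of each character in "cdadabcc"
last_occurrence = {"c": 7, "d": 3, "a": 4, "b": 5}

# Index 2, char 'a', top of stack 'd': 'd' occurs again at index 3,
# so it can be removed now and re-added later.
print(should_pop("d", "a", 2, last_occurrence))  # True

# Index 5, char 'b', top of stack 'd': the last 'd' was at index 3,
# so removing it would lose 'd' from the result.
print(should_pop("d", "b", 5, last_occurrence))  # False
```

The second case is why the result is "adbc" rather than "abcd": once the last 'd' has been passed, it has to stay in place even though a smaller character arrives.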

Example with Different Input

def test_multiple_cases():
    test_cases = ["bcabc", "cbacdcbc", "ecbacba"]
    
    for text in test_cases:
        result = smallest_subsequence(text)
        print(f"Input: '{text}' -> Output: '{result}'")

test_multiple_cases()

The per-character "Processing" trace lines are omitted here for brevity:

Last occurrences: {'c': 4, 'b': 3, 'a': 2}
Input: 'bcabc' -> Output: 'abc'
Last occurrences: {'c': 7, 'b': 6, 'd': 4, 'a': 2}
Input: 'cbacdcbc' -> Output: 'acdb'
Last occurrences: {'a': 6, 'b': 5, 'c': 4, 'e': 0}
Input: 'ecbacba' -> Output: 'eacb'

Key Points

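Time Complexity: O(n), where n is the length of the string; each position causes at most one push, and each pushed character is popped at most once
Space Complexity: O(k), where k is the number of unique characters
Stack invariant: the stack always holds distinct characters in the order they will appear in the result
Greedy rule: a larger character on top of the stack is removed only when it occurs again later in the string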
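
One way to see the linear bound is to count stack operations. The sketch below is an instrumented, print-free variant of the same greedy idea. The function name `count_stack_operations` and the use of a set are assumptions made for this example. It reports how many pushes and pops occur.

```python
def count_stack_operations(text):
    """Return (result, pushes, pops) for the greedy stack algorithm."""
    last_occurrence = {char: index for index, char in enumerate(text)}
    stack, in_stack = [], set()
    pushes = pops = 0

    for index, char in enumerate(text):
        if char in in_stack:
            continue
        # Pop larger characters that still occur later in the string.
        while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
            in_stack.discard(stack.pop())
            pops += 1
        stack.append(char)
        in_stack.add(char)
        pushes += 1

    return "".join(stack), pushes, pops


for sample in ["cdadabcc", "cbacdcbc"]:
    result, pushes, pops = count_stack_operations(sample)
    # Pops can never exceed pushes, and pushes can never exceed len(sample).
    print(sample, result, pushes, pops, pushes + pops <= 2 * len(sample))
```

For both sample strings this reports 6 pushes and 2 pops, well within the 2n bound on total stack operations.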

Conclusion

This greedy stack-based approach efficiently finds the lexicographically smallest subsequence containing all distinct characters. The key insight is removing larger characters from the stack when they have future occurrences, ensuring the smallest possible result.

Arnab Chakraborty

Updated on: 2026-03-25T08:15:31+05:30
