Smallest Subsequence of Distinct Characters in Python

Finding the lexicographically smallest subsequence of distinct characters is a classic problem that can be solved with a greedy, stack-based approach. Given a string, the goal is to return the lexicographically smallest subsequence that contains every distinct character of the string exactly once.

For example, if the input is "cdadabcc", the output should be "adbc".
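
To make the problem statement concrete, here is a minimal brute-force sketch. It is not the approach this article develops, and the name brute_force_smallest is hypothetical. It tries every subsequence whose length equals the number of distinct characters and keeps the smallest valid one, so it is only practical for short strings.

```python
from itertools import combinations

def brute_force_smallest(text):
    """Exhaustive reference: smallest subsequence using each distinct character once."""
    distinct = set(text)
    k = len(distinct)
    # Every way of choosing k positions (in order) yields one candidate subsequence.
    candidates = (
        "".join(text[i] for i in idx)
        for idx in combinations(range(len(text)), k)
    )
    # A length-k candidate uses each distinct character exactly once
    # precisely when its character set equals the full set.
    valid = (s for s in candidates if set(s) == distinct)
    return min(valid, default="")

print(brute_force_smallest("cdadabcc"))  # adbc
```

For "cdadabcc" the only valid subsequence that starts with 'a' is "adbc", because the last 'd' sits at index 3 and must follow the 'a' at index 2.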

Algorithm Overview

The approach uses a stack to build the result and two dictionaries to track character positions and inclusion status:

  • last_occurrence: Stores the last position of each character
  • in_stack: Tracks whether a character is already in the result stack
  • stack: Builds the lexicographically smallest subsequence
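
As a small illustration of this bookkeeping, the sketch below builds both dictionaries for the example string. It uses dictionary comprehensions, which is an assumption made for brevity; the implementation in the next section fills the same dictionaries with a reverse loop, so the key order differs while the values match.

```python
text = "cdadabcc"

# Later indices overwrite earlier ones, so each key ends up with its last index.
last_occurrence = {ch: i for i, ch in enumerate(text)}

# Nothing has been pushed yet, so every character starts as "not in stack".
in_stack = {ch: False for ch in last_occurrence}

stack = []  # the result is built here, one character at a time

print(last_occurrence)  # {'c': 7, 'd': 3, 'a': 4, 'b': 5}
print(in_stack)         # {'c': False, 'd': False, 'a': False, 'b': False}
```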

Implementation

def smallest_subsequence(text):
    """
    Find lexicographically smallest subsequence with all distinct characters
    """
    stack = []
    last_occurrence = {}
    in_stack = {}
    
    # Find last occurrence of each character
    for i in range(len(text) - 1, -1, -1):
        if text[i] not in last_occurrence:
            last_occurrence[text[i]] = i
            in_stack[text[i]] = False
    
    print("Last occurrences:", last_occurrence)
    
    # Process each character
    for i, char in enumerate(text):
        print(f"Processing: stack={stack}, index={i}, char='{char}'")
        
        # Skip if character is already in stack
        if in_stack[char]:
            continue
            
        # Remove larger characters that can appear later
        while (stack and 
               stack[-1] > char and 
               last_occurrence[stack[-1]] > i):
            removed_char = stack.pop()
            in_stack[removed_char] = False
        
        # Add current character to stack
        stack.append(char)
        in_stack[char] = True
    
    return "".join(stack)

# Test the function
text = "cdadabcc"
result = smallest_subsequence(text)
print(f"\nInput: '{text}'")
print(f"Output: '{result}'")

Running this prints:

Last occurrences: {'c': 7, 'b': 5, 'a': 4, 'd': 3}
Processing: stack=[], index=0, char='c'
Processing: stack=['c'], index=1, char='d'
Processing: stack=['c', 'd'], index=2, char='a'
Processing: stack=['a'], index=3, char='d'
Processing: stack=['a', 'd'], index=4, char='a'
Processing: stack=['a', 'd'], index=5, char='b'
Processing: stack=['a', 'd', 'b'], index=6, char='c'
Processing: stack=['a', 'd', 'b', 'c'], index=7, char='c'

Input: 'cdadabcc'
Output: 'adbc'

How It Works

The algorithm works in two phases:

  1. Preprocessing: Record the last occurrence of each character
  2. Stack Building: For each character:
    • Skip it if it is already in the result
    • Pop characters from the top of the stack while they are larger than the current character and occur again later in the string
    • Push the current character onto the stack
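
The pop condition in the second phase can be isolated as a small predicate. This is a sketch for illustration only: can_pop is a hypothetical helper, not part of the implementation above, and the dictionary holds the last positions for "cdadabcc".

```python
def can_pop(stack, char, index, last_occurrence):
    """True when the top of the stack should be removed before pushing char."""
    return (
        bool(stack)
        and stack[-1] > char                    # the top is larger than the new character
        and last_occurrence[stack[-1]] > index  # and the top occurs again later
    )

last_occurrence = {"c": 7, "b": 5, "a": 4, "d": 3}  # last positions in "cdadabcc"

# Index 2 ('a'): 'd' is larger and occurs again at index 3, so it can be popped.
print(can_pop(["c", "d"], "a", 2, last_occurrence))  # True

# Index 5 ('b'): 'd' is larger, but its last occurrence (3) has passed, so it stays.
print(can_pop(["a", "d"], "b", 5, last_occurrence))  # False
```

The second call shows why the result is "adbc" rather than "abcd": once the final 'd' has been consumed, it can no longer be moved behind a smaller character.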

Example with Different Input

def test_multiple_cases():
    test_cases = ["bcabc", "cbacdcbc", "ecbacba"]
    
    for text in test_cases:
        result = smallest_subsequence(text)
        print(f"Input: '{text}' -> Output: '{result}'")

test_multiple_cases()

Running this prints the following (the per-character "Processing" lines are omitted here for brevity):

Last occurrences: {'c': 4, 'b': 3, 'a': 2}
Input: 'bcabc' -> Output: 'abc'
Last occurrences: {'c': 7, 'b': 6, 'd': 4, 'a': 2}
Input: 'cbacdcbc' -> Output: 'acdb'
Last occurrences: {'a': 6, 'b': 5, 'c': 4, 'e': 0}
Input: 'ecbacba' -> Output: 'eacb'

Key Points

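  • Time Complexity: O(n), where n is the length of the string; each index causes at most one push, and every pop removes a previously pushed character
  • Space Complexity: O(k), where k is the number of unique characters
  • The stack stores the characters chosen so far in their output order
  • A larger character is popped greedily only when it occurs again later, so no distinct character is ever lost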
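
One way to see the linear bound is to count stack operations. The sketch below is an instrumented variant written for this purpose; the name smallest_subsequence_counted and the use of a set instead of the in_stack dictionary are assumptions for illustration. The number of pushes never exceeds the string length, and the number of pops never exceeds the number of pushes.

```python
def smallest_subsequence_counted(text):
    """Greedy stack algorithm that also reports how many pushes and pops occurred."""
    last = {ch: i for i, ch in enumerate(text)}  # last index of each character
    stack, on_stack = [], set()
    pushes = pops = 0
    for i, ch in enumerate(text):
        if ch in on_stack:
            continue  # already part of the result
        # Pop larger characters that still occur later in the string.
        while stack and stack[-1] > ch and last[stack[-1]] > i:
            on_stack.remove(stack.pop())
            pops += 1
        stack.append(ch)
        on_stack.add(ch)
        pushes += 1
    return "".join(stack), pushes, pops

result, pushes, pops = smallest_subsequence_counted("cdadabcc")
print(result, pushes, pops)  # adbc 6 2
```

Here the string has 8 characters but only 6 pushes and 2 pops take place, since the two repeated characters already on the stack are skipped.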

Conclusion

This greedy stack-based approach efficiently finds the lexicographically smallest subsequence containing all distinct characters. The key insight is removing larger characters from the stack when they have future occurrences, ensuring the smallest possible result.

Updated on: 2026-03-25T08:15:31+05:30
