Program to remove duplicate characters from a given string in Python
When working with strings in Python, we often need to remove duplicate characters while preserving the order of first occurrence. This is useful in data cleaning, text processing, and algorithm problems.
We can solve this using an ordered dictionary to maintain the insertion order of characters. The dictionary tracks which characters we've seen, and we can join the keys to get our result string.
For example, if the input is s = "bbabcaaccdbaabababc", the output is "bacd": each of b, a, c, and d is kept at its first position, and every later repeat is dropped.
Algorithm
- Create an ordered dictionary to store characters in insertion order
- For each character c in the string:
  - If c is not present in the dictionary, add it with initial count 0
  - Increment the count for character c
- Join the dictionary keys in order to form the result string
Method 1: Using OrderedDict
OrderedDict from the collections module maintains insertion order:

from collections import OrderedDict

def remove_duplicates(s):
    # Keys record each character in order of first appearance
    d = OrderedDict()
    for c in s:
        if c not in d:
            d[c] = 0
        d[c] += 1  # occurrence count; not needed for the result
    return ''.join(d.keys())

s = "bbabcaaccdbaabababc"
result = remove_duplicates(s)
print(f"Original: {s}")
print(f"Result: {result}")
Output:
Original: bbabcaaccdbaabababc
Result: bacd
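The counts stored in Method 1 are not needed for the result string, but they are a by-product of the same loop. As a minimal sketch (the helper name `count_in_first_seen_order` is hypothetical and not part of the original method), the loop can return the counts alongside the deduplicated string:

```python
from collections import OrderedDict


def count_in_first_seen_order(s):
    """Return (deduplicated string, per-character counts) for s.

    Hypothetical helper for illustration; it uses the same loop as Method 1.
    """
    counts = OrderedDict()
    for c in s:
        if c not in counts:
            counts[c] = 0
        counts[c] += 1
    # Iterating an OrderedDict yields its keys in insertion order
    return ''.join(counts), counts


deduped, counts = count_in_first_seen_order("bbabcaaccdbaabababc")
print(deduped)       # bacd
print(dict(counts))  # {'b': 7, 'a': 7, 'c': 4, 'd': 1}
```

If the counts are never used, storing a placeholder value instead, as Method 2 does, keeps the code shorter.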
Method 2: Using Regular Dictionary (Python 3.7+)
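Since Python 3.7, the language guarantees that regular dictionaries preserve insertion order (CPython 3.6 already did so as an implementation detail):

def remove_duplicates_dict(s):
    seen = {}
    for c in s:
        # Re-assigning an existing key does not change its position
        seen[c] = True
    return ''.join(seen.keys())

s = "bbabcaaccdbaabababc"
result = remove_duplicates_dict(s)
print(f"Original: {s}")
print(f"Result: {result}")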
Output:
Original: bbabcaaccdbaabababc
Result: bacd
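The loop in Method 2 can also be collapsed into a single call. The sketch below uses the built-in `dict.fromkeys`, which builds a dictionary from an iterable of keys. The function name `remove_duplicates_fromkeys` is made up for this example, and Python 3.7+ is assumed for the ordering guarantee:

```python
def remove_duplicates_fromkeys(s):
    # dict.fromkeys(s) creates one key per distinct character of s.
    # Repeated characters map to the same key, and keys keep the
    # order in which they were first inserted (Python 3.7+).
    return ''.join(dict.fromkeys(s))


print(remove_duplicates_fromkeys("bbabcaaccdbaabababc"))  # bacd
```

This behaves the same as the explicit loop; which form to use is mostly a matter of readability.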
Method 3: Using Set for Tracking
An alternative that does not rely on dictionary ordering uses a set to track seen characters and a list to collect the result:

def remove_duplicates_set(s):
    seen = set()   # characters already emitted
    result = []    # characters in order of first appearance
    for c in s:
        if c not in seen:
            seen.add(c)
            result.append(c)
    return ''.join(result)

s = "bbabcaaccdbaabababc"
result = remove_duplicates_set(s)
print(f"Original: {s}")
print(f"Result: {result}")
Output:
Original: bbabcaaccdbaabababc
Result: bacd
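All three methods compare characters exactly, so uppercase and lowercase letters count as different characters and spaces are treated like any other character. Because Method 3 has an explicit loop, it is straightforward to adapt. The following is only a sketch: `remove_duplicates_by_key` and its `key` parameter are hypothetical names introduced here to show one way to make the comparison case-insensitive.

```python
def remove_duplicates_by_key(s, key=None):
    """Keep the first character seen for each distinct key(c).

    Hypothetical variant of Method 3; `key` is an assumed parameter,
    for example str.lower for case-insensitive matching.
    """
    seen = set()
    result = []
    for c in s:
        k = key(c) if key is not None else c
        if k not in seen:
            seen.add(k)
            result.append(c)  # keep the character as originally written
    return ''.join(result)


print(remove_duplicates_by_key("Hello World"))            # Helo Wrd
print(remove_duplicates_by_key("AaBbA"))                  # AaBb
print(remove_duplicates_by_key("AaBbA", key=str.lower))   # AB
```

With no `key`, the function behaves like `remove_duplicates_set`.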
Comparison
| Method | Memory Usage | Python Version | Best For |
|---|---|---|---|
| OrderedDict | Higher | 2.7+ / 3.1+ | Explicit ordering guarantee |
| Regular Dict | Medium | 3.7+ | Simple and clean code |
| Set + List | Roughly comparable to a regular dict (two containers) | All versions | Adding extra per-character logic |
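Memory use depends on the interpreter version and platform, so it can be worth measuring rather than assuming. The sketch below is one rough way to do that with `sys.getsizeof`, which reports only the container's own size in bytes and not the size of the character objects it refers to. The variable names are illustrative, and no particular numbers are claimed.

```python
import sys
from collections import OrderedDict

s = "bbabcaaccdbaabababc"

# Build the container(s) each method ends up holding
ordered = OrderedDict.fromkeys(s)
plain = dict.fromkeys(s)
seen, kept = set(), []
for c in s:
    if c not in seen:
        seen.add(c)
        kept.append(c)

# Shallow sizes only; the set-based method holds two containers
sizes = {
    "OrderedDict": sys.getsizeof(ordered),
    "dict": sys.getsizeof(plain),
    "set + list": sys.getsizeof(seen) + sys.getsizeof(kept),
}
for name, size in sizes.items():
    print(f"{name:12} {size} bytes")
```

For a string with only a handful of distinct characters, the differences are a few hundred bytes at most, so readability is usually the better deciding factor.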
Conclusion
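Use regular dictionaries for Python 3.7+ projects, OrderedDict for older versions, and the set-based approach when you want an explicit loop that can carry extra per-character conditions. Each method stores only the distinct characters, so memory differences are small for typical strings. All methods preserve the first occurrence order of characters.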
