Duplicate substring removal from list in Python
Sometimes we need to clean up a list of delimited strings by removing the duplicate substrings they contain. Python's built-ins make this straightforward: split() breaks each string on its delimiter, set() discards repeated values, and a comprehension applies both across the whole list.
Using set() and split()
The split() method breaks each string into its substrings, and set() keeps only the unique ones. Because the comprehension builds one set per input string, the grouping by original string is preserved:
Example
# initializing list
strings = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
print("Given list:", strings)
# using set() and split()
result = [set(sub.split('-')) for sub in strings]
print("List after duplicate removal:", result)
The output of the above code is shown below (sets are unordered, so the order of elements inside each set may vary):
Given list: ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
List after duplicate removal: [{'xy'}, {'pq', 'qr'}, {'xp'}, {'ee', 'dd'}]
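If the original order of substrings matters, or the result should remain a delimited string rather than a set, one possible variation is to deduplicate with dict.fromkeys(), which keeps the first occurrence of each key in insertion order (guaranteed from Python 3.7). This is only a sketch: the helper name dedupe_substrings and its sep parameter are illustrative assumptions, not part of the original examples.

```python
# Sample data reused from the example above
strings = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']

def dedupe_substrings(text, sep='-'):
    """Hypothetical helper: drop repeated substrings but keep first-seen order."""
    # dict.fromkeys() keeps one key per distinct substring, in insertion order
    unique_parts = dict.fromkeys(text.split(sep))
    # Rejoin the surviving substrings with the same delimiter
    return sep.join(unique_parts)

ordered_result = [dedupe_substrings(s) for s in strings]
print("List after ordered duplicate removal:", ordered_result)
# List after ordered duplicate removal: ['xy', 'pq-qr', 'xp', 'dd-ee']
```

Unlike the set-based version, this output is the same on every run, because no unordered collection is involved.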
Using List Comprehension with Set
We can also flatten the substrings of every string into one collection and remove duplicates globally. A nested set comprehension collects the unique substrings, and list() converts the result to a list:
Example
# initializing list
strings = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
print("Given list:", strings)
# using a nested set comprehension, converted to a list
result = list({substring for string in strings for substring in string.split('-')})
print("List after duplicate removal:", result)
The output of the above code is shown below (the element order may vary, since it comes from a set):
Given list: ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
List after duplicate removal: ['dd', 'pq', 'ee', 'xp', 'xy', 'qr']
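When a predictable order is needed for the flattened result, a minimal sketch of two alternatives follows: dict.fromkeys() to keep first-seen order, or sorted() to get alphabetical order. The variable names here are illustrative assumptions, not part of the original example.

```python
# Sample data reused from the example above
strings = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']

# Option 1: keep each substring in the order it first appears
first_seen = list(dict.fromkeys(
    part for s in strings for part in s.split('-')
))
print("First-seen order:", first_seen)
# First-seen order: ['xy', 'pq', 'qr', 'xp', 'dd', 'ee']

# Option 2: deduplicate with a set, then sort alphabetically
alphabetical = sorted({part for s in strings for part in s.split('-')})
print("Alphabetical order:", alphabetical)
# Alphabetical order: ['dd', 'ee', 'pq', 'qr', 'xp', 'xy']
```

Both options return the same six substrings as the set-based version; they differ only in giving a repeatable ordering.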
Comparison
| Method | Output Type | Preserves Groups | Best For |
|---|---|---|---|
| set() per string | List of sets | Yes | Maintaining original string grouping |
| Flattened set | Single list | No | Getting all unique substrings |
Conclusion
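Use set() inside a list comprehension to remove duplicates within each string while keeping one group per original string. Use a flattened set comprehension to collect all unique substrings into a single list. In both cases the result comes from a set, so the order of the substrings is not guaranteed.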
