Python: Grouping Similar Substrings in a List
In this tutorial, we'll write a program that groups similar strings from a list. We'll use Python's itertools.groupby() function to group strings that share a common prefix.
Problem Understanding
Given a list of strings that follow a common prefix-suffix pattern (two parts separated by a hyphen), we want to group them by their prefix.
Input
strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']
Expected Output
[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]
Solution Using itertools.groupby()
The itertools.groupby() function groups consecutive elements that have the same key. We'll use a lambda function to extract the prefix (the part before the first hyphen) as the grouping key.
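To make the grouping key concrete, here is a minimal sketch of the prefix extraction on its own. The helper name prefix_of and the sample strings 'standalone' and 'a-b-c' are illustrative assumptions, not part of the original program.

```python
# Hypothetical helper: returns the part of a string before the first hyphen.
def prefix_of(string):
    return string.split('-')[0]

# Keys that groupby() would see for a few of the tutorial's strings
prefixes = [prefix_of(s) for s in ['tutorials-python', 'python-1', 'javascript-1']]
print(prefixes)  # ['tutorials', 'python', 'javascript']

# Edge cases (assumed inputs): with no hyphen the whole string is the key,
# and with several hyphens only the first part is used.
print(prefix_of('standalone'))  # standalone
print(prefix_of('a-b-c'))       # a
```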
Step-by-Step Approach
- Import the itertools module
- Initialize the list of strings
- Use itertools.groupby() with a lambda function that splits each string and returns the first part
- Convert each group to a list and collect all groups
Complete Example
import itertools
# initializing the strings
strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']
# empty list to store results
result = []
# groupby with lambda function to extract prefix
iterator = itertools.groupby(strings, lambda string: string.split('-')[0])
# iterating over the grouped results
for element, group in iterator:
    # converting group to list and appending to result
    result.append(list(group))
# printing the result
print(result)
[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]
How It Works
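The groupby() function groups consecutive elements with the same key. Our lambda function lambda string: string.split('-')[0] extracts the prefix before the first hyphen, so adjacent strings with the same prefix land in the same group. Strings that share a prefix but are not adjacent in the list end up in separate groups, so the input must already be ordered by prefix, as it is in this example.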
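The effect of element order can be seen in a short sketch. The list scattered and the helper prefix below are assumptions made up for illustration. Sorting by the same key before grouping is one common way to bring equal prefixes together, though it changes the order of the groups.

```python
import itertools

# Hypothetical input: the 'python' prefix appears in two separate runs
scattered = ['python-1', 'javascript-1', 'python-2']

def prefix(string):
    return string.split('-')[0]

# Without sorting, each run of equal keys becomes its own group
unsorted_groups = [list(group) for _, group in itertools.groupby(scattered, key=prefix)]
print(unsorted_groups)  # [['python-1'], ['javascript-1'], ['python-2']]

# Sorting by the same key first makes equal prefixes adjacent
ordered = sorted(scattered, key=prefix)
sorted_groups = [list(group) for _, group in itertools.groupby(ordered, key=prefix)]
print(sorted_groups)  # [['javascript-1'], ['python-1', 'python-2']]
```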
Alternative Approach Using Dictionary
Here's another way to achieve the same result using a dictionary. Unlike groupby(), it does not require strings with the same prefix to be adjacent:
strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']
# using dictionary to group strings
groups = {}
for string in strings:
    prefix = string.split('-')[0]
    if prefix not in groups:
        groups[prefix] = []
    groups[prefix].append(string)
# convert dictionary values to list
result = list(groups.values())
print(result)
[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]
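The same idea can be written a little more compactly with collections.defaultdict from the standard library, which creates the empty list for a new prefix automatically. This is a sketch of a variant, not the approach the tutorial itself uses. It relies on dictionaries preserving insertion order (Python 3.7 and later), so groups appear in the order their prefix is first seen.

```python
from collections import defaultdict

strings = ['tutorials-python', 'tutorials-c', 'tutorials-java',
           'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']

# Missing keys start out as an empty list, so no membership check is needed
groups = defaultdict(list)
for string in strings:
    groups[string.split('-')[0]].append(string)

# Collect the grouped lists in first-seen order of their prefixes
result = list(groups.values())
print(result)
```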
Conclusion
Use itertools.groupby() when strings with the same prefix are already adjacent in the list, or after sorting the list by prefix. When matching strings are scattered through the list, the dictionary approach groups them in a single pass without sorting. Both methods group strings by their common prefix.
