Python: Grouping Similar Substrings in a List

In this tutorial, we write a program that groups similar substrings from a list. We use Python's itertools.groupby() function to group strings that share a common prefix.

Problem Understanding

Given a list of strings with a common pattern (prefix-suffix), we want to group them by their prefix.

Input

strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']

Expected Output

[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]

Solution Using itertools.groupby()

The itertools.groupby() function groups consecutive elements that have the same key and yields one (key, group) pair per run. We'll use a lambda function to extract the prefix (the part before the hyphen) as the grouping key.
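To make the shape of the result concrete, here is a minimal sketch of what groupby() yields. The list words is a hypothetical sample, not the tutorial's input, and each group is converted to a list because groupby() returns lazy iterators.

```python
import itertools

# Hypothetical sample data, chosen only for illustration
words = ['a-1', 'a-2', 'b-1']

# Each item yielded by groupby() is a (key, group) pair;
# the group is a lazy iterator, so we turn it into a list right away
pairs = [(key, list(group))
         for key, group in itertools.groupby(words, lambda s: s.split('-')[0])]

print(pairs)
# [('a', ['a-1', 'a-2']), ('b', ['b-1'])]
```

The key is the value returned by the lambda, and the group holds the original strings that produced that key.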

Step-by-Step Approach

  • Import the itertools module
  • Initialize the list of strings
  • Use itertools.groupby() with a lambda function that splits each string and returns the first part
  • Convert each group to a list and collect all groups

Complete Example

import itertools

# initializing the strings
strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']

# empty list to store results
result = []

# groupby with lambda function to extract prefix
iterator = itertools.groupby(strings, lambda string: string.split('-')[0])

# iterating over the grouped results
for prefix, group in iterator:
    # converting group to list and appending to result
    result.append(list(group))

# printing the result
print(result)

Output

[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]

How It Works

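The groupby() function groups consecutive elements with the same key. Our lambda function, lambda string: string.split('-')[0], extracts the prefix before the hyphen, so adjacent strings with the same prefix end up in the same group. This works here because strings with matching prefixes already sit next to each other in the input list.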
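Because only consecutive elements are grouped, the order of the input matters. The sketch below uses a hypothetical list, shuffled, and a helper named prefix (both introduced here for illustration) to show what happens when equal prefixes are not adjacent, and how sorting by the same key first restores one group per prefix.

```python
import itertools

def prefix(string):
    # Return the part before the first hyphen
    return string.split('-')[0]

# Hypothetical input where equal prefixes are NOT adjacent
shuffled = ['python-1', 'tutorials-c', 'python-2', 'tutorials-java']

# Without sorting, every change of prefix starts a new group
unsorted_groups = [list(g) for _, g in itertools.groupby(shuffled, prefix)]

# Sorting by the same key first makes equal prefixes adjacent
sorted_groups = [list(g)
                 for _, g in itertools.groupby(sorted(shuffled, key=prefix), prefix)]

print(unsorted_groups)
# [['python-1'], ['tutorials-c'], ['python-2'], ['tutorials-java']]
print(sorted_groups)
# [['python-1', 'python-2'], ['tutorials-c', 'tutorials-java']]
```

Note that sorting orders the groups alphabetically by prefix rather than by first appearance in the list.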

Alternative Approach Using Dictionary

Here's another way to achieve the same result using a dictionary:

strings = ['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']

# using dictionary to group strings
groups = {}
for string in strings:
    prefix = string.split('-')[0]
    if prefix not in groups:
        groups[prefix] = []
    groups[prefix].append(string)

# convert dictionary values to list
result = list(groups.values())
print(result)

Output

[['tutorials-python', 'tutorials-c', 'tutorials-java', 'tutorials-javascript'], ['python-1', 'python-2'], ['javascript-1']]
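The same dictionary idea can be written a little more compactly with collections.defaultdict, which creates the empty list for a new prefix automatically. This is a sketch of one possible variant, not a replacement for the code above; it relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
from collections import defaultdict

strings = ['tutorials-python', 'tutorials-c', 'tutorials-java',
           'tutorials-javascript', 'python-1', 'python-2', 'javascript-1']

# defaultdict(list) supplies an empty list the first time a prefix is seen
groups = defaultdict(list)
for string in strings:
    groups[string.split('-')[0]].append(string)

# Groups come out in order of first appearance of each prefix
result = list(groups.values())
print(result)
```

Unlike groupby(), this variant does not require strings with the same prefix to be adjacent in the input.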

Conclusion

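Use itertools.groupby() when strings with the same prefix are already adjacent in the list, or after sorting the list by prefix, because it only groups consecutive elements. When matching strings are scattered through the list, the dictionary approach groups them in a single pass without sorting. For the sample input, both methods produce the same result.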

Updated on: 2026-03-25T08:59:33+05:30
