Python – Split Strings on Prefix Occurrence

When we need to split a list of strings each time an element starts with a specific prefix, we can use itertools.zip_longest() to pair every element with its successor and look ahead to the next element. Whenever the next element starts with the prefix, the current group is closed and a new sublist begins.

Example

Below is a demonstration of splitting strings on prefix occurrence −

from itertools import zip_longest

my_list = ["hi", 'hello', 'there', "python", "object", "oriented", "object", "cool", "language", 'py', 'extension', 'bjarne']

print("The list is:")
print(my_list)

my_prefix = "python"
print("The prefix is:")
print(my_prefix)

my_result, my_temp_val = [], []

for x, y in zip_longest(my_list, my_list[1:]):
    my_temp_val.append(x)
    if y and y.startswith(my_prefix):
        my_result.append(my_temp_val)
        my_temp_val = []

my_result.append(my_temp_val)

print("The resultant is:")
print(my_result)

print("The list after sorting is:")
my_result.sort()
print(my_result)

Output

The list is:
['hi', 'hello', 'there', 'python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']
The prefix is:
python
The resultant is:
[['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']]
The list after sorting is:
[['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']]

How It Works

The algorithm uses zip_longest() to pair each element with the next element in the list:

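  • zip_longest(my_list, my_list[1:]) − Creates pairs of (current, next) elements; the last element is paired with None because the second list is one item shorter
  • my_temp_val.append(x) − Adds the current element to the temporary group
  • y and y.startswith(my_prefix) − Checks that a next element exists (y is None for the final pair) and that it starts with the prefix
  • my_result.append(my_temp_val) − Saves the group when the prefix is found; the same call after the loop saves the final group
  • my_temp_val = [] − Resets the temporary group for the next split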
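
To make the look-ahead pairing concrete, here is a minimal sketch that prints the pairs zip_longest() produces. The three-item list named words is hypothetical sample data, not part of the example above.

```python
from itertools import zip_longest

# Hypothetical sample data, chosen only to keep the output short
words = ["hi", "python", "object"]

# Pair each element with its successor; the shorter second list
# is padded with None, so the last element is paired with None
pairs = list(zip_longest(words, words[1:]))
print(pairs)
# [('hi', 'python'), ('python', 'object'), ('object', None)]
```

The None in the final pair is why the loop tests y before calling y.startswith().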

Alternative Method Using Regular Iteration

Here's a simpler approach that uses enumerate() and an index check instead of zip_longest() −

def split_on_prefix(data, prefix):
    result = []
    current_group = []
    
    for i, item in enumerate(data):
        current_group.append(item)
        
        # Check if next item starts with prefix
        if i + 1 < len(data) and data[i + 1].startswith(prefix):
            result.append(current_group)
            current_group = []
    
    # Add remaining items
    if current_group:
        result.append(current_group)
    
    return result

words = ["hi", "hello", "there", "python", "object", "oriented", "java", "programming"]
prefix = "python"

groups = split_on_prefix(words, prefix)
print("Split groups:", groups)

Output

Split groups: [['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'java', 'programming']]
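
The look-ahead can also be avoided by testing the current element rather than the next one. The sketch below is one possible variant, not the method used above; the function name split_before_prefix and the sample list are assumptions made for illustration.

```python
def split_before_prefix(data, prefix):
    """Hypothetical helper: start a new group at each item that starts with prefix."""
    groups = []
    for item in data:
        # Open a new group for the very first item and for every prefix match
        if not groups or item.startswith(prefix):
            groups.append([])
        groups[-1].append(item)
    return groups

# Hypothetical sample data with two items that start with the prefix
sample = ["hi", "hello", "python", "object", "python3", "cool"]
print(split_before_prefix(sample, "python"))
# [['hi', 'hello'], ['python', 'object'], ['python3', 'cool']]
```

Because the check runs on the current item, no index arithmetic or padding value is needed, and an empty input list returns an empty result.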

Conclusion

Use zip_longest() to pair each element with its successor and start a new sublist whenever the next element begins with the prefix. The same grouping can be produced with enumerate() and an index check, which avoids the extra import.

Updated on: 2026-03-26T01:43:44+05:30
