Python – Split Strings on Prefix Occurrence
To split a list of strings into sublists wherever an element starts with a given prefix, we can use itertools.zip_longest() to pair each element with the element that follows it. Whenever the next element starts with the prefix, the current group is closed and a new one begins, so every prefix-matching element becomes the first item of a new sublist.
Example
Below is a demonstration of splitting strings on prefix occurrence −
from itertools import zip_longest
my_list = ["hi", "hello", "there", "python", "object", "oriented", "object", "cool", "language", "py", "extension", "bjarne"]
print("The list is:")
print(my_list)
my_prefix = "python"
print("The prefix is:")
print(my_prefix)
my_result, my_temp_val = [], []
# Pair each element with the next one; the last pair is (last element, None)
for x, y in zip_longest(my_list, my_list[1:]):
    my_temp_val.append(x)
    # Close the current group if the next element starts with the prefix
    if y and y.startswith(my_prefix):
        my_result.append(my_temp_val)
        my_temp_val = []
# Save the final group
my_result.append(my_temp_val)
print("The resultant is:")
print(my_result)
print("The list after sorting is:")
# Sorts the sublists lexicographically; here the order does not change
my_result.sort()
print(my_result)
The list is:
['hi', 'hello', 'there', 'python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']
The prefix is:
python
The resultant is:
[['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']]
The list after sorting is:
[['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'object', 'cool', 'language', 'py', 'extension', 'bjarne']]
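The list in the example contains 'py' as well as 'python', yet only 'python' starts a new group. That is because str.startswith() checks whether the element begins with the whole prefix, and 'py' is shorter than 'python'. The sketch below wraps the same look-ahead loop in a function so the effect of changing the prefix is easy to compare; the name split_before_prefix and the short input list are illustrative assumptions, not part of the original example.

```python
from itertools import zip_longest


def split_before_prefix(items, prefix):
    """Hypothetical helper: the article's look-ahead loop wrapped in a function."""
    result, group = [], []
    for x, y in zip_longest(items, items[1:]):
        group.append(x)
        # Close the group when the *next* element starts with the prefix
        if y and y.startswith(prefix):
            result.append(group)
            group = []
    # Save the final group
    result.append(group)
    return result


items = ["hi", "python", "object", "py", "extension"]
print("py".startswith("python"))  # False: 'py' does not begin with 'python'
print(split_before_prefix(items, "python"))
# [['hi'], ['python', 'object', 'py', 'extension']]
print(split_before_prefix(items, "py"))
# [['hi'], ['python', 'object'], ['py', 'extension']]
```

With the shorter prefix "py", both 'python' and 'py' match, so the list is split before each of them.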
How It Works
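The algorithm uses zip_longest() to pair each element with the element that follows it:
- zip_longest(my_list, my_list[1:]) − Creates (current, next) pairs; because my_list[1:] is one element shorter, the final pair is (last element, None)
- my_temp_val.append(x) − Adds the current element to the temporary group
- y and y.startswith(my_prefix) − Checks whether the next element starts with the prefix; the y check skips the None in the final pair
- my_result.append(my_temp_val) − Saves the current group when the next element begins a new one
- my_temp_val = [] − Starts a fresh group for the next split
- my_result.append(my_temp_val) after the loop − Saves the last group, which has no following prefix match to close it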
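To see the pairing step on its own, the short sketch below prints the pairs that zip_longest() produces for a small, made-up list. It relies only on the documented behavior that zip_longest() fills missing values with its fillvalue argument, which defaults to None.

```python
from itertools import zip_longest

# Illustrative list (not from the article) to show the pairing step
items = ["hi", "python", "cool"]

# items[1:] is one element shorter, so zip_longest pads the last pair with None
pairs = list(zip_longest(items, items[1:]))
print(pairs)  # [('hi', 'python'), ('python', 'cool'), ('cool', None)]

# None has no startswith() method, so the "next" value must be checked first,
# just as the loop's `if y and ...` guard does
matches = [y is not None and y.startswith("python") for _, y in pairs]
print(matches)  # [True, False, False]
```

The first True means the group containing 'hi' is closed before 'python', which is exactly where the example splits.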
Alternative Method Using Regular Iteration
Here's an approach that performs the same look-ahead with enumerate() and an index check instead of zip_longest() −
def split_on_prefix(data, prefix):
    result = []
    current_group = []
    for i, item in enumerate(data):
        current_group.append(item)
        # Check if next item starts with prefix
        if i + 1 < len(data) and data[i + 1].startswith(prefix):
            result.append(current_group)
            current_group = []
    # Add remaining items
    if current_group:
        result.append(current_group)
    return result
words = ["hi", "hello", "there", "python", "object", "oriented", "java", "programming"]
prefix = "python"
groups = split_on_prefix(words, prefix)
print("Split groups:", groups)
Split groups: [['hi', 'hello', 'there'], ['python', 'object', 'oriented', 'java', 'programming']]
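Note that 'java' does not start with "python", so it stays in the second group. Both methods above look ahead to the next element. A variant that avoids look-ahead checks the current element instead: when it starts with the prefix, the previous group is closed before the element is added. The sketch below is an assumption-labeled alternative, not the article's method; the function name split_on_prefix_current and the sample list are hypothetical.

```python
def split_on_prefix_current(data, prefix):
    """Hypothetical helper: start a new group at each item beginning with prefix.

    Checks the current item rather than looking ahead to the next one.
    """
    result = []
    current_group = []
    for item in data:
        # A prefix match closes the previous group (if it has any items)
        if item.startswith(prefix) and current_group:
            result.append(current_group)
            current_group = []
        current_group.append(item)
    # Flush the final group
    if current_group:
        result.append(current_group)
    return result


words = ["hi", "hello", "there", "python", "object", "oriented", "python", "cool"]
print(split_on_prefix_current(words, "python"))
# [['hi', 'hello', 'there'], ['python', 'object', 'oriented'], ['python', 'cool']]
```

The `and current_group` condition prevents an empty leading group when the very first item already matches the prefix, and an empty input list yields an empty result.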
Conclusion
Both methods look ahead one element and close the current group whenever the next element starts with the prefix, so each prefix match begins a new sublist. zip_longest() handles the end of the list by pairing the last element with None, while the enumerate() version checks the index bound explicitly.
