Python Program to split string into k sized overlapping strings
Splitting a string into smaller overlapping parts is a common task in text processing, data analysis, and pattern recognition. In this tutorial, we'll explore how to write a Python program that splits a given string into k-sized overlapping strings.
Understanding the Problem
We need to create overlapping substrings of fixed size k from a given string. For example, if we have the string "Hello" and k=3, we want to generate: "Hel", "ell", "llo".
Each substring has length k and starts one position after the previous substring, creating an overlap of k-1 characters.
Basic Implementation
Here's a simple function to split a string into k-sized overlapping substrings:
def split_into_overlapping_strings(input_string, k):
    overlapping_strings = []
    # Iterate through the string, stopping when we can't form a complete substring
    for i in range(len(input_string) - k + 1):
        substring = input_string[i:i+k]
        overlapping_strings.append(substring)
    return overlapping_strings

# Test the function
input_string = "Hello, world!"
k = 3
result = split_into_overlapping_strings(input_string, k)
print("Original string:", input_string)
print("K value:", k)
print("Overlapping strings:", result)
Output:
Original string: Hello, world!
K value: 3
Overlapping strings: ['Hel', 'ell', 'llo', 'lo,', 'o, ', ', w', ' wo', 'wor', 'orl', 'rld', 'ld!']
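The two properties from the problem statement can be checked directly: a string of length n yields n - k + 1 substrings, and each substring shares k-1 characters with the next one. The following is only an illustrative sanity check, not part of the original program; the helper name `windows` is hypothetical.

```python
def windows(text, k):
    # Same slicing approach as the function above, redefined here so the
    # snippet runs on its own. The name `windows` is illustrative only.
    return [text[i:i + k] for i in range(len(text) - k + 1)]


text = "Hello, world!"
k = 3
parts = windows(text, k)

# A string of length n produces n - k + 1 substrings, each of length k.
assert len(parts) == len(text) - k + 1
assert all(len(part) == k for part in parts)

# Consecutive substrings overlap by k - 1 characters: dropping the first
# character of one gives the next one minus its last character.
for left, right in zip(parts, parts[1:]):
    assert left[1:] == right[:-1]

print(len(parts))  # 13 - 3 + 1 = 11
```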
How It Works
The algorithm uses a simple loop with the following logic:
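- Range calculation: We iterate i over range(len(input_string) - k + 1), that is, from 0 up to and including len(input_string) - k, so that every start position leaves room for a full k-character substring
- Substring extraction: At each position i, we extract the slice input_string[i:i+k], which covers index i up to but not including i+k
- Storage: Each substring is appended to the result list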
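To make these steps concrete, the sketch below traces the loop for the "Hello" example with k=3, and then shows what a too-wide range would do. The variable names `trace` and `too_far` are introduced here purely for illustration.

```python
text = "Hello"
k = 3

# Trace each loop iteration: start index i, end index i + k (exclusive),
# and the slice those bounds produce.
trace = []
for i in range(len(text) - k + 1):
    trace.append((i, i + k, text[i:i + k]))
    print(f"i={i}: text[{i}:{i + k}] -> {text[i:i + k]!r}")

# Iterating over every index instead does not raise an error, because
# Python slices are clipped at the end of the string. The trailing slices
# simply come back shorter than k.
too_far = [text[i:i + k] for i in range(len(text))]
print(too_far)  # ['Hel', 'ell', 'llo', 'lo', 'o']
```

The second result is why the range bound matters: it is what guarantees that every element has exactly k characters.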
Enhanced Version with Error Handling
Let's create a more robust version that handles edge cases:
def split_overlapping_enhanced(input_string, k):
    # Input validation
    if not isinstance(input_string, str):
        raise TypeError("Input must be a string")
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    if len(input_string) < k:
        print(f"Warning: String length ({len(input_string)}) is less than k ({k})")
        return []
    # Generate overlapping strings
    result = []
    for i in range(len(input_string) - k + 1):
        result.append(input_string[i:i+k])
    return result

# Test with different scenarios
test_cases = [
    ("Python Programming", 4),
    ("AI", 3),  # String shorter than k
    ("Data", 2),
]
for string, k_val in test_cases:
    print(f"\nInput: '{string}', k={k_val}")
    try:
        result = split_overlapping_enhanced(string, k_val)
        print(f"Result: {result}")
    except (TypeError, ValueError) as e:
        print(f"Error: {e}")
Output:
Input: 'Python Programming', k=4
Result: ['Pyth', 'ytho', 'thon', 'hon ', 'on P', 'n Pr', ' Pro', 'Prog', 'rogr', 'ogra', 'gram', 'ramm', 'ammi', 'mmin', 'ming']

Input: 'AI', k=3
Warning: String length (2) is less than k (3)
Result: []

Input: 'Data', k=2
Result: ['Da', 'at', 'ta']
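The test cases above never trigger the two exceptions, so it may help to see those paths as well. The sketch below uses a condensed copy of the validation logic (the warning message is omitted, since an empty range already yields an empty list) and a few invalid inputs chosen purely for illustration.

```python
def split_overlapping_enhanced(input_string, k):
    # Condensed copy of the validation above so the snippet is self-contained.
    if not isinstance(input_string, str):
        raise TypeError("Input must be a string")
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    # When k exceeds the string length the range is empty and [] is returned.
    return [input_string[i:i + k] for i in range(len(input_string) - k + 1)]


# Hypothetical invalid inputs: zero, negative and non-integer k, and a
# non-string value.
bad_inputs = [("Data", 0), ("Data", -2), ("Data", 2.5), (12345, 2)]

errors = []
for value, k_val in bad_inputs:
    try:
        split_overlapping_enhanced(value, k_val)
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)
        print(f"{value!r}, k={k_val}: {type(exc).__name__}: {exc}")
```

The first three inputs raise ValueError and the last raises TypeError, because the string check runs before the check on k.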
Using List Comprehension
We can make the code more concise using a list comprehension:
def split_overlapping_compact(input_string, k):
    return [input_string[i:i+k] for i in range(len(input_string) - k + 1)]

# Example usage
text = "Machine Learning"
k = 5
overlapping_parts = split_overlapping_compact(text, k)
print(f"Text: '{text}'")
print(f"K-size: {k}")
print(f"Overlapping strings: {overlapping_parts}")
print(f"Total parts: {len(overlapping_parts)}")
Output:
Text: 'Machine Learning'
K-size: 5
Overlapping strings: ['Machi', 'achin', 'chine', 'hine ', 'ine L', 'ne Le', 'e Lea', ' Lear', 'Learn', 'earni', 'arnin', 'rning']
Total parts: 12
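Because the compact version drops the input validation, its behavior on edge cases is worth knowing. The sketch below, using example inputs picked only for illustration, shows that a k larger than the string silently yields an empty list, while k=0 yields a list of empty strings.

```python
def split_overlapping_compact(input_string, k):
    return [input_string[i:i + k] for i in range(len(input_string) - k + 1)]


# k larger than the string: range(2 - 3 + 1) is range(0), so nothing is produced.
longer_k = split_overlapping_compact("AI", 3)
print(longer_k)  # []

# k = 0: range(3 - 0 + 1) is range(4), and every slice text[i:i] is empty.
zero_k = split_overlapping_compact("abc", 0)
print(zero_k)  # ['', '', '', '']
```

If such inputs are possible, the validated version is probably the safer choice.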
Practical Applications
This technique is useful in text processing and pattern recognition, for example to count how often each k-character pattern occurs in a text:
# Example: Analyzing character patterns in text
def analyze_patterns(text, pattern_size):
    patterns = split_overlapping_compact(text, pattern_size)
    # Count frequency of each pattern
    pattern_count = {}
    for pattern in patterns:
        pattern_count[pattern] = pattern_count.get(pattern, 0) + 1
    return pattern_count

# Analyze 3-character patterns
sample_text = "programming"
patterns = analyze_patterns(sample_text, 3)
print(f"Text: '{sample_text}'")
print("3-character pattern frequencies:")
for pattern, count in sorted(patterns.items()):
    print(f" '{pattern}': {count}")
Output:
Text: 'programming'
3-character pattern frequencies:
 'amm': 1
 'gra': 1
 'ing': 1
 'min': 1
 'mmi': 1
 'ogr': 1
 'pro': 1
 'ram': 1
 'rog': 1
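Every pattern in "programming" occurs only once, so the counting is easier to see on a text with repeats. The sketch below is an alternative to the dictionary loop, not the article's own method: it uses `collections.Counter` from the standard library, and the function name `count_patterns` and the sample word "banana" are assumptions made for illustration.

```python
from collections import Counter


def count_patterns(text, pattern_size):
    # Generate the overlapping substrings lazily and let Counter tally them.
    patterns = (
        text[i:i + pattern_size]
        for i in range(len(text) - pattern_size + 1)
    )
    return Counter(patterns)


# "banana" with k=2 gives: ba, an, na, an, na
counts = count_patterns("banana", 2)
print(counts["ba"], counts["an"], counts["na"])  # 1 2 2
```

A `Counter` behaves like a dictionary, so it can be sorted and printed the same way as `pattern_count` above.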
Conclusion
Splitting a string into k-sized overlapping parts is achieved by iterating through the string and extracting a substring of fixed length at each position. The key is using the correct range, len(input_string) - k + 1, so that every substring has exactly k characters, and handling edge cases such as a non-positive k or a string shorter than k.
