Python Program to split string into k sized overlapping strings

Splitting a string into smaller overlapping parts is a common task in text processing, data analysis, and pattern recognition. In this tutorial, we'll explore how to write a Python program that splits a given string into k-sized overlapping strings.

Understanding the Problem

We need to create overlapping substrings of fixed size k from a given string. For example, if we have the string "Hello" and k=3, we want to generate: "Hel", "ell", "llo".

Each substring has length k and starts one position after the previous substring, creating an overlap of k-1 characters.
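As a quick check on this definition, a string of length n yields n - k + 1 such substrings whenever 1 <= k <= n. The following is a minimal sketch of that count; the helper name count_windows is hypothetical and used only for illustration, and k is assumed to be a positive integer.

```python
def count_windows(n, k):
    """Number of k-sized overlapping substrings in a string of length n.

    Assumes k is a positive integer (hypothetical helper for illustration).
    """
    # max() covers the case k > n, where no full substring fits
    return max(n - k + 1, 0)

word = "Hello"
k = 3

# Build the substrings using exactly count_windows(...) start positions
windows = [word[i:i + k] for i in range(count_windows(len(word), k))]

print(windows)       # ['Hel', 'ell', 'llo']
print(len(windows))  # 3, which is 5 - 3 + 1
```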

Basic Implementation

Here's a simple function to split a string into k-sized overlapping substrings:

def split_into_overlapping_strings(input_string, k):
    overlapping_strings = []
    
    # Iterate through the string, stopping when we can't form a complete substring
    for i in range(len(input_string) - k + 1):
        substring = input_string[i:i+k]
        overlapping_strings.append(substring)
    
    return overlapping_strings

# Test the function
input_string = "Hello, world!"
k = 3

result = split_into_overlapping_strings(input_string, k)
print("Original string:", input_string)
print("K value:", k)
print("Overlapping strings:", result)

Original string: Hello, world!
K value: 3
Overlapping strings: ['Hel', 'ell', 'llo', 'lo,', 'o, ', ', w', ' wo', 'wor', 'orl', 'rld', 'ld!']

How It Works

The algorithm uses a simple loop with the following logic:

Range calculation: range(len(input_string) - k + 1) yields the start positions 0 through len(input_string) - k, the last position at which a full k-character substring still fits
Substring extraction: At each position i, we extract a substring from i to i+k
Storage: Each substring is appended to the result list
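To illustrate why the range bound matters, the sketch below compares the correct bound with a deliberately too-wide one. Python slicing does not raise an error when the end index runs past the string, so the too-wide loop quietly produces shorter trailing pieces. The variable names full and too_wide are illustrative only.

```python
text = "Hello"
k = 3

# Correct bound: only start positions that leave room for k characters
full = [text[i:i + k] for i in range(len(text) - k + 1)]

# Too-wide bound (for illustration only): slicing past the end does not
# raise IndexError, it simply returns whatever characters remain
too_wide = [text[i:i + k] for i in range(len(text))]

print(full)      # ['Hel', 'ell', 'llo']
print(too_wide)  # ['Hel', 'ell', 'llo', 'lo', 'o']
```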

Enhanced Version with Error Handling

Let's create a more robust version that handles edge cases:

def split_overlapping_enhanced(input_string, k):
    # Input validation
    if not isinstance(input_string, str):
        raise TypeError("Input must be a string")
    
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    
    if len(input_string) < k:
        print(f"Warning: String length ({len(input_string)}) is less than k ({k})")
        return []
    
    # Generate overlapping strings
    result = []
    for i in range(len(input_string) - k + 1):
        result.append(input_string[i:i+k])
    
    return result

# Test with different scenarios
test_cases = [
    ("Python Programming", 4),
    ("AI", 3),  # String shorter than k
    ("Data", 2),
]

for string, k_val in test_cases:
    print(f"\nInput: '{string}', k={k_val}")
    try:
        result = split_overlapping_enhanced(string, k_val)
        print(f"Result: {result}")
    except (TypeError, ValueError) as e:
        print(f"Error: {e}")

Input: 'Python Programming', k=4
Result: ['Pyth', 'ytho', 'thon', 'hon ', 'on P', 'n Pr', ' Pro', 'Prog', 'rogr', 'ogra', 'gram', 'ramm', 'ammi', 'mmin', 'ming']

Input: 'AI', k=3
Warning: String length (2) is less than k (3)
Result: []

Input: 'Data', k=2
Result: ['Da', 'at', 'ta']
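To see what the validation protects against, the sketch below runs an unvalidated version on awkward inputs. The name basic_split is hypothetical and stands in for any version without checks. A string shorter than k already gives an empty list, because a range with a non-positive bound is empty, but a zero or negative k silently returns meaningless results instead of failing.

```python
def basic_split(s, k):
    """Unvalidated splitter (hypothetical, for illustration only)."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]

# String shorter than k: range(0) is empty, so the result is an empty list
print(basic_split("AI", 3))   # []

# k = 0: every slice s[i:i] is empty, giving len(s) + 1 empty strings
print(basic_split("AI", 0))   # ['', '', '']

# k = -1: the negative end index changes what each slice means
print(basic_split("AI", -1))  # ['A', '', '', '']
```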

Using List Comprehension

We can make the code more concise using list comprehension:

def split_overlapping_compact(input_string, k):
    return [input_string[i:i+k] for i in range(len(input_string) - k + 1)]

# Example usage
text = "Machine Learning"
k = 5

overlapping_parts = split_overlapping_compact(text, k)
print(f"Text: '{text}'")
print(f"K-size: {k}")
print(f"Overlapping strings: {overlapping_parts}")
print(f"Total parts: {len(overlapping_parts)}")

Text: 'Machine Learning'
K-size: 5
Overlapping strings: ['Machi', 'achin', 'chine', 'hine ', 'ine L', 'ne Le', 'e Lea', ' Lear', 'Learn', 'earni', 'arnin', 'rning']
Total parts: 12
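If the input is long and the substrings are only needed one at a time, a generator is one possible alternative to building the whole list. This is a sketch of a different technique rather than part of the approach above, and the name iter_overlapping is hypothetical.

```python
def iter_overlapping(input_string, k):
    """Yield k-sized overlapping substrings lazily, one per start position."""
    for i in range(len(input_string) - k + 1):
        yield input_string[i:i + k]

# Substrings are produced on demand instead of being stored in a list
for part in iter_overlapping("Machine", 5):
    print(part)
# Machi
# achin
# chine
```

Wrapping the call in list(...) gives the same result as the list comprehension version.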

Practical Applications

One practical use is counting how often each k-character pattern occurs in a text:

# Example: Analyzing character patterns in text
def analyze_patterns(text, pattern_size):
    patterns = split_overlapping_compact(text, pattern_size)
    
    # Count frequency of each pattern
    pattern_count = {}
    for pattern in patterns:
        pattern_count[pattern] = pattern_count.get(pattern, 0) + 1
    
    return pattern_count

# Analyze 3-character patterns
sample_text = "programming"
patterns = analyze_patterns(sample_text, 3)

print(f"Text: '{sample_text}'")
print("3-character pattern frequencies:")
for pattern, count in sorted(patterns.items()):
    print(f"  '{pattern}': {count}")

Text: 'programming'
3-character pattern frequencies:
  'amm': 1
  'gra': 1
  'ing': 1
  'min': 1
  'mmi': 1
  'ogr': 1
  'pro': 1
  'ram': 1
  'rog': 1
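The same frequency count can also be sketched with collections.Counter from the standard library, which replaces the manual dictionary bookkeeping. The function name count_patterns and the sample word are assumptions made for this example.

```python
from collections import Counter

def count_patterns(text, pattern_size):
    """Count each overlapping pattern of the given size (illustrative helper)."""
    # Generator of overlapping substrings, consumed directly by Counter
    windows = (
        text[i:i + pattern_size]
        for i in range(len(text) - pattern_size + 1)
    )
    return Counter(windows)

counts = count_patterns("banana", 2)
print(counts["an"])  # 2
print(counts["na"])  # 2
print(counts["ba"])  # 1
print(counts["zz"])  # 0, Counter returns 0 for patterns it has not seen
```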

Conclusion

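Splitting a string into k-sized overlapping parts comes down to iterating over the start positions 0 through len(input_string) - k and slicing k characters at each one. Because Python slicing does not raise an error for out-of-range indices, a wrong range silently produces short or empty substrings, so the key is getting the range right and validating k and the string length up front.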

Mrudgandha Kulkarni

Updated on: 2026-03-27T11:55:31+05:30
