Program to find out the similarity between a string and its suffixes in Python

In this problem, we need to find the similarity between a string and all its suffixes. The similarity is defined as the length of the longest common prefix between the original string and each suffix. We then sum up all these similarities.

For example, if the string is 'abcd', the suffixes are 'abcd', 'bcd', 'cd', 'd'. We compare each suffix with the original string to find how many characters match from the beginning.
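The definition can be sketched directly in code. This is a minimal illustration rather than a tuned implementation, and the helper names `common_prefix_length` and `suffix_similarities` are hypothetical, chosen only for this example:

```python
def common_prefix_length(a, b):
    """Return how many leading characters a and b share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def suffix_similarities(text):
    """Pair every suffix of text with its similarity to text."""
    return [(text[i:], common_prefix_length(text, text[i:]))
            for i in range(len(text))]


pairs = suffix_similarities('abcd')
print(pairs)                              # [('abcd', 4), ('bcd', 0), ('cd', 0), ('d', 0)]
print(sum(score for _, score in pairs))   # 4
```

For 'abcd' only the full string shares a prefix with itself, so the total is 4.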

Understanding the Problem

Let's see how this works with the string 'tpotp':

Original string: 'tpotp'
Suffixes and their similarities:
'tpotp' → matches 5 characters with 'tpotp' → similarity = 5
'potp'  → matches 0 characters with 'tpotp' → similarity = 0
'otp'   → matches 0 characters with 'tpotp' → similarity = 0
'tp'    → matches 2 characters with 'tpotp' → similarity = 2
'p'     → matches 0 characters with 'tpotp' → similarity = 0

Sum of similarities = 5 + 0 + 0 + 2 + 0 = 7
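The arithmetic above can be double-checked with a short sketch built on the standard library. `os.path.commonprefix` compares its arguments character by character, so it works on arbitrary strings and not only on paths. The function name `total_similarity` is an assumption made for this example:

```python
import os.path


def total_similarity(text):
    """Sum the common-prefix lengths of text with each of its suffixes."""
    return sum(
        len(os.path.commonprefix([text, text[i:]]))
        for i in range(len(text))
    )


print(total_similarity('tpotp'))  # 7
```

This is convenient for a quick check, but it still builds every suffix and is quadratic in the worst case.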

Simple Approach

The straightforward method is to generate all suffixes and compare each with the original string:

def find_similarity_simple(input_str):
    total_similarity = 0
    n = len(input_str)

    # Generate all suffixes and calculate similarity
    for i in range(n):
        suffix = input_str[i:]
        similarity = 0

        # Find longest common prefix (the suffix is never longer than the string)
        for j in range(len(suffix)):
            if input_str[j] == suffix[j]:
                similarity += 1
            else:
                break

        total_similarity += similarity
        print(f"'{suffix}' → similarity {similarity}")

    return total_similarity

result = find_similarity_simple('tpotp')
print(f"\nTotal similarity: {result}")

Output:

'tpotp' → similarity 5
'potp' → similarity 0
'otp' → similarity 0
'tp' → similarity 2
'p' → similarity 0

Total similarity: 7
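Slicing `input_str[i:]` copies the suffix on every iteration. A possible variant, sketched here under the hypothetical name `find_similarity_indexed`, compares characters by index instead, so it needs no extra storage beyond a few counters. The running time is still quadratic in the worst case:

```python
def find_similarity_indexed(input_str):
    """Sum of suffix similarities without building any suffix strings."""
    n = len(input_str)
    total = 0
    for start in range(n):
        matched = 0
        # Walk forward while the prefix and the suffix at `start` agree
        while (start + matched < n
               and input_str[matched] == input_str[start + matched]):
            matched += 1
        total += matched
    return total


print(find_similarity_indexed('tpotp'))  # 7
```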

Optimized Z-Algorithm Approach

For better efficiency with larger strings, we can use the Z-algorithm, which computes all the similarity values in linear time:

def find_similarity_optimized(input_str):
    n = len(input_str)
    z_array = [n]  # First element is always the length of string
    
    left = 0
    right = 0
    
    for i in range(1, n):
        if i <= right:
            # We're inside a Z-box, use previously computed values
            z_array.append(min(right - i + 1, z_array[i - left]))
        else:
            z_array.append(0)
        
        # Try to extend the match
        while (i + z_array[i] < n and 
               input_str[z_array[i]] == input_str[i + z_array[i]]):
            z_array[i] += 1
        
        # Update Z-box if we found a longer match
        if i + z_array[i] - 1 > right:
            left = i
            right = i + z_array[i] - 1
    
    return sum(z_array)

result = find_similarity_optimized('tpotp')
print(f"Total similarity: {result}")

Output:

Total similarity: 7
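To see what the algorithm computes at each position, it can help to return the Z-array itself and compare it against a brute-force calculation. The sketch below is one way to do that. The names `z_array` and `brute_force_z` are assumptions for this example, and the random test only covers short strings over a two-letter alphabet:

```python
import random


def z_array(text):
    """Return z where z[i] is the common-prefix length of text and text[i:]."""
    n = len(text)
    if n == 0:
        return []
    z = [0] * n
    z[0] = n
    left = right = 0  # bounds of the rightmost match window found so far
    for i in range(1, n):
        if i <= right:
            # Reuse the value computed for the mirrored position in the prefix
            z[i] = min(right - i + 1, z[i - left])
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    return z


def brute_force_z(text):
    """Quadratic reference implementation used only for checking."""
    result = []
    for i in range(len(text)):
        k = 0
        while i + k < len(text) and text[k] == text[i + k]:
            k += 1
        result.append(k)
    return result


print(z_array('tpotp'))  # [5, 0, 0, 2, 0]

random.seed(0)
for _ in range(200):
    sample = ''.join(random.choice('ab') for _ in range(random.randint(0, 12)))
    assert z_array(sample) == brute_force_z(sample)
```

The entries of the Z-array are exactly the per-suffix similarities, which is why summing them gives the answer.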

Testing with Different Examples

def test_similarity(input_str):
    print(f"String: '{input_str}'")
    result = find_similarity_simple(input_str)
    print(f"Total similarity: {result}\n")

# Test with different strings
test_similarity('abc')
test_similarity('aaaa')
test_similarity('ababa')

Output:

String: 'abc'
'abc' → similarity 3
'bc' → similarity 0
'c' → similarity 0
Total similarity: 3

String: 'aaaa'
'aaaa' → similarity 4
'aaa' → similarity 3
'aa' → similarity 2
'a' → similarity 1
Total similarity: 10

String: 'ababa'
'ababa' → similarity 5
'baba' → similarity 0
'aba' → similarity 3
'ba' → similarity 0
'a' → similarity 1
Total similarity: 9
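The totals above can also be turned into automated checks. The following sketch uses a compact, self-contained helper (the name `total_similarity` is hypothetical) and adds two edge cases not shown above, the empty string and a single character. For a string of n identical characters the total is n + (n - 1) + ... + 1 = n(n + 1)/2, which matches the 10 obtained for 'aaaa':

```python
from itertools import takewhile


def total_similarity(text):
    """Sum of common-prefix lengths between text and each of its suffixes."""
    return sum(
        sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1],
                                 zip(text, text[start:])))
        for start in range(len(text))
    )


# Expected totals: the last four come from the examples in this article
expected = {'': 0, 'a': 1, 'abc': 3, 'aaaa': 10, 'ababa': 9, 'tpotp': 7}
for text, want in expected.items():
    assert total_similarity(text) == want, (text, want)

# A run of n identical characters sums to n(n + 1)/2
for n in range(1, 8):
    assert total_similarity('a' * n) == n * (n + 1) // 2

print("all checks passed")
```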

Comparison

Approach      | Time Complexity | Space Complexity                 | Best For
Simple Method | O(n²)           | O(1) by index; O(n) with slicing | Small strings, easy to understand
Z-Algorithm   | O(n)            | O(n)                             | Large strings, optimal performance
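The difference in growth can be observed with a rough timing sketch. It uses a string of repeated characters, which is the worst case for the quadratic method. The function names `similarity_quadratic` and `similarity_linear` are assumptions for this example, and the actual timings depend on the machine, so no figures are shown:

```python
import timeit


def similarity_quadratic(text):
    """Index-based form of the simple method."""
    n = len(text)
    total = 0
    for start in range(n):
        matched = 0
        while start + matched < n and text[matched] == text[start + matched]:
            matched += 1
        total += matched
    return total


def similarity_linear(text):
    """Z-algorithm: sum of the Z-array."""
    n = len(text)
    if n == 0:
        return 0
    z = [0] * n
    z[0] = n
    left = right = 0
    for i in range(1, n):
        if i <= right:
            z[i] = min(right - i + 1, z[i - left])
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    return sum(z)


for size in (500, 1000, 2000):
    text = 'a' * size
    # Both methods must agree before their timings are worth comparing
    assert similarity_quadratic(text) == similarity_linear(text)
    slow = timeit.timeit(lambda: similarity_quadratic(text), number=1)
    fast = timeit.timeit(lambda: similarity_linear(text), number=1)
    print(f"n={size}: quadratic {slow:.4f}s, linear {fast:.4f}s")
```

Doubling the input size should roughly quadruple the quadratic timing while the linear timing roughly doubles.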

Conclusion

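The similarity between a string and its suffixes is the sum of the lengths of the longest common prefixes shared by the string and each of its suffixes. The simple approach is easy to follow but takes O(n²) time in the worst case, while the Z-algorithm computes the same total in O(n) time and is the better choice for larger inputs.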

Updated on: 2026-03-26T15:24:06+05:30
