Program to find total similarities of a string and its suffixes in Python

In Python, finding the total similarities of a string with all its suffixes is a string processing problem that can be efficiently solved using the Z-algorithm. The similarity between two strings is defined as the length of the longest common prefix.

For example, if we have the string "pqpqpp", its suffixes are "pqpqpp", "qpqpp", "pqpp", "qpp", "pp", and "p". The similarities with the original string are 6, 0, 3, 0, 1, and 1 respectively, giving a total sum of 11.
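The definition can be checked directly with a brute-force sketch. It relies on `os.path.commonprefix` from the standard library, which compares strings character by character. The helper names `similarity` and `total_similarity_bruteforce` are illustrative and are not part of the program presented later.

```python
import os.path


def similarity(a, b):
    # Length of the longest common prefix of a and b.
    # os.path.commonprefix compares character by character, so it works on any strings.
    return len(os.path.commonprefix([a, b]))


def total_similarity_bruteforce(s):
    # Compare s against every suffix s[k:], including s itself (k = 0).
    return sum(similarity(s, s[k:]) for k in range(len(s)))


s = "pqpqpp"
print([similarity(s, s[k:]) for k in range(len(s))])  # [6, 0, 3, 0, 1, 1]
print(total_similarity_bruteforce(s))                 # 11
```

This version is easy to verify by hand but re-scans the string for every suffix, which is the cost the Z-algorithm is meant to avoid.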

Understanding the Z-Algorithm

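The Z-algorithm computes, for every position in a string, the length of the longest common prefix between the string and the suffix starting at that position. It maintains a "Z-box" (defined by left and right pointers), which is the segment reaching furthest to the right that is already known to match a prefix of the string. Positions that fall inside the Z-box can reuse values computed earlier, which avoids redundant comparisons.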
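A compact sketch of the idea is shown here to make the Z-box concrete. It is a simplified variant rather than the program listed in this article: `r` is exclusive (the box is `s[l:r]`), `z[0]` is set to the full length, and the name `z_array` is illustrative.

```python
def z_array(s):
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n   # the whole string matches itself
    l = r = 0  # current Z-box is s[l:r]; r is exclusive in this sketch
    for k in range(1, n):
        if k < r:
            # Inside the box: reuse the value at the mirrored position k - l,
            # capped at the part of the box that is still ahead of k.
            z[k] = min(r - k, z[k - l])
        # Extend by direct comparison where the reused value cannot help.
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
        # Keep the box that reaches furthest to the right.
        if k + z[k] > r:
            l, r = k, k + z[k]
    return z


print(z_array("pqpqpp"))       # [6, 0, 3, 0, 1, 1]
print(sum(z_array("pqpqpp")))  # 11
```

Summing the Z-array gives the total similarity, because entry `k` is exactly the similarity between the string and its suffix starting at position `k`.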

Implementation

def find_total_similarities(s):
    length = len(s)
    total = length  # First suffix is the string itself
    
    z = [0]  # Z-array of prefix-match lengths; z[0] is a placeholder that is never read
    l = 0    # Left boundary of Z-box
    r = 0    # Right boundary of Z-box
    
    for k in range(1, length):
        if k > r:
            # Case 1: k is outside the current Z-box
            match = 0
            index = k
            
            while index < length:
                if s[index] == s[match]:
                    match += 1
                    index += 1
                else:
                    break
            
            z.append(match)
            if match > 0:
                total += match
                l = k
                r = index - 1
        else:
            # Case 2: k is inside the current Z-box
            if z[k - l] < (r - k) + 1:
                z.append(z[k - l])
                total += z[k - l]
            else:
                # Need to extend beyond the Z-box
                match = r - k + 1
                index = r + 1
                
                while index < length:
                    if s[index] == s[match]:
                        match += 1
                        index += 1
                    else:
                        break
                
                z.append(match)
                total += match
                l = k
                r = index - 1
    
    return total

# Test the function
s = "pqpqpp"
result = find_total_similarities(s)
print(f"String: {s}")
print(f"Total similarities: {result}")

Output:

String: pqpqpp
Total similarities: 11

Step-by-Step Breakdown

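To see where each value comes from, the function below prints the similarity of every suffix of "pqpqpp". For clarity it compares characters directly instead of using the Z-box, so it takes O(n²) time in the worst case: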

def find_similarities_with_trace(s):
    length = len(s)
    total = length  # The string matches itself in full

    print(f"String: {s}")
    print("Suffixes and their similarities:")
    print(f"'{s}' (position 0): similarity = {length}")

    for k in range(1, length):
        similarity = 0
        suffix = s[k:]

        # Compare the suffix with the start of s, character by character
        for i in range(len(suffix)):
            if s[i] == suffix[i]:
                similarity += 1
            else:
                break

        print(f"'{suffix}' (position {k}): similarity = {similarity}")
        total += similarity

    print(f"\nTotal sum of similarities: {total}")
    return total

# Demonstrate with example
s = "pqpqpp"
find_similarities_with_trace(s)

Output:

String: pqpqpp
Suffixes and their similarities:
'pqpqpp' (position 0): similarity = 6
'qpqpp' (position 1): similarity = 0
'pqpp' (position 2): similarity = 3
'qpp' (position 3): similarity = 0
'pp' (position 4): similarity = 1
'p' (position 5): similarity = 1

Total sum of similarities: 11

Time Complexity

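The Z-algorithm runs in O(n) time, where n is the length of the string. Each character comparison either ends the scan for the current position or moves the right boundary of the Z-box forward, and that boundary never moves back. The naive approach compares every suffix with the string from scratch and takes O(n²) time in the worst case, for example on a string made of a single repeated character. The implementation above also uses O(n) extra space for the Z-array.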
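The difference can be made concrete by counting character comparisons on a worst-case input. The sketch below is an illustration only: the function names and counters are assumptions added for this example, and the Z pass uses an exclusive right boundary rather than the inclusive one in the program above.

```python
def naive_comparisons(s):
    # Count comparisons made when each suffix is matched against s from scratch.
    n = len(s)
    count = 0
    for k in range(1, n):
        i = 0
        while k + i < n:
            count += 1
            if s[i] != s[k + i]:
                break
            i += 1
    return count


def z_comparisons(s):
    # Count comparisons made by a Z-algorithm pass over s (r is exclusive).
    n = len(s)
    z = [0] * n
    count = 0
    l = r = 0
    for k in range(1, n):
        if k < r:
            z[k] = min(r - k, z[k - l])  # reuse work done inside the Z-box
        while k + z[k] < n:
            count += 1
            if s[z[k]] != s[k + z[k]]:
                break
            z[k] += 1
        if k + z[k] > r:
            l, r = k, k + z[k]
    return count


s = "a" * 1000
print(naive_comparisons(s))  # 499500, i.e. n(n-1)/2
print(z_comparisons(s))      # 999, i.e. n-1
```

On this input the first suffix extends the Z-box to the end of the string, so every later position is answered from the box without any further comparisons.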

Conclusion

The Z-algorithm provides an efficient solution for finding total similarities between a string and its suffixes. It uses the concept of a Z-box to avoid redundant character comparisons, achieving linear time complexity.

Updated on: 2026-03-26T14:39:57+05:30
