Program to find total similarities of a string and its suffixes in Python
Finding the total similarity of a string with all of its suffixes is a string-processing problem that the Z-algorithm solves efficiently. The similarity between two strings is defined as the length of their longest common prefix.
For example, if we have the string "pqpqpp", its suffixes are "pqpqpp", "qpqpp", "pqpp", "qpp", "pp", and "p". The similarities with the original string are 6, 0, 3, 0, 1, and 1 respectively, giving a total sum of 11.
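As a minimal sketch of this definition, the helper below measures the longest common prefix of two strings and applies it to every suffix of the example. The name `similarity` is illustrative and not part of any library.

```python
def similarity(a, b):
    """Return the length of the longest common prefix of a and b."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


s = "pqpqpp"
# Compare the full string with each suffix s[k:], including k = 0.
scores = [similarity(s, s[k:]) for k in range(len(s))]
print(scores)       # [6, 0, 3, 0, 1, 1]
print(sum(scores))  # 11
```

Each suffix is compared from scratch here, which is easy to read but repeats work that the Z-algorithm avoids.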
Understanding the Z-Algorithm
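For each position k, the Z-algorithm computes z[k], the length of the longest common prefix of the string and its suffix starting at k. It keeps track of a "Z-box", the interval [l, r] of the rightmost segment found so far that matches a prefix of the string. When k falls inside the box, the value already computed at position k - l either gives z[k] directly or provides a starting point for further matching, which avoids redundant comparisons.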
Implementation
def find_total_similarities(s):
    length = len(s)
    total = length  # First suffix is the string itself
    z = [0]  # Z-array to store prefix lengths
    l = 0  # Left boundary of Z-box
    r = 0  # Right boundary of Z-box
    for k in range(1, length):
        if k > r:
            # Case 1: k is outside the current Z-box
            match = 0
            index = k
            while index < length:
                if s[index] == s[match]:
                    match += 1
                    index += 1
                else:
                    break
            z.append(match)
            if match > 0:
                total += match
                l = k
                r = index - 1
        else:
            # Case 2: k is inside the current Z-box
            if z[k - l] < (r - k) + 1:
                z.append(z[k - l])
                total += z[k - l]
            else:
                # Need to extend beyond the Z-box
                match = r - k + 1
                index = r + 1
                while index < length:
                    if s[index] == s[match]:
                        match += 1
                        index += 1
                    else:
                        break
                z.append(match)
                total += match
                l = k
                r = index - 1
    return total

# Test the function
s = "pqpqpp"
result = find_total_similarities(s)
print(f"String: {s}")
print(f"Total similarities: {result}")
Output:
String: pqpqpp
Total similarities: 11
Step-by-Step Breakdown
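To see where each value comes from, the function below compares every suffix of "pqpqpp" with the full string character by character. It uses direct comparison rather than the Z-box logic, so it also serves as a check on the result above: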
def find_similarities_with_trace(s):
    length = len(s)
    total = length  # The string matches itself completely
    print(f"String: {s}")
    print("Suffixes and their similarities:")
    print(f"'{s}' (position 0): similarity = {length}")
    for k in range(1, length):
        similarity = 0
        suffix = s[k:]
        # Calculate similarity manually for demonstration
        for i in range(len(suffix)):
            if s[i] == suffix[i]:
                similarity += 1
            else:
                break
        print(f"'{suffix}' (position {k}): similarity = {similarity}")
        total += similarity
    print(f"\nTotal sum of similarities: {total}")
    return total

# Demonstrate with example
s = "pqpqpp"
find_similarities_with_trace(s)
Output:
String: pqpqpp
Suffixes and their similarities:
'pqpqpp' (position 0): similarity = 6
'qpqpp' (position 1): similarity = 0
'pqpp' (position 2): similarity = 3
'qpp' (position 3): similarity = 0
'pp' (position 4): similarity = 1
'p' (position 5): similarity = 1

Total sum of similarities: 11
Time Complexity
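The Z-algorithm runs in O(n) time, where n is the length of the string. The right boundary r only ever moves forward, so each character is matched successfully at most once. The naive approach, which compares every suffix with the full string from scratch, takes O(n²) time in the worst case, for example on a string made of one repeated character.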
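To make the gap concrete, the sketch below counts character comparisons for both approaches on a worst-case input. The names `naive_comparisons` and `z_comparisons` are illustrative, and the Z-array loop is a common compact formulation with a half-open Z-box, not the exact implementation shown earlier.

```python
def naive_comparisons(s):
    """Count character comparisons made by direct suffix-by-suffix matching."""
    n = len(s)
    count = 0
    for k in range(1, n):
        i = 0
        while k + i < n:
            count += 1
            if s[k + i] != s[i]:
                break
            i += 1
    return count


def z_comparisons(s):
    """Count character comparisons made while building the Z-array."""
    n = len(s)
    z = [0] * n
    count = 0
    l = r = 0  # Assumed convention: the Z-box is s[l:r], with r exclusive
    for k in range(1, n):
        if k < r:
            # Reuse the value from the mirrored position inside the Z-box
            z[k] = min(r - k, z[k - l])
        while k + z[k] < n:
            count += 1
            if s[k + z[k]] != s[z[k]]:
                break
            z[k] += 1
        if k + z[k] > r:
            l, r = k, k + z[k]
    return count


worst_case = "a" * 1000
print(naive_comparisons(worst_case))  # 499500
print(z_comparisons(worst_case))      # 999
```

On a string of 1000 identical characters, the direct approach performs n(n - 1)/2 comparisons, while the Z-box version performs n - 1, because every position after the first is answered from the box without touching the string again.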
Conclusion
The Z-algorithm provides an efficient solution for finding total similarities between a string and its suffixes. It uses the concept of a Z-box to avoid redundant character comparisons, achieving linear time complexity.
