Program to find out the similarity between a string and its suffixes in Python
In this problem, we need to find the similarity between a string and all its suffixes. The similarity is defined as the length of the longest common prefix between the original string and each suffix. We then sum up all these similarities.
For example, if the string is 'abcd', the suffixes are 'abcd', 'bcd', 'cd', and 'd'. We compare each suffix with the original string and count how many characters match from the beginning, stopping at the first mismatch.
Understanding the Problem
Let's see how this works with the string 'tpotp':
Original string: 'tpotp'
Suffixes and their similarities:
'tpotp' → matches 5 characters with 'tpotp' → similarity = 5
'potp' → matches 0 characters with 'tpotp' → similarity = 0
'otp' → matches 0 characters with 'tpotp' → similarity = 0
'tp' → matches 2 characters with 'tpotp' → similarity = 2
'p' → matches 0 characters with 'tpotp' → similarity = 0
Sum of similarities = 5 + 0 + 0 + 2 + 0 = 7
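The definition above can be checked directly in a few lines of Python. This is a minimal sketch, not the article's method: the helper name `similarity_of_suffix` is hypothetical, and it relies on the standard-library function `os.path.commonprefix`, which despite its name compares plain strings character by character.

```python
from os.path import commonprefix

def similarity_of_suffix(text, start):
    """Length of the longest common prefix of text and text[start:]."""
    return len(commonprefix([text, text[start:]]))

text = 'tpotp'
# One similarity value per suffix, in order of starting index
per_suffix = [similarity_of_suffix(text, i) for i in range(len(text))]
print(per_suffix)       # [5, 0, 0, 2, 0]
print(sum(per_suffix))  # 7
```

The list matches the per-suffix values worked out by hand, and its sum is the total similarity.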
Simple Approach
The straightforward method is to generate all suffixes and compare each with the original string:
def find_similarity_simple(input_str):
    total_similarity = 0
    n = len(input_str)
    # Generate all suffixes and calculate similarity
    for i in range(n):
        suffix = input_str[i:]
        similarity = 0
        # Find longest common prefix
        for j in range(min(len(input_str), len(suffix))):
            if input_str[j] == suffix[j]:
                similarity += 1
            else:
                break
        total_similarity += similarity
        print(f"'{suffix}' → similarity {similarity}")
    return total_similarity

result = find_similarity_simple('tpotp')
print(f"\nTotal similarity: {result}")

Output:
'tpotp' → similarity 5
'potp' → similarity 0
'otp' → similarity 0
'tp' → similarity 2
'p' → similarity 0

Total similarity: 7
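Slicing `input_str[i:]` copies the suffix on every iteration. As a hedged variation on the simple approach, the same comparison can be done purely with indices, so that no suffix strings are built. The function name `find_similarity_in_place` is an assumption made for this sketch; it does not appear in the original program.

```python
def find_similarity_in_place(input_str):
    """Sum of longest-common-prefix lengths, comparing by index only."""
    n = len(input_str)
    total = 0
    for start in range(n):
        length = 0
        # Compare input_str[length] with the character at the same
        # offset inside the suffix that begins at index `start`
        while start + length < n and input_str[length] == input_str[start + length]:
            length += 1
        total += length
    return total

print(find_similarity_in_place('tpotp'))  # 7
```

The running time is still quadratic in the worst case; only the extra memory used for suffix copies is avoided.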
Optimized Z-Algorithm Approach
For better efficiency with larger strings, we can use the Z-algorithm, which computes the longest-common-prefix length for every suffix in linear total time by reusing matches that were already found:
def find_similarity_optimized(input_str):
    n = len(input_str)
    z_array = [n]  # First element is always the length of string
    left = 0
    right = 0
    for i in range(1, n):
        if i <= right:
            # We're inside a Z-box, use previously computed values
            z_array.append(min(right - i + 1, z_array[i - left]))
        else:
            z_array.append(0)
        # Try to extend the match
        while (i + z_array[i] < n and
               input_str[z_array[i]] == input_str[i + z_array[i]]):
            z_array[i] += 1
        # Update Z-box if we found a longer match
        if i + z_array[i] - 1 > right:
            left = i
            right = i + z_array[i] - 1
    return sum(z_array)

result = find_similarity_optimized('tpotp')
print(f"Total similarity: {result}")

Output:
Total similarity: 7
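To make the link between the Z-array and the per-suffix similarities explicit, the sketch below returns the array itself instead of its sum. It follows the same Z-box logic as the function above; the name `z_values` and the empty-string guard are assumptions added for this illustration.

```python
def z_values(text):
    """Return z, where z[i] is the longest common prefix length
    of text and text[i:]."""
    n = len(text)
    if n == 0:
        return []
    z = [0] * n
    z[0] = n  # The whole string matches itself
    left = right = 0
    for i in range(1, n):
        if i <= right:
            # Inside the current Z-box: reuse an earlier value,
            # capped at the part of the box that is still ahead of i
            z[i] = min(right - i + 1, z[i - left])
        # Extend the match character by character
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1
        # Move the Z-box if this match reaches further right
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    return z

print(z_values('tpotp'))  # [5, 0, 0, 2, 0]
print(z_values('ababa'))  # [5, 0, 3, 0, 1]
```

Each entry equals the similarity of the suffix starting at that index, so summing the list gives the total.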
Testing with Different Examples
def test_similarity(input_str):
    print(f"String: '{input_str}'")
    result = find_similarity_simple(input_str)
    print(f"Total similarity: {result}\n")

# Test with different strings
test_similarity('abc')
test_similarity('aaaa')
test_similarity('ababa')

Output:
String: 'abc'
'abc' → similarity 3
'bc' → similarity 0
'c' → similarity 0
Total similarity: 3

String: 'aaaa'
'aaaa' → similarity 4
'aaa' → similarity 3
'aa' → similarity 2
'a' → similarity 1
Total similarity: 10

String: 'ababa'
'ababa' → similarity 5
'baba' → similarity 0
'aba' → similarity 3
'ba' → similarity 0
'a' → similarity 1
Total similarity: 9
Comparison
| Approach | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| Simple Method | O(n²) worst case | O(n) as written (each suffix slice is a copy) | Small strings, easy to understand |
| Z-Algorithm | O(n) | O(n) | Large strings, optimal performance |
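The difference in the table can be observed with a rough experiment. The sketch below is self-contained and uses compact, hypothetical re-implementations (`similarity_quadratic` and `similarity_linear`) rather than the functions from the article. It first cross-checks the two on random strings, then times them on a string of repeated characters, which is the worst case for the quadratic method. Timings depend on the machine, so none are shown.

```python
import random
import time

def similarity_quadratic(text):
    """Brute force: compare every suffix against the string by index."""
    n, total = len(text), 0
    for start in range(n):
        k = 0
        while start + k < n and text[k] == text[start + k]:
            k += 1
        total += k
    return total

def similarity_linear(text):
    """Z-algorithm: sum of the Z-array."""
    n = len(text)
    if n == 0:
        return 0
    z = [0] * n
    z[0] = n
    left = right = 0
    for i in range(1, n):
        if i <= right:
            z[i] = min(right - i + 1, z[i - left])
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    return sum(z)

# Cross-check on random strings over a two-letter alphabet
rng = random.Random(0)
for _ in range(200):
    s = ''.join(rng.choice('ab') for _ in range(rng.randint(0, 30)))
    assert similarity_quadratic(s) == similarity_linear(s)

# Worst case for the brute-force method: every suffix matches fully
worst = 'a' * 2000
for func in (similarity_quadratic, similarity_linear):
    started = time.perf_counter()
    total = func(worst)
    elapsed = time.perf_counter() - started
    print(f"{func.__name__}: total={total}, time={elapsed:.4f}s")
```

Both functions should report the same total for the worst-case string, with the Z-algorithm version finishing noticeably faster as the input grows.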
Conclusion
The similarity between a string and its suffixes is the sum, over all suffixes, of the length of the longest common prefix shared with the original string. The simple approach is the easiest to understand, while the Z-algorithm computes the same total in O(n) time, which makes it the better choice for larger inputs.
