Find the longest sub-string which is prefix, suffix and also present inside the string in Python
Given a string, we need to find the longest substring that is a prefix of the string, a suffix of the string, and also occurs at least once more inside it (starting after the first character and ending before the last). The solution uses the Longest Prefix Suffix (LPS) array from the KMP algorithm.
For example, in the string "languagepythonlanguageinterestinglanguage", the substring "language" appears at the beginning, end, and middle of the string.
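To make the three conditions concrete, here is a brute-force sketch that tries every candidate length from longest to shortest. The function name `brute_force_longest` is hypothetical and not part of the solution described in this article. It assumes "inside" means an occurrence that starts after index 0 and ends before the last character, and it returns -1 when nothing qualifies.

```python
def brute_force_longest(string):
    """Reference check: try every proper prefix length, longest first."""
    n = len(string)
    for length in range(n - 1, 0, -1):
        candidate = string[:length]
        if not string.endswith(candidate):
            continue
        # Look for an occurrence that starts after index 0 and ends
        # before the final character, so it is neither the prefix
        # occurrence nor the suffix occurrence.
        if string.find(candidate, 1, n - 1) != -1:
            return candidate
    return -1


print(brute_force_longest("languagepythonlanguageinterestinglanguage"))  # language
print(brute_force_longest("abcab"))  # -1
```

This check can take quadratic time or worse on long inputs, which is the motivation for the LPS-based approach.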
Algorithm Overview
We use a two-step approach:
1. Build the LPS (Longest Prefix Suffix) array using the KMP preprocessing algorithm
2. Find the longest substring that satisfies our conditions using the LPS array
Building the LPS Array
For each index i, the LPS array stores the length of the longest proper prefix of string[0..i] that is also a suffix of string[0..i]:
def get_lps(string):
    n = len(string)
    long_pref_suff = [0] * n
    size = 0
    i = 1
    while i < n:
        if string[i] == string[size]:
            size += 1
            long_pref_suff[i] = size
            i += 1
        else:
            if size != 0:
                size = long_pref_suff[size - 1]
            else:
                long_pref_suff[i] = 0
                i += 1
    return long_pref_suff

# Test the LPS function
test_string = "languagepythonlanguageinterestinglanguage"
lps_array = get_lps(test_string)
print("String:", test_string)
print("LPS Array:", lps_array)

String: languagepythonlanguageinterestinglanguage
LPS Array: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8]
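On a shorter string the values are easier to check by hand. The sketch below recomputes the array directly from the definition, using a hypothetical helper `naive_lps`. It is deliberately slow and is meant only to illustrate what each entry means, not to replace `get_lps`.

```python
def naive_lps(string):
    """Compute the LPS array straight from its definition (slow)."""
    result = []
    for i in range(len(string)):
        piece = string[: i + 1]
        best = 0
        # A proper prefix is strictly shorter than the piece itself.
        for length in range(1, len(piece)):
            if piece[:length] == piece[-length:]:
                best = length
        result.append(best)
    return result


print(naive_lps("ababa"))    # [0, 0, 1, 2, 3]
print(naive_lps("aabaaab"))  # [0, 1, 0, 1, 2, 2, 3]
```

For "ababa", the last entry is 3 because "aba" is both the first three and the last three characters of the string.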
Finding the Longest Substring
Now we use the LPS array to find the longest substring that appears as a prefix, as a suffix, and also inside the string:
def get_longest_substr(string):
    long_pref_suff = get_lps(string)
    n = len(string)

    # If no proper prefix-suffix exists
    if long_pref_suff[n - 1] == 0:
        return -1

    # Check if the longest prefix-suffix also appears inside
    for i in range(n - 1):
        if long_pref_suff[i] == long_pref_suff[n - 1]:
            return string[0:long_pref_suff[i]]

    # Check for shorter prefix-suffix patterns
    if long_pref_suff[long_pref_suff[n - 1] - 1] == 0:
        return -1
    else:
        return string[0:long_pref_suff[long_pref_suff[n - 1] - 1]]

# Test with the example
string = "languagepythonlanguageinterestinglanguage"
result = get_longest_substr(string)
print(f"Input: {string}")
print(f"Output: {result}")

Input: languagepythonlanguageinterestinglanguage
Output: language
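The fallback in the last branch relies on a property of the LPS array: starting from the final value k and repeatedly jumping to the LPS value at index k - 1 lists every prefix that is also a suffix, from longest to shortest. A minimal sketch of that chain follows. The helper names `build_lps` and `all_borders` are hypothetical, and `build_lps` is a compact variant of the construction that is assumed to produce the same array as `get_lps`, included so the snippet runs on its own.

```python
def build_lps(string):
    """Compact LPS construction (illustrative variant)."""
    lps = [0] * len(string)
    size = 0
    for i in range(1, len(string)):
        # Fall back through shorter candidates until one can be extended.
        while size and string[i] != string[size]:
            size = lps[size - 1]
        if string[i] == string[size]:
            size += 1
        lps[i] = size
    return lps


def all_borders(string):
    """Return every proper prefix that is also a suffix, longest first."""
    lps = build_lps(string)
    borders = []
    k = lps[-1] if lps else 0
    while k > 0:
        borders.append(string[:k])
        k = lps[k - 1]
    return borders


print(all_borders("ababa"))  # ['aba', 'a']
print(all_borders("aaaa"))   # ['aaa', 'aa', 'a']
```

The second entry in the chain, when it exists, is a suffix of the longest one. Since the longest one is a proper prefix of the string, that copy starts after index 0 and ends before the last character, which is why the fallback result always counts as occurring inside.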
Testing with Different Cases
Let's test the function with various input cases:
def get_lps(string):
    n = len(string)
    long_pref_suff = [0] * n
    size = 0
    i = 1
    while i < n:
        if string[i] == string[size]:
            size += 1
            long_pref_suff[i] = size
            i += 1
        else:
            if size != 0:
                size = long_pref_suff[size - 1]
            else:
                long_pref_suff[i] = 0
                i += 1
    return long_pref_suff

def get_longest_substr(string):
    long_pref_suff = get_lps(string)
    n = len(string)
    if long_pref_suff[n - 1] == 0:
        return -1
    for i in range(n - 1):
        if long_pref_suff[i] == long_pref_suff[n - 1]:
            return string[0:long_pref_suff[i]]
    if long_pref_suff[long_pref_suff[n - 1] - 1] == 0:
        return -1
    else:
        return string[0:long_pref_suff[long_pref_suff[n - 1] - 1]]

# Test cases
test_cases = [
    "languagepythonlanguageinterestinglanguage",
    "abcab",
    "ababa",
    "hello"
]
for test in test_cases:
    result = get_longest_substr(test)
    print(f"'{test}' -> {result}")

'languagepythonlanguageinterestinglanguage' -> language
'abcab' -> -1
'ababa' -> a
'hello' -> -1
How It Works
The algorithm works in two phases:
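1. LPS array construction: KMP preprocessing computes, for each position, the length of the longest proper prefix that is also a suffix of the string up to that position.
2. Pattern matching: The final LPS value is the length of the longest prefix that is also a suffix of the whole string. If the same value appears at an earlier position, that prefix also ends before the last character, so it occurs inside the string. Otherwise the function falls back to the next shorter prefix-suffix, whose length is the LPS value at the last index of the longest one.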
Time Complexity
The time complexity is O(n) where n is the length of the string, as both the LPS construction and the search phase run in linear time.
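One way to see the linear bound for the LPS step is to count loop iterations: each pass either advances `i` or shrinks `size`, and `size` can shrink only as many times as it has grown. The sketch below instruments a copy of the loop with a hypothetical counter, `count_lps_steps`, and checks the resulting `2 * n` bound on a few sample strings chosen for illustration.

```python
def count_lps_steps(string):
    """Run the LPS loop and return how many iterations it took."""
    n = len(string)
    lps = [0] * n
    size = 0
    i = 1
    steps = 0
    while i < n:
        steps += 1
        if string[i] == string[size]:
            size += 1
            lps[i] = size
            i += 1
        elif size != 0:
            size = lps[size - 1]  # fall back; i stays where it is
        else:
            i += 1                # lps[i] is already 0
    return steps


for sample in ["aaaaab", "abababab", "languagepythonlanguage"]:
    steps = count_lps_steps(sample)
    # Prints the length, the iteration count, and whether the bound holds.
    print(len(sample), steps, steps <= 2 * len(sample))
```

The search phase is a single pass over the array, so it adds at most n further steps.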
Conclusion
This solution finds the longest substring that appears as a prefix, as a suffix, and inside the string in linear time using the KMP algorithm's LPS array. It returns -1 when the string has no prefix that is also a suffix, or when no such prefix also occurs inside the string.
