Find the longest sub-string which is prefix, suffix and also present inside the string in Python

Given a string, we need to find the longest substring that appears as a prefix, as a suffix, and also somewhere inside the string, that is, at a position where it is neither the prefix nor the suffix. The solution uses the Longest Prefix Suffix (LPS) array from the KMP algorithm.

For example, in the string "languagepythonlanguageinterestinglanguage", the substring "language" appears at the beginning, end, and middle of the string.
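
To make the problem statement concrete, here is a minimal brute-force sketch that checks the definition directly. It is not the method this article uses; the name naive_longest_substr is hypothetical, and the function is only meant as a slow reference for cross-checking results on short strings.

```python
def naive_longest_substr(text):
    """Brute-force reference: try every candidate length, longest first."""
    n = len(text)
    for k in range(n - 1, 0, -1):
        candidate = text[:k]
        # The candidate must also be a suffix of the whole string
        if not text.endswith(candidate):
            continue
        # An inner occurrence starts after index 0 and ends before the last
        # character, so it lies entirely within text[1:n - 1]
        if candidate in text[1:n - 1]:
            return candidate
    # Same sentinel as the article's function (an assumption kept for parity)
    return -1


print(naive_longest_substr("languagepythonlanguageinterestinglanguage"))  # language
print(naive_longest_substr("abcab"))  # -1
```

This version does far more work than the LPS-based approach because it rescans the string for every candidate length.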

Algorithm Overview

We use a two-step approach:

  • Build the LPS (Longest Prefix Suffix) array using the KMP preprocessing algorithm

  • Find the longest substring that satisfies our conditions using the LPS array

Building the LPS Array

For each index i, the LPS array stores the length of the longest proper prefix of string[0..i] that is also a suffix of string[0..i]:

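def get_lps(string):
    n = len(string)
    long_pref_suff = [0] * n
    size = 0  # length of the prefix-suffix matched so far
    i = 1     # long_pref_suff[0] is always 0, so start at index 1

    while i < n:
        if string[i] == string[size]:
            # Extend the current match by one character
            size += 1
            long_pref_suff[i] = size
            i += 1
        else:
            if size != 0:
                # Fall back to the next shorter candidate; i is not advanced
                size = long_pref_suff[size - 1]
            else:
                # No prefix-suffix ends at index i
                long_pref_suff[i] = 0
                i += 1

    return long_pref_suff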

# Test the LPS function
test_string = "languagepythonlanguageinterestinglanguage"
lps_array = get_lps(test_string)
print("String:", test_string)
print("LPS Array:", lps_array)

Running this prints:

String: languagepythonlanguageinterestinglanguage
LPS Array: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8]
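
The LPS array encodes more than the single longest prefix-suffix. As a hedged illustration, the sketch below follows the chain of LPS values from the last index to list every prefix that is also a suffix, longest first. The names prefix_function and all_borders are hypothetical, and prefix_function is a compact rewrite of the LPS computation rather than the article's own code.

```python
def prefix_function(text):
    """Compact variant of the LPS computation (same values as get_lps)."""
    lps = [0] * len(text)
    for i in range(1, len(text)):
        k = lps[i - 1]
        # Shrink the candidate until it can be extended by text[i]
        while k > 0 and text[i] != text[k]:
            k = lps[k - 1]
        if text[i] == text[k]:
            k += 1
        lps[i] = k
    return lps


def all_borders(text):
    """Return every proper prefix that is also a suffix, longest first."""
    if not text:
        return []
    lps = prefix_function(text)
    borders = []
    k = lps[-1]
    while k > 0:
        borders.append(text[:k])
        # The next shorter prefix-suffix is the LPS value of the current one
        k = lps[k - 1]
    return borders


print(prefix_function("ababa"))  # [0, 0, 1, 2, 3]
print(all_borders("ababa"))      # ['aba', 'a']
```

For "ababa" the chain is 3, then 1, which is why "a" is the next candidate once "aba" is ruled out.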

Finding the Longest Substring

Now we use the LPS array to find the longest substring that appears as a prefix, as a suffix, and also inside the string:

def get_longest_substr(string):
    n = len(string)

    # An empty string has no prefix or suffix to match
    if n == 0:
        return -1

    long_pref_suff = get_lps(string)

    # If no proper prefix-suffix exists
    if long_pref_suff[n - 1] == 0:
        return -1

    # Check if the longest prefix-suffix also ends before the last index
    for i in range(n - 1):
        if long_pref_suff[i] == long_pref_suff[n - 1]:
            return string[0:long_pref_suff[i]]

    # Otherwise fall back to the next shorter prefix-suffix
    if long_pref_suff[long_pref_suff[n - 1] - 1] == 0:
        return -1
    else:
        return string[0:long_pref_suff[long_pref_suff[n - 1] - 1]]

# Test with the example
string = "languagepythonlanguageinterestinglanguage"
result = get_longest_substr(string)
print(f"Input: {string}")
print(f"Output: {result}")

Running this prints:

Input: languagepythonlanguageinterestinglanguage
Output: language

Testing with Different Cases

Let's test the function with several inputs. Both functions are repeated here so that the program can be run on its own:

def get_lps(string):
    n = len(string)
    long_pref_suff = [0] * n
    size = 0
    i = 1
    
    while i < n:
        if string[i] == string[size]:
            size += 1
            long_pref_suff[i] = size
            i += 1
        else:
            if size != 0:
                size = long_pref_suff[size - 1]
            else:
                long_pref_suff[i] = 0
                i += 1
    
    return long_pref_suff

def get_longest_substr(string):
    n = len(string)

    # An empty string has no prefix or suffix to match
    if n == 0:
        return -1

    long_pref_suff = get_lps(string)

    if long_pref_suff[n - 1] == 0:
        return -1

    for i in range(n - 1):
        if long_pref_suff[i] == long_pref_suff[n - 1]:
            return string[0:long_pref_suff[i]]

    if long_pref_suff[long_pref_suff[n - 1] - 1] == 0:
        return -1
    else:
        return string[0:long_pref_suff[long_pref_suff[n - 1] - 1]]

# Test cases
test_cases = [
    "languagepythonlanguageinterestinglanguage",
    "abcab",
    "ababa",
    "hello"
]

for test in test_cases:
    result = get_longest_substr(test)
    print(f"'{test}' -> {result}")

Running this prints:

'languagepythonlanguageinterestinglanguage' -> language
'abcab' -> -1
'ababa' -> a
'hello' -> -1

How It Works

The algorithm works in two phases:

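  1. LPS Array Construction: Uses the KMP preprocessing step to compute, for each position, the length of the longest proper prefix that is also a suffix of the string up to that position

  2. Pattern Matching: Takes the final LPS value as the longest prefix-suffix candidate and searches the earlier positions for the same LPS value, which indicates that the candidate also ends inside the string. If no earlier position matches, it falls back to the next shorter prefix-suffix, whose length is the LPS value at the last index of the candidate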
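
The second phase can also be expressed as "take the longest prefix-suffix whose length does not exceed the largest LPS value seen before the last index". The sketch below is an alternative formulation under that assumption, not the article's code; longest_border_inside is a hypothetical name, and the -1 sentinel is kept only to match the functions above.

```python
def longest_border_inside(text):
    """Alternative phrasing of phase 2: walk the prefix-suffix chain."""
    n = len(text)
    if n == 0:
        return -1

    # Phase 1: build the LPS array
    lps = [0] * n
    for i in range(1, n):
        k = lps[i - 1]
        while k > 0 and text[i] != text[k]:
            k = lps[k - 1]
        if text[i] == text[k]:
            k += 1
        lps[i] = k

    # Largest prefix length that ends somewhere before the last index
    inner_max = max(lps[:n - 1], default=0)

    # Phase 2: shrink the candidate until it also fits inside the string
    k = lps[n - 1]
    while k > 0 and k > inner_max:
        k = lps[k - 1]

    return text[:k] if k > 0 else -1


for sample in ["languagepythonlanguageinterestinglanguage", "abcab", "ababa", "hello"]:
    print(sample, "->", longest_border_inside(sample))
```

The loop shrinks the candidate at most once: after one step the candidate length is an LPS value at an index before the last one, so it cannot exceed inner_max. That is why a single fallback is enough in get_longest_substr.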

Time Complexity

The time complexity is O(n) where n is the length of the string, as both the LPS construction and the search phase run in linear time.
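
One way to see the linear bound is to count how often the while loop in the LPS construction runs. Each pass either advances i or shrinks size, and size can only shrink as often as it has grown, so the loop body runs fewer than 2n times. The sketch below instruments a copy of that loop; count_lps_steps is a hypothetical helper, and the sample strings are arbitrary choices.

```python
def count_lps_steps(text):
    """Build the LPS array and count how many times the loop body runs."""
    n = len(text)
    lps = [0] * n
    size = 0
    i = 1
    steps = 0
    while i < n:
        steps += 1
        if text[i] == text[size]:
            size += 1
            lps[i] = size
            i += 1
        elif size != 0:
            # Fall back without advancing i
            size = lps[size - 1]
        else:
            i += 1
    return lps, steps


for sample in ["a" * 1000, "ab" * 500, "a" * 999 + "b"]:
    _, steps = count_lps_steps(sample)
    # The last column is True when the step count stays within 2n
    print(len(sample), steps, steps <= 2 * len(sample))
```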

Conclusion

This solution finds the longest substring that appears as a prefix, as a suffix, and inside the string in linear time using the LPS array from the KMP algorithm. It returns -1 when no such substring exists, for example when the only prefix-suffix never occurs in the middle, as in "abcab".

Updated on: 2026-03-25T09:29:49+05:30
