Program to perform prefix compression from two strings in Python

Suppose we have two strings s and t, both containing only lowercase English letters. We need to return a list of three pairs, each of the form (l, k), where k is a string and l is its length. In order, the three pairs describe the longest common prefix of s and t, the part of s that remains after the prefix, and the part of t that remains after the prefix.

For example, if the input is s = "science" and t = "school", the output will be [(2, 'sc'), (5, 'ience'), (4, 'hool')].

Algorithm

To solve this, we will follow these steps −

Initialize lcp as an empty string
For each index i from 0 up to, but not including, the smaller of the lengths of s and t
- If s[i] is same as t[i], add s[i] to lcp
- Otherwise, break the loop
Extract remaining part of s from index (length of lcp) to end
Extract remaining part of t from index (length of lcp) to end
Return a list of three pairs: [(length of lcp, lcp), (length of s_rem, s_rem), (length of t_rem, t_rem)]

Example

Let us see the following implementation to get better understanding −

def solve(s, t):
    lcp = ''
    for i in range(min(len(s), len(t))):
        if s[i] == t[i]:
            lcp += s[i]
        else:
            break
    
    s_rem = s[len(lcp):]
    t_rem = t[len(lcp):]
    return [(len(lcp), lcp), (len(s_rem), s_rem), (len(t_rem), t_rem)]

s = "science"
t = "school"
print(solve(s, t))

The output of the above code is −

[(2, 'sc'), (5, 'ience'), (4, 'hool')]

How It Works

The function compares the characters at the same position in both strings. At the first mismatch it stops, and everything before that position is the longest common prefix. If the shorter string ends before any mismatch is found, the whole shorter string is the prefix. The function then slices off the prefix to get the remaining part of each string.
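The same idea can be sketched by counting the matching leading characters and slicing once at the end, rather than growing the prefix one character at a time. This is only an illustrative variant; the name solve_with_zip is hypothetical and not part of the original program.

```python
def solve_with_zip(s, t):
    # Hypothetical variant of solve(): count how many leading characters match.
    n = 0
    for a, b in zip(s, t):  # zip stops at the end of the shorter string
        if a != b:
            break
        n += 1

    # Slice once using the count instead of concatenating characters.
    lcp, s_rem, t_rem = s[:n], s[n:], t[n:]
    return [(len(lcp), lcp), (len(s_rem), s_rem), (len(t_rem), t_rem)]

print(solve_with_zip("science", "school"))
# prints [(2, 'sc'), (5, 'ience'), (4, 'hool')]
```

Because zip ends with the shorter input, no explicit min(len(s), len(t)) bound is needed.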

Another Example

Let's test with different strings to see how the algorithm works −

def solve(s, t):
    lcp = ''
    for i in range(min(len(s), len(t))):
        if s[i] == t[i]:
            lcp += s[i]
        else:
            break
    
    s_rem = s[len(lcp):]
    t_rem = t[len(lcp):]
    return [(len(lcp), lcp), (len(s_rem), s_rem), (len(t_rem), t_rem)]

# Test with strings having no common prefix
s1 = "hello"
t1 = "world"
print("No common prefix:", solve(s1, t1))

# Test with one string being prefix of another
s2 = "test"
t2 = "testing"
print("One is prefix:", solve(s2, t2))

The output of the above code is −

No common prefix: [(0, ''), (5, 'hello'), (5, 'world')]
One is prefix: [(4, 'test'), (0, ''), (3, 'ing')]
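As a sanity check on results like these, the two original strings can be rebuilt from the three pairs by putting the prefix back in front of each remainder. The sketch below assumes the pair order used above; the helper name restore is hypothetical and not part of the original program.

```python
def restore(pairs):
    # Hypothetical helper: rebuild (s, t) from the three (length, string) pairs.
    # The stored lengths are not needed here, so they are ignored.
    (_, lcp), (_, s_rem), (_, t_rem) = pairs
    return lcp + s_rem, lcp + t_rem

pairs = [(4, 'test'), (0, ''), (3, 'ing')]
print(restore(pairs))
# prints ('test', 'testing')
```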

Conclusion

This prefix compression algorithm finds the longest common prefix of two strings and returns the result as three (length, string) pairs. The comparison loop makes at most min(m, n) character comparisons, where m and n are the lengths of the input strings. Slicing out the two remainders copies the leftover characters, so the total work is O(m + n).

Arnab Chakraborty

Updated on: 2026-03-26T15:42:49+05:30
