How to find longest repetitive sequence in a string in Python?
Strings are essential data types used in many real-world problems that involve analyzing and manipulating text. In this article, we will learn how to find the longest repetitive sequence in a string.
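A repetitive sequence is a substring that appears more than once in the given string. Its occurrences may overlap: for example, "aa" occurs twice in "aaa". This article covers three approaches that rely only on Python built-ins such as slicing, sorting, sets, and dictionaries. If several repeated substrings share the maximum length, each function returns only one of them.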
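A substring can repeat even when its occurrences overlap. For that reason, Python's built-in `str.count()` is not a reliable way to check for repetition, because it counts only non-overlapping matches. The minimal sketch below uses a hypothetical helper, `count_overlapping`. After each match it restarts `str.find()` one position later, so overlapping occurrences are counted as well.

```python
def count_overlapping(text, sub):
    """Count occurrences of a non-empty `sub` in `text`, including overlapping ones."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the last match start so overlaps are found
        start = text.find(sub, start + 1)
    return count

print("aaa".count("aa"))                       # 1 (non-overlapping count)
print(count_overlapping("aaa", "aa"))          # 2 (overlapping count)
print(count_overlapping("112212213", "1221"))  # 2
```

In "112212213", the two occurrences of "1221" start at indices 1 and 4 and share the character at index 4.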
Using Suffix Array and LCP
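A suffix array is the list of all suffixes of a string sorted in lexicographic order. Implementations usually store only the starting index of each suffix.
In this approach, we build the list of suffixes, sort it, and compute the longest common prefix (LCP) of each adjacent pair. Any substring that occurs twice is a common prefix of two different suffixes. After sorting, every suffix that lies between those two also starts with that substring, so some adjacent pair shares it. The longest LCP among adjacent pairs is therefore the longest repeated substring.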
Example
In the following example, we find the longest repeated substring in "WELCOME" using the suffix array approach:
```python
def find_longest_repeated_suffix(s):
    n = len(s)
    # Generate all suffixes
    suffixes = [s[i:] for i in range(n)]
    suffixes.sort()
    longest_repeated = ""
    # Compare adjacent suffixes
    for i in range(n - 1):
        common_prefix = find_common_prefix(suffixes[i], suffixes[i + 1])
        if len(common_prefix) > len(longest_repeated):
            longest_repeated = common_prefix
    return longest_repeated

def find_common_prefix(str1, str2):
    result = ""
    for char1, char2 in zip(str1, str2):
        if char1 == char2:
            result += char1
        else:
            break
    return result

print(find_longest_repeated_suffix("WELCOME"))
```
The output of the above program is:
```
E
```
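Production suffix arrays usually store integer start positions instead of copies of every suffix. The LCP values for all adjacent pairs can then be computed in linear time with Kasai's algorithm. The sketch below combines these two ideas. The function names `build_suffix_array`, `kasai_lcp`, and `longest_repeated_kasai` are hypothetical. The sort uses `key=lambda i: s[i:]`, which still slices every suffix, so the sorting step keeps its O(n² log n) worst case. Only the LCP pass drops to O(n).

```python
def build_suffix_array(s):
    # Start positions of all suffixes, ordered by the suffix they begin
    # (the key slices every suffix, so this step is still O(n^2 log n) worst case)
    return sorted(range(len(s)), key=lambda i: s[i:])

def kasai_lcp(s, sa):
    """lcp[k] = common-prefix length of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, not sorted order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp

def longest_repeated_kasai(s):
    if not s:
        return ""
    sa = build_suffix_array(s)
    lcp = kasai_lcp(s, sa)
    best = max(range(len(s)), key=lambda k: lcp[k])  # first maximal LCP
    return s[sa[best]:sa[best] + lcp[best]]

print(longest_repeated_kasai("WELCOME"))  # E
print(longest_repeated_kasai("banana"))   # ana
```

The `h -= 1` step makes the LCP pass linear. If suffix `i` shares `h` characters with its sorted neighbour, then suffix `i + 1` shares at least `h - 1` characters with its own neighbour, so the comparison never restarts from zero.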
Using Sliding Window and Set
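The second approach enumerates every substring with two nested loops, one for the start index and one for the end index, and uses a set to remember the substrings already seen. When a substring is seen again and is longer than the current result, it becomes the new result. Although this approach is called a sliding window, the loops check windows at every start and end position rather than sliding one window of fixed width.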
Example
The following example finds the longest repeated substring in "112212213":
```python
def find_longest_repeated_set(s):
    seen = set()
    n = len(s)
    max_len = 0
    result = ""
    # Generate all substrings
    for i in range(n):
        for j in range(i + 1, n + 1):
            substring = s[i:j]
            if substring in seen and len(substring) > max_len:
                result = substring
                max_len = len(substring)
            seen.add(substring)
    return result

print(find_longest_repeated_set("112212213"))
```
The output of the above program is:
```
1221
```
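The nested loops above check windows of every width. A genuine fixed-width sliding window becomes useful once you notice a monotonic property: if some substring of length L repeats, its first L − 1 characters also repeat. That means the largest repeated length can be found by binary search. The sketch below shows this alternative technique, which is not the method used above. The helper names `find_repeat_of_length` and `longest_repeated_binary_search` are hypothetical. Each check slides one window of width L and stores at most about n windows, and there are O(log n) checks. This gives roughly O(n² log n) time in the worst case, and only one width's windows are kept in memory at a time.

```python
def find_repeat_of_length(s, length):
    """Return some substring of the given length that occurs twice, or None."""
    seen = set()
    # Slide a window of fixed width across s
    for i in range(len(s) - length + 1):
        window = s[i:i + length]
        if window in seen:
            return window
        seen.add(window)
    return None

def longest_repeated_binary_search(s):
    # A repeat of length L implies a repeat of every shorter length,
    # so the largest L with a repeat can be found by binary search.
    lo, hi = 1, len(s) - 1  # a repeat is at most n - 1 characters long
    best = ""
    while lo <= hi:
        mid = (lo + hi) // 2
        found = find_repeat_of_length(s, mid)
        if found is not None:
            best = found
            lo = mid + 1
        else:
            hi = mid - 1
    return best

print(longest_repeated_binary_search("112212213"))  # 1221
print(longest_repeated_binary_search("banana"))     # ana
```

When several repeated substrings share the maximum length, this version may return a different one from `find_longest_repeated_set`.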
Using Python Dictionary
The third approach uses a dictionary to count the occurrences of each substring. We generate all substrings and increment each one's count in a `defaultdict(int)`, where missing keys start at 0. The result is updated whenever a substring has been seen more than once and is longer than the current maximum.
Example
Consider the following example, which finds the longest repetitive sequence in "tutorialspoint":
```python
from collections import defaultdict

def find_longest_repeated_dict(s):
    substring_count = defaultdict(int)
    n = len(s)
    max_len = 0
    result = ""
    # Generate all substrings and count occurrences
    for i in range(n):
        for j in range(i + 1, n + 1):
            substring = s[i:j]
            substring_count[substring] += 1
            # Update result if substring repeats and is longer
            if substring_count[substring] > 1 and len(substring) > max_len:
                max_len = len(substring)
                result = substring
    return result

print(find_longest_repeated_dict("tutorialspoint"))
```
The output of the above program is:
```
t
```
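Because the dictionary already holds the count of every substring, it can also report how often the answer occurs. The sketch below uses `collections.Counter`, a `dict` subclass, in place of `defaultdict(int)`. The function name `longest_repeated_with_count` is hypothetical. When several substrings tie on length, the one counted first wins: `Counter` preserves insertion order, and `max()` returns the first maximal item.

```python
from collections import Counter

def longest_repeated_with_count(s):
    # Count every substring s[i:j]; overlapping occurrences are included
    counts = Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))
    repeated = [sub for sub, c in counts.items() if c > 1]
    if not repeated:
        return "", 0
    # max() keeps the first of equally long candidates
    best = max(repeated, key=len)
    return best, counts[best]

print(longest_repeated_with_count("tutorialspoint"))  # ('t', 3)
print(longest_repeated_with_count("aaaa"))            # ('aaa', 2)
```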
Comparison
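| Method | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| Suffix Array | O(n² log n) | O(n²) | Lowest complexity of the three |
| Sliding Window + Set | O(n³) | O(n³) | Simple implementation |
| Dictionary Count | O(n³) | O(n³) | Tracking frequencies |
Space is measured in stored characters. The suffix array keeps n suffixes with a total length of n(n+1)/2, while the set and the dictionary can end up holding every distinct substring.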
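The O(n³) bounds for the set and dictionary methods follow from the total length of the substrings they slice and hash. A string of length n has n − L + 1 substrings of length L, so the total is:

$$
\sum_{L=1}^{n} (n - L + 1)\,L \;=\; (n+1)\frac{n(n+1)}{2} - \frac{n(n+1)(2n+1)}{6} \;=\; \frac{n(n+1)(n+2)}{6} \;=\; O(n^3)
$$

For n = 3, this gives 3·1 + 2·2 + 1·3 = 10 = (3·4·5)/6. When all substrings are distinct, the set or dictionary stores all of these characters, which is why its memory use also grows cubically.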
Conclusion
All three approaches find the longest repeated substring, and each one allows the occurrences to overlap. The suffix array method has the best worst-case complexity of the three, O(n² log n). The set and dictionary methods are simpler to write, and the dictionary also records how often each substring occurs.
