SequenceMatcher in Python for Longest Common Substring
The SequenceMatcher class is part of Python's difflib module. It compares sequences (such as lists or strings) and finds similarities between them.
The task is to find the longest common substring: the longest run of characters that appears contiguously in both strings. This differs from the longest common subsequence, whose characters must appear in the same order but need not be contiguous.
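To make the distinction concrete, the sketch below compares the two notions on one pair of strings. The helper `lcs_subsequence_length` is a hypothetical name introduced here for illustration; it is a textbook dynamic-programming routine and not part of difflib.

```python
from difflib import SequenceMatcher


def lcs_subsequence_length(s, t):
    """Length of the longest common subsequence (illustrative helper, not in difflib)."""
    # prev[j] is the LCS length of the already-processed prefix of s and t[:j]
    prev = [0] * (len(t) + 1)
    for ch in s:
        curr = [0]
        for j, other in enumerate(t, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


x = "axbxc"
y = "abc"

match = SequenceMatcher(None, x, y).find_longest_match(0, len(x), 0, len(y))
substring_length = match.size                       # contiguous run only
subsequence_length = lcs_subsequence_length(x, y)   # gaps are allowed

print(substring_length, subsequence_length)
```

Here the only contiguous overlaps are single characters, while "abc" survives as a subsequence of "axbxc" once the gaps are skipped.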
Using find_longest_match() Method
The find_longest_match() method finds the longest matching sequence of elements between two sequences. It returns a Match object with three attributes: a (start position in first sequence), b (start position in second sequence), and size (length of the match).
Syntax
SequenceMatcher.find_longest_match(alo, ahi, blo, bhi)
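Parameters:
- alo, ahi: Range to search in the first sequence, i.e. a[alo:ahi]
- blo, bhi: Range to search in the second sequence, i.e. b[blo:bhi]
In Python 3.9 and later all four arguments are optional and default to the full length of each sequence.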
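As a rough illustration of how the range arguments narrow the search, the sketch below runs the same matcher twice, once over both full strings and once skipping the first character of the first string. The variable names are chosen here for illustration only.

```python
from difflib import SequenceMatcher

x = "abcde"
y = "abghf"
matcher = SequenceMatcher(None, x, y)

# Search all of both strings
full = matcher.find_longest_match(0, len(x), 0, len(y))

# Search only x[1:5] ("bcde") against all of y
restricted = matcher.find_longest_match(1, len(x), 0, len(y))

print(full)
print(restricted)
```

Note that the positions in the returned Match refer to the original sequences, not to the restricted slice.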
Example 1: Basic String Matching
Find the longest common substring between "abcde" and "abghf":
from difflib import SequenceMatcher
x = "abcde"
y = "abghf"
matcher = SequenceMatcher(None, x, y)
result = matcher.find_longest_match(0, len(x), 0, len(y))
print("Result:", x[result.a : result.a + result.size])
print("Match details: start={}, size={}".format(result.a, result.size))
Result: ab
Match details: start=0, size=2
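Because Match is a named tuple, it can also be unpacked directly, and the b field gives the position of the same substring in the second string. A minimal sketch, using strings picked here so that the two positions differ:

```python
from difflib import SequenceMatcher

x = "unhappy"
y = "happiness"

# Unpack the (a, b, size) fields of the Match named tuple
a_start, b_start, size = SequenceMatcher(None, x, y).find_longest_match(
    0, len(x), 0, len(y)
)

from_x = x[a_start:a_start + size]   # slice taken from the first string
from_y = y[b_start:b_start + size]   # the same text, sliced from the second

print(a_start, b_start, size)
print(from_x, from_y)
```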
Example 2: No Common Substring
When there is no common substring, the match has size 0 and the sliced result is an empty string:
from difflib import SequenceMatcher
x = "xyz"
y = "efg"
matcher = SequenceMatcher(None, x, y)
result = matcher.find_longest_match(0, len(x), 0, len(y))
match = x[result.a : result.a + result.size]
print("Result: '{}'".format(match))
print("Size:", result.size)
Result: ''
Size: 0
Example 3: Case-Sensitive Matching
SequenceMatcher is case-sensitive by default. Find the longest common substring between 'Welcome' and 'weLCome':
from difflib import SequenceMatcher
x = "Welcome"
y = "weLCome"
matcher = SequenceMatcher(None, x, y)
result = matcher.find_longest_match(0, len(x), 0, len(y))
print("Result:", x[result.a : result.a + result.size])
print("Position in x: {}, Position in y: {}".format(result.a, result.b))
Result: ome
Position in x: 4, Position in y: 4
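If case should be ignored, one possible approach is to compare lowercased copies and slice the result out of the original string. The function name below is hypothetical, and the sketch assumes text whose length does not change under lower() (plain ASCII, for example), so that the indices line up with the original.

```python
from difflib import SequenceMatcher


def longest_common_substring_ignore_case(str1, str2):
    """Hypothetical helper: match on lowercased copies, return the slice of str1."""
    # Assumption: lower() preserves string length, so positions map back to str1
    a, b = str1.lower(), str2.lower()
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return str1[m.a:m.a + m.size]


result = longest_common_substring_ignore_case("Welcome", "weLCome")
print(result)
```

With case ignored, the two strings from Example 3 match over their whole length rather than only on "ome".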
Example 4: Practical Function
Create a reusable function to find the longest common substring:
from difflib import SequenceMatcher

def longest_common_substring(str1, str2):
    matcher = SequenceMatcher(None, str1, str2)
    result = matcher.find_longest_match(0, len(str1), 0, len(str2))
    return str1[result.a : result.a + result.size]

# Test with different examples
examples = [
    ("programming", "graming"),
    ("hello world", "yellow"),
    ("python", "java")
]
for s1, s2 in examples:
    lcs = longest_common_substring(s1, s2)
    print(f"'{s1}' & '{s2}' -> '{lcs}'")
'programming' & 'graming' -> 'gram'
'hello world' & 'yellow' -> 'ello'
'python' & 'java' -> ''
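When several common substrings share the maximum length, the difflib documentation states that find_longest_match() returns the one that starts earliest in the first sequence and, among those, earliest in the second. A small sketch with made-up strings that contain two equally long candidates:

```python
from difflib import SequenceMatcher

x = "abXcd"
y = "cdYab"

# Both "ab" and "cd" are common substrings of length 2
tie = SequenceMatcher(None, x, y).find_longest_match(0, len(x), 0, len(y))
chosen = x[tie.a:tie.a + tie.size]

print(tie)
print(chosen)
```

The match reported is "ab", since it begins at index 0 of the first string, even though "cd" appears earlier in the second.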
Key Points
| Feature | Description |
|---|---|
| Case Sensitivity | Default behavior is case-sensitive |
| Return Type | Match object with a, b, and size attributes |
| Empty Result | Returns size=0 when no common substring exists |
| Contiguous | Finds consecutive characters only |
Conclusion
SequenceMatcher's find_longest_match() method finds the longest common substring between two sequences in a single call. It is case-sensitive by default and returns the start position of the match in each sequence along with its size.
