Program to count number of similar substrings for each query in Python
Finding similar substrings based on character pattern matching is a common string processing problem. Two substrings are similar if they have the same length and maintain the same character relationship pattern. This means if characters at positions i and j are equal in one substring, they must be equal in the other substring too.
Understanding Similar Substrings
Two strings s and t are similar if they satisfy both of these rules:
- They have the same length.
- For each pair of indices (i, j): if s[i] equals s[j], then t[i] must equal t[j]; if s[i] differs from s[j], then t[i] must differ from t[j].
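Taken together, the two rules require a one-to-one correspondence between the distinct characters of the two strings. A minimal sketch of that check follows; the helper name `is_similar` is illustrative and is not part of the article's solution.

```python
def is_similar(s, t):
    """Return True if s and t have equal length and the same equality pattern."""
    if len(s) != len(t):
        return False
    # The distinct (s[i], t[i]) pairs form a one-to-one mapping exactly when
    # their count equals the number of distinct characters on each side.
    pairs = set(zip(s, t))
    return len(pairs) == len(set(s)) == len(set(t))


print(is_similar("hj", "bc"))    # True
print(is_similar("jhh", "hhb"))  # False
print(is_similar("ab", "cc"))    # False: two different characters map to one
```

The explicit length check matters because `zip` silently stops at the shorter string.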
Example
For the string s = "hjhhbcbk" and queries Q = [(1,2), (2,4)], where each query (a, b) selects the substring from position a to position b (1-indexed, inclusive):
- Query (1,2): substring "hj" has similar substrings: "hj", "jh", "hb", "bc", "cb", "bk" (6 total)
- Query (2,4): substring "jhh" has similar substring: "jhh" (1 total)
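The counts above can be reproduced with a small brute-force sketch. It compares a canonical pattern of each window, in which every character is replaced by the order of its first appearance. The helper names `pattern` and `similar_windows` are illustrative assumptions, not part of the original solution.

```python
def pattern(t):
    """Canonical form of t: each character becomes its first-appearance rank."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in t)


def similar_windows(s, a, b):
    """List every substring of s similar to s[a-1:b] (1-indexed, inclusive)."""
    target = s[a - 1:b]
    m = len(target)
    want = pattern(target)
    return [s[i:i + m] for i in range(len(s) - m + 1)
            if pattern(s[i:i + m]) == want]


s = "hjhhbcbk"
print(similar_windows(s, 1, 2))  # ['hj', 'jh', 'hb', 'bc', 'cb', 'bk']
print(similar_windows(s, 2, 4))  # ['jhh']
```

Two windows are similar exactly when their canonical patterns are equal, so the only length-2 window excluded for the first query is "hh".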
Algorithm Implementation
The solution uses fingerprinting to optimize comparisons for longer substrings:
fp = []

def calc_fingerprint(s):
    # Encode each character by the order in which it first appears
    char_map = {s[0]: 0}
    fingerprint = "0"
    j = 1
    for i in range(1, len(s)):
        if s[i] not in char_map:
            char_map[s[i]], j = j, j + 1
        fingerprint += str(char_map[s[i]])
    return int(fingerprint)

def solve(s, Q):
    global fp
    fp = []
    # Generate fingerprints for optimization
    if len(s) > 10:
        for i in range(0, len(s) - 10):
            fp.append(calc_fingerprint(s[i:i + 10]))
    results = []
    for query in Q:
        a, b = query
        s1 = s[a - 1:b]  # Query substring (1-indexed to 0-indexed)
        count = 0
        # Check all possible substrings of same length
        for i in range(len(s) - (b - a)):
            # Optimization for longer substrings
            if b - a > 9 and len(fp) > 0 and fp[a - 1] != fp[i]:
                continue
            char_map = {}
            s2 = s[i:i + (b - a) + 1]
            # Check if s1 and s2 are similar
            is_similar = True
            for j in range(b - a + 1):
                if s2[j] not in char_map:
                    if s1[j] in char_map.values():
                        is_similar = False
                        break
                    char_map[s2[j]] = s1[j]
                elif char_map[s2[j]] != s1[j]:
                    is_similar = False
                    break
            if is_similar:
                count += 1
        results.append(count)
    return results

# Test the solution
s = "hjhhbcbk"
Q = [(1, 2), (2, 4)]
result = solve(s, Q)
print("Query results:", result)
Query results: [6, 1]
How It Works
The algorithm uses two key techniques:
- Fingerprinting: Precomputes a numeric pattern for 10-character windows of s. For queries longer than 10 characters, a candidate whose fingerprint differs from the query's is skipped without a full comparison.
- Character Mapping: For each remaining candidate, builds a mapping between characters to verify that the pattern relationship holds.
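The fingerprint filter can be illustrated on a longer string. This is a sketch under assumptions: the string `s`, the query bounds, and the helper name `prefix_fingerprint` are made up for illustration, and the fingerprint is kept as a string of digits rather than converted to an integer.

```python
def prefix_fingerprint(window):
    """Digits giving the first-appearance rank of each character in window."""
    order = {}
    return "".join(str(order.setdefault(ch, len(order))) for ch in window)


s = "abcabcabcabcxyzxyzxyzxyz"  # hypothetical input, 24 characters
k = 10                          # fingerprint length used by the solution
fps = [prefix_fingerprint(s[i:i + k]) for i in range(len(s) - k + 1)]

a, b = 1, 12                    # hypothetical query of length 12 (1-indexed)
m = b - a + 1
# Keep only candidate start positions whose first k characters match the
# query's fingerprint; every other window cannot be similar to the query.
survivors = [i for i in range(len(s) - m + 1) if fps[i] == fps[a - 1]]
print(survivors)  # [0, 1, 2, 12]
```

Of the 13 candidate windows, only 4 pass the filter. The filter is necessary but not sufficient: positions 1 and 2 share the query's 10-character fingerprint yet fail the full character-mapping check, which is why that check still runs on every survivor.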
Step-by-Step Process
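For each query substring, every candidate window of the same length is checked character by character. The check for a single pair looks like this: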
# Example: checking if "hj" and "bc" are similar
s1 = "hj"
s2 = "bc"
char_map = {}
is_similar = True
for i in range(len(s1)):
    if s2[i] not in char_map:
        # A new character in s2 must not map to an already-used s1 character
        if s1[i] in char_map.values():
            is_similar = False
            break
        char_map[s2[i]] = s1[i]
    elif char_map[s2[i]] != s1[i]:
        is_similar = False
        break
print(f"'{s1}' and '{s2}' similar: {is_similar}")
print("Character mapping:", char_map)
'hj' and 'bc' similar: True
Character mapping: {'b': 'h', 'c': 'j'}
Conclusion
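This algorithm counts similar substrings by comparing the query against every window of the same length using a character mapping. In the worst case this costs O(n · m) character comparisons per query for a string of length n and a query of length m. For queries longer than 10 characters, the precomputed fingerprints let it skip any window whose first 10 characters already rule out a match.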
---