Program to count number of similar substrings for each query in Python

Counting similar substrings by character pattern is a common string-processing problem. Two substrings are similar if they have the same length and the same pattern of repeated characters: whenever the characters at positions i and j are equal in one substring, they must be equal in the other, and whenever they differ in one, they must differ in the other.

Understanding Similar Substrings

Two strings s and t are similar if they follow these rules:

They have the same length.
For every pair of indices (i, j), s[i] = s[j] if and only if t[i] = t[j]. In other words, equal characters in s line up with equal characters in t, and different characters in s line up with different characters in t.
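
One way to make these rules concrete is to reduce each string to a canonical pattern and compare the patterns. The sketch below is an illustration rather than the article's implementation; the helper names first_occurrence_pattern and are_similar are assumptions introduced here.

```python
def first_occurrence_pattern(text):
    """Label each character by the order in which it first appears (0, 1, 2, ...)."""
    first_seen = {}
    pattern = []
    for ch in text:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen)  # next unused label
        pattern.append(first_seen[ch])
    return tuple(pattern)


def are_similar(s, t):
    """Similar strings have equal length and identical first-occurrence patterns."""
    return len(s) == len(t) and first_occurrence_pattern(s) == first_occurrence_pattern(t)


print(first_occurrence_pattern("jhh"))  # (0, 1, 1)
print(are_similar("hj", "bc"))          # True
print(are_similar("jhh", "hjh"))        # False
```

Two strings get the same pattern exactly when their equal and unequal positions coincide, which is what the second rule asks for.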

Example

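For the string s = "hjhhbcbk" and queries Q = [(1,2), (2,4)], where each query (a, b) selects the substring from position a to position b (1-indexed, inclusive), the counts are as follows. Each count includes the query substring itself.

Query (1,2): the substring "hj" has 6 similar substrings: "hj", "jh", "hb", "bc", "cb", "bk"
Query (2,4): the substring "jhh" has 1 similar substring: "jhh" itself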
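
The counts in this example can be reproduced with a short brute-force check. This is a minimal sketch, assuming 1-indexed inclusive queries as in the example; is_similar and count_similar are hypothetical names, and the set-of-pairs test is a compact alternative to an explicit character map, not the method used later in the article.

```python
def is_similar(s1, s2):
    # Equal length, and the distinct (s1, s2) character pairs form a one-to-one mapping
    return (len(s1) == len(s2)
            and len(set(zip(s1, s2))) == len(set(s1)) == len(set(s2)))


def count_similar(s, a, b):
    """Count windows of s similar to s[a-1:b]; a and b are 1-indexed, inclusive."""
    target = s[a - 1:b]
    width = len(target)
    return sum(is_similar(target, s[i:i + width])
               for i in range(len(s) - width + 1))


s = "hjhhbcbk"
print([count_similar(s, a, b) for a, b in [(1, 2), (2, 4)]])  # [6, 1]
```

The window "hh" is the only length-2 window rejected for the first query, because it maps two different characters of "hj" onto the same character.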

Algorithm Implementation

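The solution compares the query substring against every window of the same length. For queries longer than 10 characters, it first compares fingerprints of the leading 10 characters so that mismatched candidates can be skipped early: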

fp = []

def calc_fingerprint(s):
    char_map = {s[0]: 0}
    fingerprint = "0"
    j = 1
    for i in range(1, len(s)):
        if s[i] not in char_map:
            char_map[s[i]], j = j, j + 1
        fingerprint += str(char_map[s[i]])
    return int(fingerprint)

def solve(s, Q):
    global fp
    fp = []
    
    # Precompute the pattern fingerprint of each 10-character window
    # (consulted only for queries longer than 10 characters)
    if len(s) > 10:
        for i in range(0, len(s) - 10):
            fp.append(calc_fingerprint(s[i:i + 10]))
    
    results = []
    
    for query in Q:
        a, b = query
        s1 = s[a - 1:b]  # Query substring (1-indexed to 0-indexed)
        count = 0
        
        # Check all possible substrings of same length
        for i in range(len(s) - (b - a)):
            # For queries longer than 10 characters, skip candidates whose
            # first 10 characters have a different pattern than the query's
            if b - a > 9 and len(fp) > 0 and fp[a - 1] != fp[i]:
                continue
                
            char_map = {}
            s2 = s[i:i + (b - a) + 1]
            
            # Check if s1 and s2 are similar
            is_similar = True
            for j in range(b - a + 1):
                if s2[j] not in char_map:
                    if s1[j] in char_map.values():
                        is_similar = False
                        break
                    char_map[s2[j]] = s1[j]
                elif char_map[s2[j]] != s1[j]:
                    is_similar = False
                    break
            
            if is_similar:
                count += 1
                
        results.append(count)
    
    return results

# Test the solution
s = "hjhhbcbk"
Q = [(1, 2), (2, 4)]
result = solve(s, Q)
print("Query results:", result)

Query results: [6, 1]
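
When many queries share the same length, it may be cheaper to count the patterns of all windows of that length once and answer each query with a lookup. The following is a sketch of that alternative under that assumption, not the article's method; pattern and solve_grouped are hypothetical names.

```python
from collections import Counter


def pattern(text):
    """Label each character by order of first appearance, e.g. "jhh" -> (0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)


def solve_grouped(s, queries):
    counts_by_width = {}  # window width -> Counter of window patterns
    results = []
    for a, b in queries:  # 1-indexed, inclusive bounds
        width = b - a + 1
        if width not in counts_by_width:
            # Tally the pattern of every window of this width once
            counts_by_width[width] = Counter(
                pattern(s[i:i + width]) for i in range(len(s) - width + 1)
            )
        results.append(counts_by_width[width][pattern(s[a - 1:b])])
    return results


print(solve_grouped("hjhhbcbk", [(1, 2), (2, 4)]))  # [6, 1]
```

The trade-off is memory: one pattern tuple is stored per distinct window pattern for every width that appears in the queries.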

How It Works

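The algorithm uses two key techniques:

Fingerprinting: Encodes the character pattern of each 10-character window as a number. When a query is longer than 10 characters, a candidate whose first 10 characters have a different fingerprint cannot be similar and is skipped without a full comparison.
Character Mapping: For each remaining candidate, builds a one-to-one mapping between its characters and the query's characters to verify that the pattern relationship holds.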
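
The fingerprint is only a filter: a different fingerprint rules a candidate out, but an equal fingerprint does not prove similarity, because characters beyond the first 10 can still break the pattern. The sketch below illustrates this on made-up strings; WINDOW, prefix_fingerprint, full_check, and the sample strings are assumptions for illustration only.

```python
WINDOW = 10  # window length that solve() fingerprints


def prefix_fingerprint(text):
    """Fingerprint the first WINDOW characters: label characters 0, 1, 2, ... by first appearance."""
    labels = {}
    digits = []
    for ch in text[:WINDOW]:
        labels.setdefault(ch, len(labels))
        digits.append(str(labels[ch]))
    return int("".join(digits))


def full_check(s1, s2):
    """Complete similarity test: the paired characters must map one-to-one."""
    return len(s1) == len(s2) and len(set(zip(s1, s2))) == len(set(s1)) == len(set(s2))


query = "abcabcabcabx"  # hypothetical 12-character query
for candidate in ["xyzxyzxyzxyq", "aaaaaaaaaaab", "abcabcabcaba"]:
    passes_filter = prefix_fingerprint(candidate) == prefix_fingerprint(query)
    # Only candidates that pass the cheap filter need the full comparison
    similar = passes_filter and full_check(query, candidate)
    print(candidate, passes_filter, similar)
```

The second candidate is rejected by the fingerprint alone, while the third passes the filter and is rejected only by the full comparison, since its last character repeats an earlier one where the query introduces a new one.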

Step-by-Step Process

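For each query, the character-mapping check runs once per candidate window. The snippet below isolates that check for a single pair of substrings: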

# Example: checking if "hj" and "bc" are similar
s1 = "hj"
s2 = "bc"

char_map = {}
is_similar = True

for i in range(len(s1)):
    if s2[i] not in char_map:
        if s1[i] in char_map.values():
            is_similar = False
            break
        char_map[s2[i]] = s1[i]
    elif char_map[s2[i]] != s1[i]:
        is_similar = False
        break

print(f"'{s1}' and '{s2}' similar: {is_similar}")
print("Character mapping:", char_map)

'hj' and 'bc' similar: True
Character mapping: {'b': 'h', 'c': 'j'}

Conclusion

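This algorithm counts similar substrings by building a one-to-one character mapping between the query and each candidate window, and it uses 10-character fingerprints to reject mismatched candidates early when the query is longer than 10 characters. Each query still scans every window of the same length, so the worst-case cost per query is proportional to the string length multiplied by the query length.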

---

Arnab Chakraborty

Updated on: 2026-03-26T15:07:23+05:30
