Program to find the substrings at given positions in the sorted set of all possible substrings of given strings in Python

When working with multiple strings, we often need to find specific substrings from the union of all possible substrings. This involves generating all substrings from each string, creating a sorted union set, and retrieving elements at specified positions.

So, if the input strings are ['pqr', 'pqt'] and queries are [4, 7, 9], then the output will be ['pqt', 'qt', 't'].

How It Works

The substrings from the first string are: {p, pq, pqr, q, qr, r}
The substrings from the second string are: {p, pq, pqt, q, qt, t}

The union of these sets is: {p, pq, pqr, pqt, q, qr, qt, r, t}

After lexicographically sorting: ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']

The items at positions 4, 7, and 9 (1-indexed) are 'pqt', 'qt', and 't' respectively.
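
The description above can be checked directly with a brute-force sketch that builds the whole union and indexes into it. The function names `all_substrings_sorted` and `brute_force_queries` are illustrative and not part of the original program, and returning None for an out-of-range position is an assumption made here. This version is only practical for short strings, because the number of substrings grows quadratically with string length.

```python
def all_substrings_sorted(strings):
    """Return the sorted union of every non-empty substring of each string."""
    union = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                union.add(s[i:j])
    return sorted(union)


def brute_force_queries(strings, queries):
    """Answer 1-indexed position queries; None marks an out-of-range position."""
    ordered = all_substrings_sorted(strings)
    return [ordered[q - 1] if 1 <= q <= len(ordered) else None for q in queries]


print(brute_force_queries(['pqr', 'pqt'], [4, 7, 9]))  # ['pqt', 'qt', 't']
```

A brute-force version like this is mainly useful as a reference to compare a faster implementation against.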

Algorithm Steps

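To solve this problem without building every substring, we follow these steps:

Collect the unique suffixes of the input strings (every substring is a prefix of some suffix)
Sort them lexicographically
Calculate the common prefix length between each pair of consecutive suffixes
Scan the sorted suffixes, counting the new substrings each one contributes (its length minus the common prefix length), until the count reaches the requested position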
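
The counting idea behind these steps can be sketched on its own. In the sorted list of unique suffixes, each suffix adds exactly its length minus the length of the prefix it shares with the previous suffix, because the shared prefixes were already counted. The helper names `common_prefix_length` and `new_substring_counts` are hypothetical and used only for this illustration.

```python
def common_prefix_length(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def new_substring_counts(strings):
    """Pair each sorted unique suffix with the number of new substrings it adds."""
    suffixes = sorted({s[i:] for s in strings for i in range(len(s))})
    pairs = []
    previous = ""  # the first suffix shares nothing with an empty predecessor
    for suffix in suffixes:
        pairs.append((suffix, len(suffix) - common_prefix_length(previous, suffix)))
        previous = suffix
    return pairs


pairs = new_substring_counts(['pqr', 'pqt'])
print(pairs)  # [('pqr', 3), ('pqt', 1), ('qr', 2), ('qt', 1), ('r', 1), ('t', 1)]
print(sum(count for _, count in pairs))  # 9
```

The counts add up to 9, which matches the size of the sorted union for ['pqr', 'pqt'].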

Implementation

def find_substring_at_position(suffixes, prefix_lengths, position):
    """Find substring at given position using suffix array approach"""
    combined_data = zip(suffixes, prefix_lengths)
    low = high = 0
    
    for suffix, prefix_len in combined_data:
        if prefix_len is None:
            prefix_len = 0
        
        # Add count of substrings from current suffix
        high += len(suffix) - prefix_len
        
        if high - 1 == position:
            return suffix
        elif high - 1 > position:
            # Find exact substring within current suffix
            for p, q in enumerate(range(prefix_len, len(suffix))):
                if low + p == position:
                    return suffix[:q+1]
        
        low = high
    
    return None

def calculate_common_prefix(str1, str2):
    """Calculate length of common prefix between two strings"""
    min_length = min(len(str1), len(str2))
    count = 0
    
    for i in range(min_length):
        if str1[i] == str2[i]:
            count += 1
        else:
            break
    
    return count

def solve_substring_queries(strings, queries):
    """Main function to solve substring position queries"""
    # Generate all unique substrings
    substring_dict = {}
    suffixes = []
    
    for string in strings:
        for i in range(len(string)):
            substring = string[i:]
            if substring not in substring_dict:
                substring_dict[substring] = 1
                suffixes.append(substring)
    
    # Sort suffixes lexicographically
    suffixes.sort()
    
    # Calculate prefix lengths
    prefix_lengths = []
    for i in range(len(suffixes)):
        if i == 0:
            prefix_lengths.append(None)
        else:
            prefix_lengths.append(calculate_common_prefix(suffixes[i-1], suffixes[i]))
    
    # Process queries
    results = []
    for query in queries:
        result = find_substring_at_position(suffixes, prefix_lengths, query - 1)
        results.append(result)
    
    return results

# Test the solution
strings = ['pqr', 'pqt']
queries = [4, 7, 9]
result = solve_substring_queries(strings, queries)
print(result)

['pqt', 'qt', 't']
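
The function find_substring_at_position walks the sorted suffixes once for every query. When there are many queries, one possible variation (a sketch, not part of the original program) is to precompute running totals of the per-suffix counts once and locate each position with the standard library's bisect module. The names `lcp`, `build_index` and `kth_substring` are hypothetical, and returning None for an out-of-range position is an assumption that mirrors the code above.

```python
from bisect import bisect_left
from itertools import accumulate


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_index(strings):
    """Precompute sorted unique suffixes, shared prefix lengths and running totals."""
    suffixes = sorted({s[i:] for s in strings for i in range(len(s))})
    lcps = [0] + [lcp(a, b) for a, b in zip(suffixes, suffixes[1:])]
    # totals[i] = number of distinct substrings contributed by suffixes[0..i]
    totals = list(accumulate(len(s) - k for s, k in zip(suffixes, lcps)))
    return suffixes, lcps, totals


def kth_substring(index, k):
    """Return the k-th (1-indexed) substring of the sorted union, or None."""
    suffixes, lcps, totals = index
    if not totals or k < 1 or k > totals[-1]:
        return None
    i = bisect_left(totals, k)  # first suffix whose running total reaches k
    before = totals[i - 1] if i else 0
    return suffixes[i][:lcps[i] + (k - before)]


index = build_index(['pqr', 'pqt'])
print([kth_substring(index, q) for q in [4, 7, 9]])  # ['pqt', 'qt', 't']
```

The binary search works because every unique suffix contributes at least one new substring, so the running totals are strictly increasing.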

Example Walkthrough

Let's trace through the example step by step:

# List the suffixes of each input string
strings = ['pqr', 'pqt']

print("Suffixes of 'pqr':", ['pqr', 'qr', 'r'])
print("Suffixes of 'pqt':", ['pqt', 'qt', 't'])

# Sorted union of all unique substrings (every prefix of every suffix)
all_substrings = ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']
print("Sorted union:", all_substrings)

# Query positions (1-indexed)
for query in [4, 7, 9]:
    print(f"Position {query}: '{all_substrings[query-1]}'")

Suffixes of 'pqr': ['pqr', 'qr', 'r']
Suffixes of 'pqt': ['pqt', 'qt', 't']
Sorted union: ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']
Position 4: 'pqt'
Position 7: 'qt'
Position 9: 't'

Key Points

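The algorithm works on the sorted unique suffixes of the input strings, since every substring is a prefix of some suffix
The common prefix length between consecutive suffixes shows how many prefixes of a suffix were already counted, so no substring is counted twice
All positions are 1-indexed as per the problem requirement
The solution skips duplicate suffixes by tracking them in a dictionary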

Conclusion

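This approach finds substrings at specific positions without building the full union of substrings. It sorts the unique suffixes, uses common prefix lengths to count the new substrings each suffix contributes, and scans those counts for each query. Each query therefore takes time proportional to the number of unique suffixes, plus the length of the substring it returns.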

Arnab Chakraborty

Updated on: 2026-03-26T15:25:48+05:30
