Program to find the substrings of given strings at given positions in the set of all possible substrings in Python
When working with multiple strings, we often need to find specific substrings from the union of all possible substrings. This involves generating all substrings from each string, creating a sorted union set, and retrieving elements at specified positions.
So, if the input strings are ['pqr', 'pqt'] and queries are [4, 7, 9], then the output will be ['pqt', 'qt', 't'].
How It Works
The substrings from the first string are: {p, pq, pqr, q, qr, r}
The substrings from the second string are: {p, pq, pqt, q, qt, t}
The union of these sets is: {p, pq, pqr, pqt, q, qr, qt, r, t}
After lexicographically sorting: ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']
The items at positions 4, 7, and 9 (1-indexed) are 'pqt', 'qt', and 't' respectively.
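To cross-check the walkthrough above, a direct brute-force sketch can build the sorted union explicitly and index into it. This is only practical for short strings, since a string of length n has up to n(n+1)/2 substrings. The function name `brute_force_queries` is a hypothetical helper introduced here, not part of the original solution.

```python
def brute_force_queries(strings, queries):
    """Hypothetical reference implementation: materialise the full sorted
    union of substrings and index it with 1-indexed queries."""
    union = set()
    for s in strings:
        # Every substring is s[i:j] for some 0 <= i < j <= len(s)
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                union.add(s[i:j])
    ordered = sorted(union)
    # Convert each 1-indexed query to a 0-indexed list position
    return [ordered[q - 1] for q in queries]

print(brute_force_queries(['pqr', 'pqt'], [4, 7, 9]))  # ['pqt', 'qt', 't']
```

The suffix-based approach below produces the same answers without storing every substring.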
Algorithm Steps
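To solve this problem efficiently:
- Collect the unique suffixes of the input strings (every substring is a prefix of some suffix)
- Sort the suffixes lexicographically
- Compute the common prefix length between each suffix and the one before it; a suffix whose common prefix length is c contributes len(suffix) - c new substrings
- Walk the sorted suffixes, keeping a running count of substrings, until the count reaches the requested position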
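The first three steps can be illustrated on the example input with a short sketch that prints, for each sorted distinct suffix, its common prefix length with the previous suffix and how many new substrings it contributes. The helper names `common_prefix_len` and `suffix_table` are illustrative assumptions, not names from the original program.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def suffix_table(strings):
    """Return (suffix, common_prefix_with_previous, new_substrings) rows
    for the sorted set of distinct suffixes (illustrative helper)."""
    suffixes = sorted({s[i:] for s in strings for i in range(len(s))})
    rows = []
    prev = ""  # the first suffix has no predecessor, so its prefix length is 0
    for suf in suffixes:
        lcp = common_prefix_len(prev, suf)
        # Prefixes of length lcp+1 .. len(suf) have not been counted yet
        rows.append((suf, lcp, len(suf) - lcp))
        prev = suf
    return rows

for row in suffix_table(['pqr', 'pqt']):
    print(row)
```

For ['pqr', 'pqt'] the rows are ('pqr', 0, 3), ('pqt', 2, 1), ('qr', 0, 2), ('qt', 1, 1), ('r', 0, 1) and ('t', 0, 1); the last column sums to 9, the size of the union listed earlier.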
Implementation
def find_substring_at_position(suffixes, prefix_lengths, position):
    """Find the substring at a 0-indexed position in the sorted union,
    using the sorted suffixes and their common prefix lengths"""
    combined_data = zip(suffixes, prefix_lengths)
    low = high = 0
    for suffix, prefix_len in combined_data:
        if prefix_len is None:
            prefix_len = 0
        # This suffix contributes len(suffix) - prefix_len new substrings:
        # its prefixes of length prefix_len+1 .. len(suffix)
        high += len(suffix) - prefix_len
        if high - 1 == position:
            # The position is the longest new prefix, i.e. the whole suffix
            return suffix
        elif high - 1 > position:
            # Find the exact prefix of the current suffix
            for p, q in enumerate(range(prefix_len, len(suffix))):
                if low + p == position:
                    return suffix[:q+1]
        low = high
    # Position is beyond the number of distinct substrings
    return None

def calculate_common_prefix(str1, str2):
    """Calculate length of common prefix between two strings"""
    min_length = min(len(str1), len(str2))
    count = 0
    for i in range(min_length):
        if str1[i] == str2[i]:
            count += 1
        else:
            break
    return count

def solve_substring_queries(strings, queries):
    """Main function to solve substring position queries"""
    # Collect every distinct suffix; each substring is a prefix of one of them
    substring_dict = {}
    suffixes = []
    for string in strings:
        for i in range(len(string)):
            substring = string[i:]
            if substring not in substring_dict:
                substring_dict[substring] = 1
                suffixes.append(substring)
    # Sort suffixes lexicographically
    suffixes.sort()
    # Common prefix length of each suffix with the previous one
    prefix_lengths = []
    for i in range(len(suffixes)):
        if i == 0:
            prefix_lengths.append(None)
        else:
            prefix_lengths.append(calculate_common_prefix(suffixes[i-1], suffixes[i]))
    # Process queries (convert 1-indexed positions to 0-indexed)
    results = []
    for query in queries:
        result = find_substring_at_position(suffixes, prefix_lengths, query - 1)
        results.append(result)
    return results

# Test the solution
strings = ['pqr', 'pqt']
queries = [4, 7, 9]
result = solve_substring_queries(strings, queries)
print(result)
['pqt', 'qt', 't']
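The implementation above locates each query with a linear scan over the sorted suffixes. When there are many queries, a minimal sketch of a binary-search variant, assuming the same sorted-suffix and common-prefix idea, stores prefix sums of the per-suffix counts and uses `bisect.bisect_left` to find the suffix that contains the k-th substring. `build_index` and `kth_substring` are hypothetical names introduced here, not part of the original program.

```python
from bisect import bisect_left
from itertools import accumulate

def build_index(strings):
    """Sort the distinct suffixes and record, for each one, its common prefix
    length with the previous suffix and the running total of substrings."""
    suffixes = sorted({s[i:] for s in strings for i in range(len(s))})
    lcps = []
    prev = ""
    for suf in suffixes:
        lcp = 0
        limit = min(len(prev), len(suf))
        while lcp < limit and prev[lcp] == suf[lcp]:
            lcp += 1
        lcps.append(lcp)
        prev = suf
    # Suffix i contributes len(suffix) - lcp new substrings; keep prefix sums
    cumulative = list(accumulate(len(s) - l for s, l in zip(suffixes, lcps)))
    return suffixes, lcps, cumulative

def kth_substring(index, k):
    """Return the k-th (1-indexed) substring of the sorted union,
    or None if k is out of range."""
    suffixes, lcps, cumulative = index
    if k < 1 or not cumulative or k > cumulative[-1]:
        return None
    # Binary search: first suffix whose running total reaches k
    i = bisect_left(cumulative, k)
    before = cumulative[i - 1] if i > 0 else 0
    # Its new prefixes have lengths lcps[i]+1, lcps[i]+2, ...
    return suffixes[i][: lcps[i] + (k - before)]

index = build_index(['pqr', 'pqt'])
print([kth_substring(index, k) for k in [4, 7, 9]])  # ['pqt', 'qt', 't']
```

The index is built once; afterwards each query costs a logarithmic search plus one slice, rather than a pass over every suffix.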
Example Walkthrough
Let's trace through the example step by step:
# Suffixes of each string (every substring is a prefix of one of these)
strings = ['pqr', 'pqt']
for s in strings:
    print(f"Suffixes of '{s}':", [s[i:] for i in range(len(s))])
# Combined unique substrings, sorted
all_substrings = ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']
print("Sorted union:", all_substrings)
# Query positions (1-indexed)
for query in [4, 7, 9]:
    print(f"Position {query}: '{all_substrings[query-1]}'")
Suffixes of 'pqr': ['pqr', 'qr', 'r']
Suffixes of 'pqt': ['pqt', 'qt', 't']
Sorted union: ['p', 'pq', 'pqr', 'pqt', 'q', 'qr', 'qt', 'r', 't']
Position 4: 'pqt'
Position 7: 'qt'
Position 9: 't'
Key Points
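- The algorithm sorts the distinct suffixes of all strings; since every substring is a prefix of some suffix, the substrings themselves never have to be generated
- The common prefix length of each suffix with the previous one tells how many of its prefixes were already counted, so each suffix contributes exactly len(suffix) - common prefix length new substrings
- All positions are 1-indexed as per the problem requirement; the code converts them with query - 1
- Duplicate suffixes are removed with a dictionary, substrings shared by different suffixes are skipped through the common prefix subtraction, and a position beyond the total count returns None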
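The duplicate handling can be checked directly: summing len(suffix) - common prefix length over the sorted distinct suffixes should equal the number of distinct substrings, even when strings share or repeat substrings. A small sketch compares that formula with a brute-force set; `count_distinct_substrings` and `brute_force_count` are hypothetical helpers, not part of the original program.

```python
def count_distinct_substrings(strings):
    """Count distinct substrings across all strings as the sum of
    len(suffix) - common prefix length over sorted distinct suffixes."""
    suffixes = sorted({s[i:] for s in strings for i in range(len(s))})
    total, prev = 0, ""
    for suf in suffixes:
        lcp = 0
        while lcp < min(len(prev), len(suf)) and prev[lcp] == suf[lcp]:
            lcp += 1
        total += len(suf) - lcp
        prev = suf
    return total

def brute_force_count(strings):
    """Reference count by materialising every substring in a set."""
    return len({s[i:j] for s in strings
                for i in range(len(s)) for j in range(i + 1, len(s) + 1)})

for strings in (['pqr', 'pqt'], ['aba', 'ab'], ['aaaa', 'aa']):
    print(strings, count_distinct_substrings(strings), brute_force_count(strings))
```

Both counts agree (9, 5 and 4 for the three inputs); the total is also the largest valid query position.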
Conclusion
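This approach finds the substrings at the requested positions without building the full union of substrings: it sorts the distinct suffixes, computes the common prefix length of each adjacent pair, and walks the resulting counts. With n distinct suffixes of length at most m, sorting needs O(n log n) string comparisons (each up to O(m) characters), computing the prefix lengths takes O(n * m), and each query takes O(n + m) because of the linear scan over the suffixes.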
