Program to enclose pattern into bold tag in Python?
When working with text processing, you often need to highlight specific patterns by wrapping them in HTML tags. This problem involves finding all occurrences of given patterns in a text and enclosing them in <b> tags, while merging overlapping or adjacent patterns.
Problem Understanding
Given a text string and a list of patterns, we need to:
- Find all substrings that match any pattern
- Wrap matching substrings in <b> and </b> tags
- Merge overlapping or adjacent bold regions
Algorithm Steps
The solution uses a boolean array to track which characters should be bold:
- Create a boolean array bold of the same length as text
- For each position in text, check if any pattern starts at that position
- Mark all characters of matching patterns as bold
- Build the result string by adding <b> tags at the start and </b> tags at the end of bold regions
Implementation
class Solution:
    def solve(self, text, patterns):
        n = len(text)
        bold = [False] * n

        # Mark characters that should be bold
        for i in range(n):
            for pattern in patterns:
                # Compare in place at index i instead of slicing text[i:]
                if text.startswith(pattern, i):
                    for j in range(len(pattern)):
                        bold[i + j] = True

        # Build result string with bold tags
        result = ""
        for i in range(n):
            # Start bold tag if this is the beginning of a bold region
            if bold[i] and (i == 0 or not bold[i - 1]):
                result += "<b>"
            result += text[i]
            # End bold tag if this is the end of a bold region
            if bold[i] and (i == n - 1 or not bold[i + 1]):
                result += "</b>"
        return result

# Test the solution
solution = Solution()
text = "thisissampleline"
patterns = ["this", "ssam", "sample"]
print(solution.solve(text, patterns))

Output:
<b>this</b>i<b>ssample</b>line
How It Works
Let's trace through the example with text "thisissampleline" and patterns ["this", "ssam", "sample"]:
- Pattern "this": Found at index 0, marks positions 0-3 as bold
- Pattern "ssam": Found at index 5, marks positions 5-8 as bold
- Pattern "sample": Found at index 6, marks positions 6-11 as bold
The bold array becomes: [True, True, True, True, False, True, True, True, True, True, True, True, False, False, False, False]
Positions 5-11 are all marked bold because the matches for "ssam" and "sample" overlap, so they merge into one continuous bold region. Position 4 (the second "i") matches no pattern, so it stays unformatted and separates the two bold regions.
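To check a trace like this by machine, it can help to look at the bold mask directly instead of the final string. The sketch below is illustrative only: the helper names bold_mask and bold_regions are not part of the original solution, and it assumes the same marking rule as the implementation above.

```python
def bold_mask(text, patterns):
    """Return a list of booleans, True where a character lies inside any match."""
    mask = [False] * len(text)
    for i in range(len(text)):
        for pattern in patterns:
            if text.startswith(pattern, i):
                for j in range(i, i + len(pattern)):
                    mask[j] = True
    return mask


def bold_regions(mask):
    """Collapse a boolean mask into inclusive (start, end) index pairs."""
    regions = []
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                       # a new bold region opens here
        elif not flag and start is not None:
            regions.append((start, i - 1))  # the region closed at the previous index
            start = None
    if start is not None:
        regions.append((start, len(mask) - 1))
    return regions


mask = bold_mask("thisissampleline", ["this", "ssam", "sample"])
print(bold_regions(mask))  # [(0, 3), (5, 11)]
```

The two pairs correspond to the two <b>...</b> spans in the output: "this" at 0-3 and the merged "ssample" at 5-11.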
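Alternative Approach Using str.find()
Instead of testing every pattern at every index, this version lets str.find() locate each occurrence of a pattern. The search restarts one character after the previous match so that overlapping occurrences of the same pattern are not missed.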
def embolden_text(text, patterns):
    n = len(text)
    bold = [False] * n

    # Mark all matching positions
    for pattern in patterns:
        start = 0
        while True:
            pos = text.find(pattern, start)
            if pos == -1:
                break
            for i in range(pos, pos + len(pattern)):
                bold[i] = True
            start = pos + 1

    # Build result
    result = ""
    for i in range(n):
        if bold[i] and (i == 0 or not bold[i - 1]):
            result += "<b>"
        result += text[i]
        if bold[i] and (i == n - 1 or not bold[i + 1]):
            result += "</b>"
    return result

# Test the alternative approach
text = "abcdefghijk"
patterns = ["abc", "def"]
print(embolden_text(text, patterns))

Output:
<b>abcdef</b>ghijk
Key Points
- The algorithm handles overlapping patterns by merging them into continuous bold regions
- Time complexity is O(n × m × p) where n is text length, m is number of patterns, and p is average pattern length
- Space complexity is O(n) for the boolean array
- The startswith() method checks whether a pattern begins at a specific position; passing the start index as its second argument avoids copying the rest of the text
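One possible refinement of the points above, offered as a sketch and not as the article's own method: instead of marking every character of every match, keep a running index `end` holding the furthest position covered by any match seen so far, and collect the output pieces in a list that is joined once at the end. The function name embolden_running_end is hypothetical. The asymptotic bound is unchanged, but the inner marking loop and the repeated string concatenation both go away.

```python
def embolden_running_end(text, patterns):
    n = len(text)
    bold = [False] * n
    end = 0  # exclusive index: every position below `end` is covered by some match

    for i in range(n):
        for pattern in patterns:
            if text.startswith(pattern, i):
                end = max(end, i + len(pattern))
        # Position i is bold if a match starting at or before i reaches past it
        bold[i] = i < end

    parts = []
    for i, ch in enumerate(text):
        if bold[i] and (i == 0 or not bold[i - 1]):
            parts.append("<b>")
        parts.append(ch)
        if bold[i] and (i == n - 1 or not bold[i + 1]):
            parts.append("</b>")
    return "".join(parts)


print(embolden_running_end("thisissampleline", ["this", "ssam", "sample"]))
# <b>this</b>i<b>ssample</b>line
```

The running maximum works because a character is bold exactly when some match that starts at or before it extends beyond it, which is what `i < end` tests.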
Conclusion
This solution efficiently identifies and merges overlapping text patterns using a boolean array to track bold regions. The approach ensures that adjacent or overlapping matches are combined into single bold tags, creating clean HTML output.