Count Occurrences in Text - Problem
Count Occurrences in Text
You're given a database table containing files and their text content. Your task is to find how many files contain specific words as standalone words (not as parts of other words).
๐ฏ Goal: Count files that contain the words
๐ Table Schema:
โข
โข
โ ๏ธ Important Rules:
โข Words must be standalone (surrounded by spaces or at string boundaries)
โข
โข
โข Case-sensitive matching
Return: A result set with two rows showing the word (
You're given a database table containing files and their text content. Your task is to find how many files contain specific words as standalone words (not as parts of other words).
๐ฏ Goal: Count files that contain the words
'bull' and 'bear' as complete, separate words.๐ Table Schema:
Files table:โข
file_name (varchar) - unique identifier for each fileโข
content (text) - the actual text content of the fileโ ๏ธ Important Rules:
โข Words must be standalone (surrounded by spaces or at string boundaries)
โข
'bullet', 'bear.', 'bears' don't countโข
'bull' at the beginning/end of content countsโข Case-sensitive matching
Return: A result set with two rows showing the word (
'bull' or 'bear') and the count of files containing each word. Input & Output
example_1.sql โ Basic Word Matching
$
Input:
Files table:
| file_name | content |
| file1.txt | "The bull market is trending up" |
| file2.txt | "Bear markets are scary" |
| file3.txt | "Bulletin about market bear" |
โบ
Output:
| word | count_occurrences |
| bull | 1 |
| bear | 2 |
๐ก Note:
file1.txt contains 'bull' as standalone word. file2.txt and file3.txt contain 'bear' as standalone word. 'Bulletin' doesn't count because 'bull' is not a separate word.
example_2.sql โ Edge Cases with Punctuation
$
Input:
Files table:
| file_name | content |
| doc1.txt | "bull. The market is strong" |
| doc2.txt | "Watch out for bear!" |
| doc3.txt | "bulls and bears trade" |
โบ
Output:
| word | count_occurrences |
| bull | 1 |
| bear | 1 |
๐ก Note:
doc1.txt: 'bull.' counts (punctuation creates word boundary). doc2.txt: 'bear!' counts. doc3.txt: 'bulls' and 'bears' don't count (not exact matches).
example_3.sql โ No Matches Found
$
Input:
Files table:
| file_name | content |
| report1.txt | "The bullet train is fast" |
| report2.txt | "Beard trimming tips" |
| report3.txt | "Market trends analysis" |
โบ
Output:
| word | count_occurrences |
| bull | 0 |
| bear | 0 |
๐ก Note:
No files contain 'bull' or 'bear' as standalone words. 'bullet' and 'beard' contain the target strings but are not exact word matches.
Visualization
Tap to expand
Understanding the Visualization
1
Pattern Recognition
Regex engine compiles \bbull\b pattern to match word boundaries
2
Text Scanning
Engine scans each file's content for pattern matches
3
Boundary Validation
Automatic verification that matches are complete words
4
Count Aggregation
Database aggregates results across all matching files
Key Takeaway
๐ฏ Key Insight: Regex word boundaries (\b) provide an elegant, efficient solution that automatically handles all edge cases while leveraging database engine optimizations for maximum performance.
Time & Space Complexity
Time Complexity
O(n)
Linear scan through all file content with optimized regex engine
โ Linear Growth
Space Complexity
O(1)
No additional storage needed, regex engine handles pattern matching
โ Linear Space
Constraints
- 1 โค number of files โค 104
- 0 โค content length โค 106 characters per file
- Word matching is case-sensitive
- Target words must be exactly 'bull' and 'bear' (not 'Bull', 'BEAR', etc.)
- Word boundaries are defined by non-alphanumeric characters (except underscore)
๐ก
Explanation
AI Ready
๐ก Suggestion
Tab
to accept
Esc
to dismiss
// Output will appear here after running code