Count Occurrences in Text - Problem
Count Occurrences in Text

You're given a database table containing files and their text content. Your task is to find how many files contain specific words as standalone words (not as parts of other words).

๐ŸŽฏ Goal: Count files that contain the words 'bull' and 'bear' as complete, separate words.

๐Ÿ“ Table Schema:
Files table:
โ€ข file_name (varchar) - unique identifier for each file
โ€ข content (text) - the actual text content of the file

โš ๏ธ Important Rules:
โ€ข Words must be standalone (surrounded by spaces or at string boundaries)
โ€ข 'bullet', 'bear.', 'bears' don't count
โ€ข 'bull' at the beginning/end of content counts
โ€ข Case-sensitive matching

Return: A result set with two rows showing the word ('bull' or 'bear') and the count of files containing each word.

Input & Output

example_1.sql โ€” Basic Word Matching
$ Input: Files table: | file_name | content | | file1.txt | "The bull market is trending up" | | file2.txt | "Bear markets are scary" | | file3.txt | "Bulletin about market bear" |
โ€บ Output: | word | count_occurrences | | bull | 1 | | bear | 2 |
๐Ÿ’ก Note: file1.txt contains 'bull' as standalone word. file2.txt and file3.txt contain 'bear' as standalone word. 'Bulletin' doesn't count because 'bull' is not a separate word.
example_2.sql โ€” Edge Cases with Punctuation
$ Input: Files table: | file_name | content | | doc1.txt | "bull. The market is strong" | | doc2.txt | "Watch out for bear!" | | doc3.txt | "bulls and bears trade" |
โ€บ Output: | word | count_occurrences | | bull | 1 | | bear | 1 |
๐Ÿ’ก Note: doc1.txt: 'bull.' counts (punctuation creates word boundary). doc2.txt: 'bear!' counts. doc3.txt: 'bulls' and 'bears' don't count (not exact matches).
example_3.sql โ€” No Matches Found
$ Input: Files table: | file_name | content | | report1.txt | "The bullet train is fast" | | report2.txt | "Beard trimming tips" | | report3.txt | "Market trends analysis" |
โ€บ Output: | word | count_occurrences | | bull | 0 | | bear | 0 |
๐Ÿ’ก Note: No files contain 'bull' or 'bear' as standalone words. 'bullet' and 'beard' contain the target strings but are not exact word matches.

Visualization

Tap to expand
Database Word Boundary Matching ProcessFiles Tablefile1.txt: "The bull market is strong"file2.txt: "Bear markets decline but bulletin shows recovery"file3.txt: "Market analysis bears fruit"Query 1: \bbull\b Patternbullโœ“ file1.txt matchesbulletinโœ— file2.txt: bulletin โ‰  bullQuery 2: \bbear\b PatternBearCase-sensitive: Bear โ‰  bearbearsโœ“ file3.txt: bears matchesSQL UNION Result| word | count_occurrences || bull | 1 || bear | 1 |Key Insight: Regex Word Boundaries (\b)โ€ข Automatically handles spaces, punctuation, start/end of stringโ€ข Database-optimized pattern matching for O(n) performanceโ€ข Eliminates manual boundary checking complexity
Understanding the Visualization
1
Pattern Recognition
Regex engine compiles \bbull\b pattern to match word boundaries
2
Text Scanning
Engine scans each file's content for pattern matches
3
Boundary Validation
Automatic verification that matches are complete words
4
Count Aggregation
Database aggregates results across all matching files
Key Takeaway
๐ŸŽฏ Key Insight: Regex word boundaries (\b) provide an elegant, efficient solution that automatically handles all edge cases while leveraging database engine optimizations for maximum performance.

Time & Space Complexity

Time Complexity
โฑ๏ธ
O(n)

Linear scan through all file content with optimized regex engine

n
2n
โœ“ Linear Growth
Space Complexity
O(1)

No additional storage needed, regex engine handles pattern matching

n
2n
โœ“ Linear Space

Constraints

  • 1 โ‰ค number of files โ‰ค 104
  • 0 โ‰ค content length โ‰ค 106 characters per file
  • Word matching is case-sensitive
  • Target words must be exactly 'bull' and 'bear' (not 'Bull', 'BEAR', etc.)
  • Word boundaries are defined by non-alphanumeric characters (except underscore)
Asked in
Google 42 Amazon 38 Meta 29 Microsoft 24 Apple 18
28.4K Views
Medium Frequency
~12 min Avg. Time
856 Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen