Count Occurrences in Text - Problem

You are given a table Files containing file names and their text content.

Table Structure:

  • file_name (varchar): Unique file identifier
  • content (text): The text content of the file

Write a SQL query to find the number of files that contain the words 'bull' and 'bear' as standalone words. The words must be surrounded by spaces or be at the beginning/end of the content.

Important: Words like 'bullet', 'bears', 'bull.', or 'bear,' should NOT be counted as they are not standalone occurrences.

Return the result showing each word ('bull' and 'bear') along with the count of files containing that word.

Table Schema

Files
Column Name Type Description
file_name PK varchar Unique identifier for each file
content text Text content of the file
Primary Key: file_name
Note: Each row represents one file with its complete text content

Input & Output

Example 1 — Mixed Word Matches
Input Table:
file_name content
doc1.txt The bull market is trending upward
doc2.txt A bear was spotted in the woods
doc3.txt The bullet train is very fast
doc4.txt Bull and bear are both animals
Output:
word n_files
bear 2
bull 2
💡 Note:

Analysis: Files doc1.txt and doc4.txt contain standalone 'bull'. Files doc2.txt and doc4.txt contain standalone 'bear'. File doc3.txt contains 'bullet' which is not counted as it's not a standalone 'bull'.

Example 2 — Edge Cases with Punctuation
Input Table:
file_name content
file1.txt bull.
file2.txt bear!
file3.txt bears are dangerous
file4.txt no matches here
Output:
word n_files
bear 1
bull 1
💡 Note:

Word Boundaries: 'bull.' and 'bear!' are counted as standalone words since punctuation marks serve as word boundaries. 'bears' is not counted as it contains additional letters.

Constraints

  • 1 ≤ number of files ≤ 1000
  • file_name contains unique values
  • content can be empty or contain up to 10,000 characters
  • Words are case-sensitive

Visualization

Tap to expand
Count Occurrences in Text INPUT file_name content draft1.txt "The bull and bear market" draft2.txt "A bullet hit the bears" final.txt "bull vs bear analysis" Search Words: bull bear Exclude: bullet, bears (partial matches) ALGORITHM STEPS 1 Add Space Padding CONCAT(' ', content, ' ') 2 Search with Boundaries LIKE '% bull %' 3 Count Matches SUM(CASE WHEN...THEN 1) 4 Return Results word, count columns Example Match Logic: " The bull and bear " bull: MATCH (spaces around) bear: MATCH (spaces around) " bullet bears " NO MATCH FINAL RESULT word count bull 2 bear 2 File Analysis: draft1.txt: bull OK, bear OK draft2.txt: NO (bullet, bears) final.txt: bull OK, bear OK Total: 2 files each Exact word matches only Key Insight: By padding the content with spaces (CONCAT(' ', content, ' ')), we ensure that searching for ' bull ' will only match standalone words. This elegantly handles both text boundaries and word boundaries, excluding partial matches like 'bullet' or 'bears' without complex regex patterns. TutorialsPoint - Count Occurrences in Text | Optimal Solution
Asked in
Amazon 23 Microsoft 18 Google 15
28.5K Views
Medium Frequency
~12 min Avg. Time
892 Likes
Ln 1, Col 1
Smart Actions
💡 Explanation
AI Ready
💡 Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen