Count Occurrences in Text

Database Medium

You are given a table Files containing file names and their text content.

Table Structure:

file_name (varchar): Unique file identifier
content (text): The text content of the file

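Write a SQL query that counts, for each of the words 'bull' and 'bear', the number of files containing that word at least once as a standalone word. A standalone occurrence is delimited on each side by a non-word character (a space or a punctuation mark) or by the beginning or end of the content.

Important: Words like 'bullet' or 'bears' should NOT be counted, because the extra letters make them different words. Adjacent punctuation, as in 'bull.' or 'bear,', does not prevent a match.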

Return the result showing each word ('bull' and 'bear') along with the count of files containing that word.

Table Schema

Files

Column Name	Type	Description
`file_name` PK	varchar	Unique identifier for each file
`content`	text	Text content of the file

Primary Key: file_name

Note: Each row represents one file with its complete text content

Input & Output

Example 1 — Mixed Word Matches

Input Table:

file_name	content
doc1.txt	The bull market is trending upward
doc2.txt	A bear was spotted in the woods
doc3.txt	The bullet train is very fast
doc4.txt	Bull and bear are both animals

Output:

word	n_files
bear	2
bull	2

💡 Note:

Analysis: Files doc1.txt and doc4.txt contain standalone 'bull'. Files doc2.txt and doc4.txt contain standalone 'bear'. File doc3.txt contains 'bullet' which is not counted as it's not a standalone 'bull'.

Example 2 — Edge Cases with Punctuation

Input Table:

file_name	content
file1.txt	bull.
file2.txt	bear!
file3.txt	bears are dangerous
file4.txt	no matches here

Output:

word	n_files
bear	1
bull	1

💡 Note:

Word Boundaries: 'bull.' and 'bear!' are counted as standalone words since punctuation marks serve as word boundaries. 'bears' is not counted as it contains additional letters.
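
The boundary behaviour described in this note can be checked outside the database. The sketch below uses Python's `re` module, where `\b` marks a word boundary. The helper name `has_standalone` is hypothetical, and matching is case-sensitive here because every Example 2 input is lowercase.

```python
import re

def has_standalone(word: str, content: str) -> bool:
    """Return True if `word` occurs in `content` delimited by word boundaries."""
    # re.escape guards against regex metacharacters in `word`.
    # \b matches between a word character and a non-word character,
    # or between a word character and either edge of the string.
    return re.search(rf"\b{re.escape(word)}\b", content) is not None

# Contents of the four files from Example 2.
contents = ["bull.", "bear!", "bears are dangerous", "no matches here"]

bull_files = sum(has_standalone("bull", c) for c in contents)
bear_files = sum(has_standalone("bear", c) for c in contents)
print(bull_files, bear_files)  # prints "1 1"
```

'bears' fails the test because the character after 'bear' is another letter, so there is no boundary at that position.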

Constraints

1 ≤ number of files ≤ 1000
file_name contains unique values
content can be empty or contain up to 10,000 characters
Matching is case-insensitive: in Example 1, 'Bull' in doc4.txt counts as an occurrence of 'bull'

Asked in

Amazon (23), Microsoft (18), Google (15)

Use regular expressions with the PostgreSQL word-boundary patterns \mbull\M and \mbear\M (\m matches only at the start of a word, \M only at the end) to identify standalone words, then UNION ALL the counts for both words.

Common Approaches

✓ Regular Expression with Word Boundaries

⏱️ Time: O(n×m) Space: O(1)

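Apply regular expressions with word-boundary patterns to identify standalone occurrences of 'bull' and 'bear', then use UNION ALL to combine the per-word counts. The boundary syntax depends on the database: PostgreSQL uses \m and \M (or \y for either side), while MySQL 8+ uses \b. In PostgreSQL, \b means backspace, not a word boundary.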

Regular Expression with Word Boundaries — Algorithm Steps

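Step 1: Use a word-boundary pattern ('\mbull\M' in PostgreSQL, '\bbull\b' in MySQL 8+) to match standalone 'bull'
Step 2: Use the corresponding pattern ('\mbear\M' or '\bbear\b') to match standalone 'bear'
Step 3: Count the matching files for each word and combine the two counts with UNION ALL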
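
The three steps can be sketched end to end as follows. This is a minimal illustration rather than a reference solution. It runs the query against an in-memory SQLite database through Python's `sqlite3` module, not against PostgreSQL or MySQL. SQLite ships no implementation of the REGEXP operator, so the sketch registers a hypothetical `regexp` helper backed by Python's `re`, where `\b` is the word boundary. Matching is assumed to be case-insensitive so that 'Bull' in doc4.txt is counted, as Example 1's output requires.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Back SQLite's REGEXP operator with Python's re (case-insensitive)."""
    if value is None:
        return 0
    return int(re.search(pattern, value, re.IGNORECASE) is not None)

conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` by calling the user function regexp(Y, X).
conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE Files (file_name TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO Files (file_name, content) VALUES (?, ?)",
    [
        ("doc1.txt", "The bull market is trending upward"),
        ("doc2.txt", "A bear was spotted in the woods"),
        ("doc3.txt", "The bullet train is very fast"),
        ("doc4.txt", "Bull and bear are both animals"),
    ],
)

# Steps 1 and 2: one filtered count per word. Step 3: UNION ALL the two rows.
QUERY = r"""
SELECT 'bull' AS word, COUNT(*) AS n_files
FROM Files WHERE content REGEXP '\bbull\b'
UNION ALL
SELECT 'bear' AS word, COUNT(*) AS n_files
FROM Files WHERE content REGEXP '\bbear\b'
ORDER BY word
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('bear', 2), ('bull', 2)]
```

UNION ALL is enough here because the two rows always differ in the `word` column, so there is nothing to deduplicate. In PostgreSQL the filter would presumably use the case-insensitive regex operator instead, for example `content ~* '\mbull\M'`.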

Step-by-Step Walkthrough

Pattern Match

Apply \mbull\M and \mbear\M patterns

Count Files

Count matching files for each word

Union Results

Combine counts for both words

Time & Space Complexity

Time Complexity

O(n×m)

n files × m average content length for pattern matching

✓ Linear Growth

Space Complexity

O(1)

Constant space for counting

✓ Constant Space
