Most Common Word - Problem

You're building a word frequency analyzer for text processing! Given a paragraph of text and a list of banned words, your task is to find the most frequently occurring word that isn't on the banned list.

The challenge involves:

  • Cleaning the text by removing punctuation and converting to lowercase
  • Counting word frequencies while ignoring banned words
  • Returning the word with the highest count

For example, in the paragraph "Bob hit a ball, the hit BALL flew far after it was hit." with banned words ["hit"], the word "ball" appears twice and is the most frequent non-banned word.

Note: It's guaranteed that there's at least one non-banned word, and the answer is unique.
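
The three steps above can be combined in a few lines. What follows is a minimal sketch, not a reference solution: the function name `most_common_word` and the use of `re.findall` with `collections.Counter` are assumptions made for illustration. It relies on the guarantee that at least one non-banned word exists.

```python
import re
from collections import Counter


def most_common_word(paragraph: str, banned: list[str]) -> str:
    """Return the most frequent word in `paragraph` that is not in `banned`."""
    banned_set = set(banned)  # average-case O(1) membership checks
    # Lowercase first, then treat every maximal run of letters as one word;
    # spaces and punctuation act as separators.
    words = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(word for word in words if word not in banned_set)
    # most_common(1) returns a one-element list of (word, count) pairs.
    return counts.most_common(1)[0][0]
```

Calling `most_common_word("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"])` returns `"ball"`.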

Input & Output

example_1.py - Basic Case
$ Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
› Output: "ball"
💡 Note: After removing punctuation and converting to lowercase, we have words: ["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far", "after", "it", "was", "hit"]. Excluding "hit" (banned), "ball" appears 2 times, which is the maximum frequency.
example_2.py - Most Frequent Word Is Banned
$ Input: paragraph = "a, a, a, a, b,b,b,c, c", banned = ["a"]
› Output: "b"
💡 Note: After cleaning: ["a", "a", "a", "a", "b", "b", "b", "c", "c"]. Excluding "a" (banned), "b" appears 3 times and "c" appears 2 times, so "b" is the most frequent.
example_3.py - Case Insensitive
$ Input: paragraph = "Bob. hIt, baLl", banned = ["bob", "hit"]
› Output: "ball"
💡 Note: After converting to lowercase and removing punctuation: ["bob", "hit", "ball"]. Both "bob" and "hit" are banned, leaving only "ball" as the valid answer.
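
The second example is a reminder that words can be separated by punctuation alone, as in "b,b,b,c". The sketch below contrasts plain whitespace splitting with matching runs of letters; `tokenize` is a hypothetical helper name, and using a regular expression is only one possible way to clean the text.

```python
import re


def tokenize(paragraph: str) -> list[str]:
    """Lowercase the text and return each maximal run of letters as a word."""
    return re.findall(r"[a-z]+", paragraph.lower())


text = "a, a, a, a, b,b,b,c, c"

# Whitespace splitting keeps punctuation attached and glues words together.
print(text.lower().split())
# ['a,', 'a,', 'a,', 'a,', 'b,b,b,c,', 'c']

# Matching letter runs recovers the nine words listed in the note.
print(tokenize(text))
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
```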

Constraints

  • 1 ≤ paragraph.length ≤ 1000
  • paragraph consists of English letters, space ' ', or one of the symbols: "!?',;."
  • 0 ≤ banned.length ≤ 100
  • 1 ≤ banned[i].length ≤ 10
  • banned[i] consists of only lowercase English letters
  • There is at least one word in paragraph that is not banned
  • The answer is unique
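
Because the constraints limit the paragraph to letters, spaces, and six known symbols, the cleaning step does not need a general-purpose parser. As a rough sketch under that assumption, each listed symbol can be mapped to a space with `str.translate`; the names `PUNCTUATION`, `TO_SPACES`, and `clean_words` are hypothetical.

```python
# The six symbols allowed by the constraints.
PUNCTUATION = "!?',;."
# Map each symbol to a space so that "b,b,b" becomes "b b b".
TO_SPACES = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def clean_words(paragraph: str) -> list[str]:
    """Lowercase, replace the allowed symbols with spaces, split on whitespace."""
    return paragraph.lower().translate(TO_SPACES).split()
```

This would miss any symbol outside the listed six, so it is only appropriate when the input really obeys the constraints.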

Visualization

[Figure: Word Frequency Analysis. The input text "Bob hit a ball, the hit BALL flew far!" is cleaned in three steps (remove punctuation, convert to lowercase, split into words), giving ["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far"]. Hash table frequencies: bob: 1, ball: 2, a: 1, the: 1, flew: 1, far: 1; "hit" is banned. Result: "ball" with frequency 2, the most frequent non-banned word. Time: O(n), single pass. Space: O(n), hash table for unique words. Key insight: hash tables enable O(1) lookups for real-time frequency tracking.]
Understanding the Visualization

  1. Clean the Data: Remove punctuation and convert to lowercase, like standardizing ballot formats.
  2. Count Valid Votes: Use a hash table to count each valid (non-banned) word, like tallying votes in real time.
  3. Track the Leader: Keep track of the word with the highest count as we process, like updating election results live.
  4. Declare Winner: Return the most frequent valid word, like announcing the election winner.
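
The four steps can be folded into one loop that updates the current leader as each word is counted. This is a sketch under assumptions: the function name `most_common_word_single_pass` is hypothetical, and `re.finditer` is just one way to walk the words without building a full list first.

```python
import re


def most_common_word_single_pass(paragraph: str, banned: list[str]) -> str:
    """Count non-banned words and track the current leader in a single loop."""
    banned_set = set(banned)
    counts: dict[str, int] = {}
    leader, leader_count = "", 0
    # Step 1: lowercase, then iterate over each maximal run of letters.
    for match in re.finditer(r"[a-z]+", paragraph.lower()):
        word = match.group()
        if word in banned_set:
            continue  # banned words never receive a vote
        # Step 2: tally the vote in the hash table.
        counts[word] = counts.get(word, 0) + 1
        # Step 3: update the leader whenever a word sets a new maximum.
        if counts[word] > leader_count:
            leader, leader_count = word, counts[word]
    # Step 4: the leader is the winner once all words are processed.
    return leader
```

Tracking the leader inside the loop avoids a second scan over the hash table at the end, at the cost of one extra comparison per word.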
Key Takeaway
🎯 Key Insight: Use a hash table to count frequencies in a single pass. This replaces an O(n²) nested-loop approach with an O(n) solution, because each hash table lookup and update takes O(1) time on average.
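
For contrast, the nested-loop baseline that the takeaway refers to might look like the sketch below. The function name `most_common_word_quadratic` is hypothetical, and the code is shown only to illustrate where the quadratic cost comes from.

```python
import re


def most_common_word_quadratic(paragraph: str, banned: list[str]) -> str:
    """Baseline for comparison: rescans the whole word list for every word."""
    # Membership is tested against the list itself here; a set would be faster.
    words = [w for w in re.findall(r"[a-z]+", paragraph.lower()) if w not in banned]
    best_word, best_count = "", 0
    for candidate in words:          # outer loop: one iteration per word
        occurrences = 0
        for other in words:          # inner loop: rescan every word again
            if other == candidate:
                occurrences += 1
        if occurrences > best_count:
            best_word, best_count = candidate, occurrences
    return best_word
```

Each of the n words triggers a full rescan of the list, which gives the O(n²) cost. Keeping a dictionary of running counts removes the inner loop entirely.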