Duplicate Emails - Problem

Imagine you're working as a database administrator for a large company, and you need to identify duplicate email addresses in the employee database to clean up data inconsistencies.

You have a Person table with the following structure:

Column NameType
idint
emailvarchar

Where:

  • id is the primary key (unique identifier)
  • email contains the email addresses (all lowercase, never NULL)

Your task: Write a SQL query to find all email addresses that appear more than once in the table.

Example:

Input table:

idemail
1a@b.com
2c@d.com
3a@b.com

Expected output:

Email
a@b.com

Input & Output

example_1.sql โ€” Basic Duplicates
$ Input: Person table: | id | email | | 1 | a@b.com | | 2 | c@d.com | | 3 | a@b.com |
โ€บ Output: | Email | | a@b.com |
๐Ÿ’ก Note: a@b.com appears twice (id 1 and 3), so it's a duplicate. c@d.com appears only once, so it's not included.
example_2.sql โ€” Multiple Duplicates
$ Input: Person table: | id | email | | 1 | x@y.com | | 2 | a@b.com | | 3 | x@y.com | | 4 | c@d.com | | 5 | a@b.com | | 6 | x@y.com |
โ€บ Output: | Email | | x@y.com | | a@b.com |
๐Ÿ’ก Note: x@y.com appears 3 times and a@b.com appears 2 times, both are duplicates. c@d.com appears only once.
example_3.sql โ€” No Duplicates
$ Input: Person table: | id | email | | 1 | john@x.com| | 2 | jane@y.com| | 3 | bob@z.com |
โ€บ Output: (empty result)
๐Ÿ’ก Note: Each email appears exactly once, so there are no duplicates to report.

Constraints

  • 1 โ‰ค table size โ‰ค 1000
  • Email field is never NULL
  • All emails are in lowercase
  • id is the primary key (always unique)
  • Return result in any order

Visualization

Tap to expand
Email Deduplication VisualizationRaw Data1: a@b.com2: c@d.com3: a@b.comGROUP BY emaila@b.com: [1,3]c@d.com: [2]COUNT(*)a@b.com: 2c@d.com: 1HAVING > 1a@b.com: 2 โœ“c@d.com: 1 โœ—Final Resulta@b.comSQL Query BreakdownSELECT email as EmailFROM PersonGROUP BY emailHAVING COUNT(*) > 1;โ† Choose the email columnโ† From the Person tableโ† Group identical emails togetherโ† Keep only groups with >1 member
Understanding the Visualization
1
Collect all emails
Gather all email addresses from the Person table
2
Group identical emails
Use GROUP BY to automatically sort emails into groups
3
Count group sizes
COUNT(*) tells us how many emails in each group
4
Filter large groups
HAVING COUNT(*) > 1 keeps only groups with duplicates
5
Extract email addresses
SELECT email returns the duplicate email addresses
Key Takeaway
๐ŸŽฏ Key Insight: GROUP BY automatically handles the grouping and counting logic, while HAVING filters the results based on aggregate conditions - perfect for finding duplicates efficiently!
Asked in
Amazon 45 Google 38 Meta 32 Microsoft 28
42.0K Views
High Frequency
~15 min Avg. Time
1.9K Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen