Delete Duplicate Emails - Problem
Email Database Cleanup Challenge

You're working as a database administrator for a company that has been collecting user emails for years. Over time, the same email addresses have been registered multiple times with different IDs, creating duplicates in your Person table.

The Problem: Clean up the database by removing duplicate emails, but keep the record with the smallest ID (the oldest registration).

Table Structure:
Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

Your Task: Write a DELETE statement (not SELECT) that removes all duplicate emails, preserving only the entry with the smallest id for each unique email.

Note: This is a database manipulation problem - you're actually modifying the table structure, not just querying it!

Input & Output

example_1.sql โ€” Basic Duplicate Removal
$ Input: Person table: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+
โ€บ Output: Person table after deletion: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | +----+------------------+
๐Ÿ’ก Note: The row with id=3 is deleted because it's a duplicate of john@example.com with a higher ID than the existing row (id=1).
example_2.sql โ€” Multiple Duplicates
$ Input: Person table: +----+------------------+ | id | email | +----+------------------+ | 1 | alice@example.com| | 2 | bob@example.com | | 3 | alice@example.com| | 4 | charlie@ex.com | | 5 | alice@example.com| +----+------------------+
โ€บ Output: Person table after deletion: +----+------------------+ | id | email | +----+------------------+ | 1 | alice@example.com| | 2 | bob@example.com | | 4 | charlie@ex.com | +----+------------------+
๐Ÿ’ก Note: Rows with id=3 and id=5 are deleted because they are duplicates of alice@example.com. Only the record with the smallest ID (id=1) is kept.
example_3.sql โ€” No Duplicates
$ Input: Person table: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | jane@example.com | | 3 | bob@example.com | +----+------------------+
โ€บ Output: Person table after deletion: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | jane@example.com | | 3 | bob@example.com | +----+------------------+
๐Ÿ’ก Note: No rows are deleted because all email addresses are unique. The table remains unchanged.

Visualization

Tap to expand
Email Deduplication VisualizationBefore: Duplicate EmailsID: 1john@example.comID: 2bob@example.comID: 3john@example.comDELETEAfter: Unique Emails OnlyID: 1john@example.comID: 2bob@example.comSQL QueryDELETE p1 FROM Person p1, Person p2WHERE p1.email = p2.emailAND p1.id > p2.id;๐ŸŽฏ Key Insight: Self-join allows efficient comparison and deletion in one operation
Understanding the Visualization
1
Identify Duplicates
Find all email addresses that appear multiple times in the database
2
Compare IDs
For each duplicate email, compare the ID numbers to find the smallest (oldest)
3
Mark for Deletion
Mark all duplicate entries with larger IDs for removal
4
Execute Deletion
Remove the marked records, leaving only unique emails with smallest IDs
Key Takeaway
๐ŸŽฏ Key Insight: Using a self-join with DELETE allows us to compare each row with all other rows having the same email and remove duplicates with higher IDs in a single, efficient database operation.

Time & Space Complexity

Time Complexity
โฑ๏ธ
O(nยฒ)

Self-join operation compares every row with every other row

n
2n
โš  Quadratic Growth
Space Complexity
O(n)

Temporary space needed for join operation and result set

n
2n
โšก Linearithmic Space

Constraints

  • 1 โ‰ค Person table rows โ‰ค 104
  • All emails are in lowercase format
  • id is unique and serves as the primary key
  • Email length: 1 โ‰ค email.length โ‰ค 100
Asked in
Google 25 Amazon 18 Microsoft 15 Meta 12
28.5K Views
Medium Frequency
~15 min Avg. Time
945 Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen