Drop Duplicate Rows - Problem

You have a DataFrame called customers with the following structure:

Column Name    Type
customer_id    int
name           object
email          object

There are some duplicate rows in the DataFrame based on the email column.

Write a solution to remove these duplicate rows and keep only the first occurrence.

Return the cleaned DataFrame in the same format.

Input & Output

Example 1 — Basic Duplicate Removal
Input: customers = [{"customer_id": 1, "name": "Ella", "email": "emily@example.com"}, {"customer_id": 2, "name": "David", "email": "michael@example.com"}, {"customer_id": 3, "name": "Zachary", "email": "sarah@example.com"}, {"customer_id": 4, "name": "Alice", "email": "emily@example.com"}]
Output: [{"customer_id": 1, "name": "Ella", "email": "emily@example.com"}, {"customer_id": 2, "name": "David", "email": "michael@example.com"}, {"customer_id": 3, "name": "Zachary", "email": "sarah@example.com"}]
💡 Note: Row with customer_id=4 (Alice) is removed because emily@example.com already exists in row with customer_id=1 (Ella). We keep the first occurrence.
Example 2 — Multiple Duplicates
Input: customers = [{"customer_id": 1, "name": "John", "email": "john@email.com"}, {"customer_id": 2, "name": "Bob", "email": "bob@email.com"}, {"customer_id": 3, "name": "Johnny", "email": "john@email.com"}, {"customer_id": 4, "name": "Robert", "email": "bob@email.com"}]
Output: [{"customer_id": 1, "name": "John", "email": "john@email.com"}, {"customer_id": 2, "name": "Bob", "email": "bob@email.com"}]
💡 Note: Both john@email.com and bob@email.com have duplicates. We keep only the first occurrence of each email: John (ID=1) and Bob (ID=2).
Example 3 — No Duplicates
Input: customers = [{"customer_id": 1, "name": "Alice", "email": "alice@email.com"}, {"customer_id": 2, "name": "Bob", "email": "bob@email.com"}]
Output: [{"customer_id": 1, "name": "Alice", "email": "alice@email.com"}, {"customer_id": 2, "name": "Bob", "email": "bob@email.com"}]
💡 Note: All emails are unique, so no rows are removed. The DataFrame remains unchanged.
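
To make the "first occurrence" rule concrete, the sketch below rebuilds the Example 1 input and uses duplicated() to flag repeated emails before filtering them out. The variable names is_repeat, cleaned, and records are illustrative and not part of the problem statement.

```python
import pandas as pd

# Example 1 input, rebuilt as a DataFrame.
customers = pd.DataFrame(
    [
        {"customer_id": 1, "name": "Ella", "email": "emily@example.com"},
        {"customer_id": 2, "name": "David", "email": "michael@example.com"},
        {"customer_id": 3, "name": "Zachary", "email": "sarah@example.com"},
        {"customer_id": 4, "name": "Alice", "email": "emily@example.com"},
    ]
)

# True for every row whose email already appeared in an earlier row.
is_repeat = customers.duplicated(subset=["email"], keep="first")

# Keep only the rows that are not repeats (hypothetical helper names).
cleaned = customers[~is_repeat]
records = cleaned.to_dict(orient="records")
```

Only the row with customer_id=4 is flagged, which matches the note for Example 1. Filtering with this mask is one way to reach the same result as a direct drop_duplicates() call.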

Constraints

  • 1 ≤ customers.length ≤ 10^4
  • customer_id, name, and email are non-empty
  • All customer_id values are unique

Visualization

[Figure: Drop Duplicate Rows. Input: the customers DataFrame (4 rows x 3 columns), in which row 4 (Alice) repeats the email of row 1 (Ella). Final result: 3 rows x 3 columns, with row 4 removed and unique emails preserved. Source: TutorialsPoint - Drop Duplicate Rows | Pandas drop_duplicates()]

Algorithm Steps

  1. Identify the column: check 'email' for duplicates.
  2. Call drop_duplicates() with subset=['email'].
  3. Keep the first occurrence: keep='first' (the default).
  4. Return the cleaned DataFrame with duplicates removed.

The resulting call is df.drop_duplicates(subset=['email'], keep='first').

Key Insight: The drop_duplicates() method efficiently removes duplicate rows based on specified columns. Using subset=['email'] checks only the email column. The keep='first' parameter (default) retains the first occurrence and removes subsequent duplicates. Time complexity: O(n).
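
A minimal sketch of a solution built on drop_duplicates(), run against the Example 2 input. The function name drop_duplicate_emails is an assumption made for illustration; the problem statement does not prescribe a signature.

```python
import pandas as pd


def drop_duplicate_emails(customers: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame holding only the first row for each email."""
    # subset=["email"] compares rows on the email column alone.
    # keep="first" is the default; it is spelled out here for clarity.
    return customers.drop_duplicates(subset=["email"], keep="first")


# Example 2 input: two emails, each appearing twice.
customers = pd.DataFrame(
    [
        {"customer_id": 1, "name": "John", "email": "john@email.com"},
        {"customer_id": 2, "name": "Bob", "email": "bob@email.com"},
        {"customer_id": 3, "name": "Johnny", "email": "john@email.com"},
        {"customer_id": 4, "name": "Robert", "email": "bob@email.com"},
    ]
)

result = drop_duplicate_emails(customers)
```

drop_duplicates() returns a new DataFrame by default and leaves customers untouched. The surviving rows keep their original index labels, so call reset_index(drop=True) on the result if a contiguous index is needed.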