Python Pandas - Return Index with duplicate values removed except the first occurrence
To return a Pandas Index with duplicate values removed except the first occurrence, use the index.drop_duplicates() method with the keep parameter set to 'first'.
Basic Syntax
The drop_duplicates() method syntax is:
index.drop_duplicates(keep='first')
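Since 'first' is the default value of keep, calling the method without arguments should give the same result. A minimal sketch to check this (the variable names explicit and implicit are illustrative only):

```python
import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Pass keep explicitly, then rely on the default
explicit = index.drop_duplicates(keep='first')
implicit = index.drop_duplicates()

# Index.equals() compares the two results element by element
print(explicit.equals(implicit))
```

If the two calls behave the same, this prints True.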
Creating an Index with Duplicates
Let's create a Pandas Index containing duplicate values:
import pandas as pd
# Creating the index with some duplicates
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])
print("Original Index with duplicates:")
print(index)
Original Index with duplicates:
Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'], dtype='object')
Removing Duplicates (Keep First)
Use drop_duplicates(keep='first') to keep only the first occurrence of each duplicate:
import pandas as pd
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])
print("Original Index:")
print(index)
# Remove duplicates keeping first occurrence
result = index.drop_duplicates(keep='first')
print("\nIndex with duplicates removed:")
print(result)
Original Index:
Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'], dtype='object')

Index with duplicates removed:
Index(['Car', 'Bike', 'Airplane', 'Ship'], dtype='object')
Different Keep Options
The keep parameter accepts different values for handling duplicates:
import pandas as pd
index = pd.Index(['A', 'B', 'C', 'B', 'A', 'D'])
print("Original Index:")
print(index)
# Keep first occurrence
first = index.drop_duplicates(keep='first')
print("\nKeep first:")
print(first)
# Keep last occurrence
last = index.drop_duplicates(keep='last')
print("\nKeep last:")
print(last)
# Drop every value that occurs more than once
none = index.drop_duplicates(keep=False)
print("\nKeep none (remove all duplicates):")
print(none)
Original Index:
Index(['A', 'B', 'C', 'B', 'A', 'D'], dtype='object')

Keep first:
Index(['A', 'B', 'C', 'D'], dtype='object')

Keep last:
Index(['C', 'B', 'A', 'D'], dtype='object')

Keep none (remove all duplicates):
Index(['C', 'D'], dtype='object')
Comparison Table
| keep Parameter | Behavior | Use Case |
|---|---|---|
| 'first' | Keep first occurrence | Default behavior, maintains original order |
| 'last' | Keep last occurrence | When latest value is more relevant |
| False | Remove every value that has a duplicate | Keep only values that appear exactly once |
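To see which positions each keep option treats as duplicates, Index.duplicated() accepts the same keep argument and returns a boolean mask. The sketch below is an illustration added for comparison, not part of the original example:

```python
import pandas as pd

index = pd.Index(['A', 'B', 'C', 'B', 'A', 'D'])

masks = {}
for keep in ('first', 'last', False):
    # True marks a position that drop_duplicates() would discard
    mask = index.duplicated(keep=keep)
    masks[keep] = mask.tolist()

    # Filtering out the flagged positions should match drop_duplicates()
    same = index[~mask].equals(index.drop_duplicates(keep=keep))
    print(keep, masks[keep], same)
```

With keep=False, both occurrences of 'A' and 'B' are flagged, which is why only 'C' and 'D' remain.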
Conclusion
Use index.drop_duplicates(keep='first') to remove duplicate values while preserving the first occurrence. The method returns a new Index and leaves the original unchanged; the result keeps the original data type, and the remaining elements stay in their original relative order.
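The data type and ordering behavior can be checked with a small sketch on a numeric Index. The values below are made up for illustration:

```python
import pandas as pd

numbers = pd.Index([30, 10, 30, 20, 10])
unique_numbers = numbers.drop_duplicates()

# Remaining values appear in order of first occurrence, not sorted
print(unique_numbers.tolist())

# The result has the same dtype as the source Index
print(unique_numbers.dtype == numbers.dtype)

# The original Index still holds all five values
print(len(numbers))
```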
