Python Pandas - Indicate duplicate index values
The Index.duplicated() method in Pandas identifies duplicate values in an index by returning a boolean NumPy array. By default, every repeat of a value is marked True, while its first occurrence is marked False.
Basic Usage
Let's start by creating an index with some duplicate values:
import pandas as pd
# Creating the index with some duplicates
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])
# Display the index
print("Pandas Index with duplicates...")
print(index)
Pandas Index with duplicates...
Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'], dtype='object')
Identifying Duplicates
Use duplicated() to identify duplicate values. By default, the first occurrence of each value is marked False and every later occurrence is marked True:
import pandas as pd
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])
# Indicate duplicate index values as True, rest False
print("Indicating duplicate values...")
print(index.duplicated())
Indicating duplicate values...
[False False False False  True]
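Because the result is a boolean array, it can be summed or used to select the repeated labels. The following is a minimal sketch using the same index as above; the variable names (mask, duplicate_count, duplicate_labels) are illustrative and not part of the original example.

```python
import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Boolean NumPy array: True for every repeat after the first occurrence
mask = index.duplicated()

# Count the repeated entries (True counts as 1 when summed)
duplicate_count = int(mask.sum())

# Labels that occur more than once, each listed a single time
duplicate_labels = index[mask].unique().tolist()

print(duplicate_count)
print(duplicate_labels)
```

With this index, the count is 1 and the only repeated label is 'Airplane'.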
Using the keep Parameter
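The keep parameter controls which occurrence is treated as the original and left unmarked: 'first' (the default), 'last', or False to mark every occurrence of a repeated value: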
import pandas as pd
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])
print("Keep first (default):")
print(index.duplicated(keep='first'))
print("\nKeep last:")
print(index.duplicated(keep='last'))
print("\nKeep none (mark all duplicates):")
print(index.duplicated(keep=False))
Keep first (default):
[False False False False  True]

Keep last:
[False False  True False False]

Keep none (mark all duplicates):
[False False  True False  True]
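One use of keep=False is to separate the labels that appear exactly once from those that repeat. The sketch below assumes the same index as above; the names all_repeats, positions, and unique_only are hypothetical helpers introduced only for illustration.

```python
import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# keep=False flags every entry whose label appears more than once
all_repeats = index.duplicated(keep=False)

# Integer positions of the flagged entries
positions = [i for i, flag in enumerate(all_repeats) if flag]

# Labels that appear exactly once in the index
unique_only = index[~all_repeats].tolist()

print(positions)
print(unique_only)
```

Here both 'Airplane' entries (positions 2 and 4) are flagged, leaving 'Car', 'Bike', and 'Ship' as the labels that occur once.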
Practical Example
Here's a complete example that drops rows with repeated index labels, keeping the first row for each label:
import pandas as pd
# Create a DataFrame with duplicate index values
data = {'Value': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data, index=['A', 'B', 'A', 'C', 'A'])
print("Original DataFrame:")
print(df)
print("\nDuplicate index mask:")
print(df.index.duplicated())
print("\nDataFrame without duplicate indices:")
print(df[~df.index.duplicated()])
Original DataFrame:
   Value
A    100
B    200
A    300
C    400
A    500

Duplicate index mask:
[False False  True False  True]

DataFrame without duplicate indices:
   Value
A    100
B    200
C    400
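If the most recent row for each label is the one worth keeping, the same pattern works with keep='last'. This is a sketch under that assumption, reusing the DataFrame from the example; the variable name latest is illustrative, and Index.is_unique is used only as a quick check before and after filtering.

```python
import pandas as pd

df = pd.DataFrame({'Value': [100, 200, 300, 400, 500]},
                  index=['A', 'B', 'A', 'C', 'A'])

# Quick check: is filtering needed at all?
print(df.index.is_unique)

# Keep only the last row for each index label
latest = df[~df.index.duplicated(keep='last')]

print(latest)
print(latest.index.is_unique)
```

The surviving rows keep their original order, so the result lists B (200), C (400), and then the last A (500).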
Conclusion
The duplicated() method returns a boolean mask that flags repeated index values in Pandas. Use the keep parameter to choose which occurrence is left unmarked ('first', 'last', or False for none), and combine the inverted mask with boolean indexing to filter out the repeated rows.
