Python Pandas - Indicate duplicate index values

The Index.duplicated() method in Pandas identifies duplicate values in an index by returning a boolean NumPy array with one entry per label. By default, the first occurrence of each value is marked False and every later repeat is marked True.

Basic Usage

Let's start by creating an index with some duplicate values:

import pandas as pd

# Creating the index with some duplicates
index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Display the index
print("Pandas Index with duplicates...")
print(index)

Pandas Index with duplicates...
Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'], dtype='object')

Identifying Duplicates

Use duplicated() to identify duplicate values. By default, it marks the first occurrence as False:

import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Indicate duplicate index values as True, rest False
print("Indicating duplicate values...")
print(index.duplicated())

Indicating duplicate values...
[False False False False  True]
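
Because the result is a boolean array, it can be summarized directly. The following is a minimal sketch, assuming the same index as above; the variable names mask, has_duplicates and duplicate_count are illustrative only:

```python
import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Boolean NumPy array with one entry per label
mask = index.duplicated()

# True if at least one label repeats an earlier one
has_duplicates = bool(mask.any())

# Number of repeated entries (first occurrences are not counted)
duplicate_count = int(mask.sum())

print(has_duplicates)
print(duplicate_count)
```

With the default keep='first', the count covers only the repeats, not the first occurrence of each value.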

Using the keep Parameter

The keep parameter controls which occurrence of a repeated value is left unmarked. It accepts 'first' (the default), 'last', or False, which marks every occurrence:

import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

print("Keep first (default):")
print(index.duplicated(keep='first'))

print("\nKeep last:")
print(index.duplicated(keep='last'))

print("\nKeep none (mark all duplicates):")
print(index.duplicated(keep=False))

Keep first (default):
[False False False False  True]

Keep last:
[False False  True False False]

Keep none (mark all duplicates):
[False False  True False  True]
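
One possible use of keep=False is to list which labels occur more than once. This is a small sketch rather than a prescribed approach; all_repeats and repeated_labels are hypothetical names chosen for illustration:

```python
import pandas as pd

index = pd.Index(['Car', 'Bike', 'Airplane', 'Ship', 'Airplane'])

# Mark every occurrence of any value that appears more than once
all_repeats = index.duplicated(keep=False)

# Select the marked labels, then collapse them to distinct values
repeated_labels = index[all_repeats].unique()

print(repeated_labels.tolist())
```

For this index, the only label selected is 'Airplane', since every other value appears once.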

Practical Example

Here's a complete example showing how to filter out rows with duplicate index values, keeping the first row for each label:

import pandas as pd

# Create a DataFrame with duplicate index values
data = {'Value': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data, index=['A', 'B', 'A', 'C', 'A'])

print("Original DataFrame:")
print(df)

print("\nDuplicate index mask:")
print(df.index.duplicated())

print("\nDataFrame without duplicate indices:")
print(df[~df.index.duplicated()])

Original DataFrame:
   Value
A    100
B    200
A    300
C    400
A    500

Duplicate index mask:
[False False  True False  True]

DataFrame without duplicate indices:
   Value
A    100
B    200
C    400
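
The same filtering pattern can keep the last row for each label instead of the first. The sketch below assumes the same DataFrame as above and only changes the keep argument; last_rows is an illustrative name:

```python
import pandas as pd

# Same data as the example above
df = pd.DataFrame({'Value': [100, 200, 300, 400, 500]},
                  index=['A', 'B', 'A', 'C', 'A'])

# keep='last' marks every occurrence except the final one for each label
mask = df.index.duplicated(keep='last')

# Invert the mask to keep only the final row per label
last_rows = df[~mask]

print(last_rows)
```

Rows keep their original relative order, so the surviving labels here are B, C and A, with values 200, 400 and 500.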

Conclusion

The duplicated() method identifies duplicate index values in Pandas. Use the keep parameter to control which occurrences are marked, and combine the resulting mask with boolean indexing to filter out rows with repeated labels.

AmitDiwan

Updated on: 2026-03-26T16:17:46+05:30
