How to remove rows in a Pandas series with duplicate indices?


By calling the duplicated() method on the index of a pandas Series, we can easily identify repeated index labels and then remove the corresponding rows. Note that a Series also has its own duplicated() method, but that one checks the values of the series rather than its index.

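When called on an index, duplicated() returns a boolean array with one entry per label. False marks a label that is treated as unique, and True marks a label that is treated as a duplicate. With the default keep="first", the first occurrence of each label is marked False and only the later repeats are marked True.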
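
As a quick check of the behaviour described above, the following minimal sketch calls duplicated() directly on an index and prints the flags for each setting of the keep parameter. The variable names are illustrative only.

```python
import pandas as pd

# Illustrative index in which the labels 1 and 2 each appear twice
index = pd.Index([1, 2, 1, 3, 2])

# keep="first" (the default): only the later repeats are flagged
first_mask = index.duplicated(keep="first")

# keep="last": the earlier occurrences are flagged instead
last_mask = index.duplicated(keep="last")

# keep=False: every occurrence of a repeated label is flagged
all_mask = index.duplicated(keep=False)

print(first_mask.tolist())  # [False, False, True, False, True]
print(last_mask.tolist())   # [True, True, False, False, False]
print(all_mask.tolist())    # [True, True, True, False, True]
```

Each result has the same length as the index, so it can be used as a boolean mask on the series that owns the index.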

Example 1

Here we will see how we can delete the rows of a series object with duplicate indices.

# importing pandas package
import pandas as pd

# creating pandas series
series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

print(series)

# getting the index data
index = series.index

# keeping only the first row for each index label
result = series[~index.duplicated(keep="first")]

print(result)

Explanation

Initially, we created a pandas series object using the pandas.Series() function with the index labels [1, 2, 1, 3, 2]. Then, we called the duplicated() method on the index to identify the repeated labels.

After that, we applied the "~" operator to invert the boolean values, so that True marks the rows to keep, and used the result as a boolean mask on the original series. This gives a new series in which each index label appears only once, with the first occurrence retained.

Output

The output is mentioned below −

1    a
2    b
1    c
3    d
2    e
dtype: object

1    a
2    b
3    d
dtype: object

In the above output block, we can see the original series object as well as the resultant series object without duplicate labels.
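
To make the inversion step from the explanation easier to follow, here is a small sketch that splits the one-line filter into its intermediate pieces and then checks that the result has a unique index. The names duplicate_mask and keep_mask are introduced here for illustration and are not part of the original example.

```python
import pandas as pd

series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

# True for every label that has already appeared earlier in the index
duplicate_mask = series.index.duplicated(keep="first")

# "~" flips the flags, so True now marks the rows we want to keep
keep_mask = ~duplicate_mask

# Boolean indexing selects only the rows whose flag is True
result = series[keep_mask]

print(keep_mask.tolist())      # [True, True, False, True, False]
print(result.index.is_unique)  # True
```

Checking index.is_unique before and after filtering is a convenient way to confirm that the duplicates are gone.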

Example 2

Let’s take another example to remove rows of a series object with duplicate indices.

# importing package
import pandas as pd
import numpy as np

# creating pandas series
series = pd.Series(np.random.randint(1,100,10),
   index=["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"])

print(series)

# getting the index data
index = series.index

# keeping only the first row for each index label
result = series[~index.duplicated(keep="first")]

print(result)

Explanation

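Initially, we created the series object with labeled index data and then called the duplicated() method on the index to identify the repeated labels. As in Example 1, inverting the result with "~" and using it to index the series keeps only the first row for each label.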

Output

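The output is given below. The values are random, so they will differ on each run, and the integer dtype may appear as int64 depending on the platform and NumPy version −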

a    66
b    73
a    83
d    63
c    23
e    56
f    55
c    22
d    26
e    20
dtype: int32

a    66
b    73
d    63
c    23
e    56
f    55
dtype: int32

The labels a, d, c, e occurred more than once in the initial series object. Because keep="first" was used, the first row for each of these labels is kept and only the later repeats are removed in the resultant series object.
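
The keep argument used in both examples also accepts "last" and False. The following sketch reuses the values printed in the output above as fixed data (an assumption made so that the result is reproducible instead of random) and shows how the retained rows change with each setting.

```python
import pandas as pd

# Fixed data copied from the sample output, instead of random values
series = pd.Series(
    [66, 73, 83, 63, 23, 56, 55, 22, 26, 20],
    index=["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"],
)

# Keep the last row for each label instead of the first
last_kept = series[~series.index.duplicated(keep="last")]

# Drop every row whose label occurs more than once
unique_only = series[~series.index.duplicated(keep=False)]

print(last_kept)
print(unique_only)
```

With keep="last" the surviving rows are b, a, f, c, d, e with the values 73, 83, 55, 22, 26, 20, while keep=False leaves only b and f, the two labels that never repeat.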

Updated on: 07-Mar-2022
