How to remove rows in a Pandas series with duplicate indices?


The duplicated() method of a pandas Index identifies repeated labels in the index of a series object. A series object also has its own duplicated() method, but that one checks the values of the series, not its index labels.

When called on an index, duplicated() returns an array of boolean values, one per label. With the default keep="first", False marks the first occurrence of a label (including labels that are unique), and True marks every later repetition of a label that has already occurred.
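As a small illustration of that boolean result, the sketch below builds the same series that Example 1 uses and prints only the mask. The variable name mask is chosen here for illustration and is not part of the original article.

```python
import pandas as pd

# Same data as Example 1: labels 1 and 2 each appear twice
series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

# One boolean per index label; True marks a repeat of an earlier label
mask = series.index.duplicated(keep="first")

print(mask)  # [False False  True False  True]
```

The third and fifth labels are flagged because labels 1 and 2 have already been seen at that point.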

Example 1

Here we will see how we can delete the rows of a series object with duplicate indices.

# importing pandas package
import pandas as pd

# create a series with repeated index labels
series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

print(series)

# getting the index data
index = series.index

# keep only the first row for each index label
result = series[~index.duplicated(keep="first")]

print(result)

Explanation

Initially, we created a pandas series object using the pandas.Series() function with index labels [1, 2, 1, 3, 2]. Then, we applied the duplicated() method to the index to flag the repeated labels.

After that, we applied the "~" operator to invert the boolean values and used the result as a mask on the original series. This gives a new series object that keeps only the first row for each index label.

Output

The output is shown below:

1    a
2    b
1    c
3    d
2    e
dtype: object

1    a
2    b
3    d
dtype: object

In the above output block, we can see the original series object as well as the resultant series object without duplicate labels.
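The keep parameter decides which of the repeated rows survive. The sketch below reuses the series from Example 1 and tries the other two accepted values, keep="last" and keep=False. The names keep_last and keep_none are illustrative only.

```python
import pandas as pd

series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

# keep="last": the final row for each label survives, earlier repeats are dropped
keep_last = series[~series.index.duplicated(keep="last")]
print(keep_last)

# keep=False: every row whose label occurs more than once is dropped
keep_none = series[~series.index.duplicated(keep=False)]
print(keep_none)
```

With this data, keep="last" leaves the rows labelled 1, 3 and 2 with values c, d and e, while keep=False leaves only the row labelled 3.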

Example 2

Let’s take another example to remove rows of a series object with duplicate indices.

# importing package
import pandas as pd
import numpy as np

# creating pandas series
series = pd.Series(np.random.randint(1,100,10),
   index=["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"])

print(series)

# getting the index data
index = series.index

# keep only the first row for each index label
result = series[~index.duplicated(keep="first")]

print(result)

Explanation

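Initially, we created the series object with labeled index data and then applied the duplicated() method to the index to identify the duplicate labels. The values come from np.random.randint(), so they will differ on every run, and the integer dtype in the output may differ depending on the platform and NumPy version.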

Output

The output is given below:

a    66
b    73
a    83
d    63
c    23
e    56
f    55
c    22
d    26
e    20
dtype: int32

a    66
b    73
d    63
c    23
e    56
f    55
dtype: int32

The labels a, d, c, and e occur more than once in the initial series object. For each of them, only the first row is kept in the resultant series object, and the later repeats are removed.
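Because Example 2 uses random values, its output cannot be reproduced exactly. The sketch below is a deterministic variant that hard-codes the values printed in the output above (an assumption made here purely so the result can be checked) and then confirms that the index of the result has no repeated labels.

```python
import pandas as pd

# Fixed values copied from the sample output, in place of np.random.randint
series = pd.Series(
    [66, 73, 83, 63, 23, 56, 55, 22, 26, 20],
    index=["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"],
)

# Keep the first row for each label
result = series[~series.index.duplicated(keep="first")]

print(result)
print(result.index.is_unique)  # True
```

Checking index.is_unique is a quick way to confirm that no duplicate labels remain after filtering.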

Gireesha Devara

Updated on: 2022-03-07T06:20:05+05:30
