What is the use of the Series.duplicated() method in pandas?



Finding duplicate values is a very common task in data analysis. In pandas, the duplicated() method is used to identify them.

For a pandas Series object, the duplicated() method returns a new Series of boolean values with the same index. True marks an entry as a duplicate. Depending on the keep parameter, that means every occurrence after the first, every occurrence before the last, or every occurrence of a repeated value.

The duplicated() method has a parameter called “keep” which controls which occurrences are marked. The default is keep='first', which marks every occurrence of a repeated value as True except the first. Setting keep='last' marks every occurrence except the last, and keep=False marks all occurrences of a repeated value.
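The three settings of keep can be compared side by side. The following is a minimal sketch, assuming a small made-up series (the values 'x', 'y', 'z' and the variable names are illustrative only and are not taken from the examples in this article).

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series(['x', 'y', 'x', 'z', 'x'])

# keep='first' is the default: every occurrence after the first is True
first_mask = s.duplicated()

# keep='last': every occurrence before the last is True
last_mask = s.duplicated(keep='last')

# keep=False: every occurrence of a repeated value is True
all_mask = s.duplicated(keep=False)

print(first_mask.tolist())  # [False, False, True, False, True]
print(last_mask.tolist())   # [True, False, True, False, False]
print(all_mask.tolist())    # [True, False, True, False, True]
```

In all three cases 'y' and 'z' stay False because they occur only once.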

Example 1

In the following example, we create a pandas Series from a list of strings and then call the duplicated() method with its default parameters.

# importing required packages
import pandas as pd

# creating pandas Series object
series = pd.Series(['A', 'B', 'E', 'C', 'A', 'E'])
print(series)

# apply duplicated() method; sep="\n" puts the label on its own line
print("Output:", series.duplicated(), sep="\n")

Output

The output is as follows −

0    A
1    B
2    E
3    C
4    A
5    E
dtype: object

Output:
0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool

The duplicated() method returns a new Series object with boolean values. The values at index positions 4 and 5 are marked as True because A and E already appeared earlier in the series (at positions 0 and 2). Every other entry is the first occurrence of its value, so it is marked as False.
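Because the result is a boolean Series, it can also be used as a mask for boolean indexing. The sketch below reuses the data from Example 1. The variable names mask, repeats and first_seen are illustrative, and this is one possible way to use the result rather than the only one.

```python
import pandas as pd

series = pd.Series(['A', 'B', 'E', 'C', 'A', 'E'])

# Boolean mask: True for every occurrence after the first
mask = series.duplicated()

# Entries flagged as duplicates (index positions 4 and 5)
repeats = series[mask]

# Inverting the mask keeps only the first occurrence of each value
first_seen = series[~mask]

print(repeats.tolist())     # ['A', 'E']
print(first_seen.tolist())  # ['A', 'B', 'E', 'C']

# True counts as 1, so summing the mask counts the flagged entries
print(int(mask.sum()))      # 2
```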

Example 2

In the following example, we pass the value 'last' to the keep parameter, so every occurrence of a repeated value except the last one is marked as a duplicate.

# importing required packages
import pandas as pd

# creating pandas Series object
series = pd.Series([90,54,43,90,28,43,67])
print(series)

# apply duplicated() method; sep="\n" puts the label on its own line
print("Output:", series.duplicated(keep='last'), sep="\n")

Output

The output is given below −

0    90
1    54
2    43
3    90
4    28
5    43
6    67
dtype: int64

Output:
0     True
1    False
2     True
3    False
4    False
5    False
6    False
dtype: bool

With keep='last', every occurrence of a repeated value except the last is marked as a duplicate. The values at index positions 0 and 2 are marked as True because 90 and 43 appear again later in the series (at positions 3 and 5). Those last occurrences are marked as False, as are the values that appear only once.
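The remaining option, keep=False, is not covered by the two examples. As a rough sketch on the same data as Example 2, it can be combined with boolean indexing to pull out every entry whose value occurs more than once. The variable name every_repeat is illustrative.

```python
import pandas as pd

series = pd.Series([90, 54, 43, 90, 28, 43, 67])

# keep=False marks all occurrences of a repeated value as True
every_repeat = series[series.duplicated(keep=False)]

print(every_repeat.index.tolist())  # [0, 2, 3, 5]
print(every_repeat.tolist())        # [90, 43, 90, 43]
```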

Updated on: 2022-03-07T06:15:48+05:30
