What is the use of the series.duplicated() method in pandas?
Finding duplicate values in an object is a very common task in data analysis. In pandas, the duplicated() method is used to identify them.
For a pandas Series object, the duplicated() method returns a Series of boolean values with the same index as the original. True marks a value as a duplicate. Depending on the settings, this can be every occurrence except the first, every occurrence except the last, or every occurrence of a repeated value.
The duplicated() method has a parameter called "keep" that controls which occurrences are marked. The default is "first", which marks all duplicates as True except for the first occurrence. Setting it to "last" marks all duplicates except for the last occurrence, and setting it to False marks every occurrence of a repeated value as True.
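The three settings of keep can be compared side by side. The following is a minimal sketch on made-up sample data (the values are an assumption chosen only for illustration), showing which positions each setting marks.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
series = pd.Series(['A', 'B', 'A', 'C', 'A'])

# keep='first' is the default: the first 'A' is not marked
first_mask = series.duplicated()

# keep='last': the last 'A' is not marked
last_mask = series.duplicated(keep='last')

# keep=False: every occurrence of 'A' is marked
all_mask = series.duplicated(keep=False)

print(first_mask.tolist())  # [False, False, True, False, True]
print(last_mask.tolist())   # [True, False, True, False, False]
print(all_mask.tolist())    # [True, False, True, False, True]
```

In every case, values that appear only once ('B' and 'C' here) are marked False.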
Example 1
In the following example, we create a pandas Series from a list of strings and then apply the duplicated() method with its default parameters.
# importing required packages
import pandas as pd

# creating pandas Series object
series = pd.Series(['A', 'B', 'E', 'C', 'A', 'E'])
print(series)

# apply duplicated() method
print("Output:", series.duplicated())
Output
The output is as follows:
0    A
1    B
2    E
3    C
4    A
5    E
dtype: object
Output: 0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool
The duplicated() method returns a new Series object with boolean values. The values at index positions 4 and 5 are marked as True because A and E already appeared earlier in the Series (at positions 0 and 2). Every other position is marked False, either because it holds the first occurrence of a repeated value or because the value appears only once.
Example 2
In the following example, we pass the value 'last' to the keep parameter, so that every occurrence of a duplicated value is marked except the last one.
# importing required packages
import pandas as pd

# creating pandas Series object
series = pd.Series([90, 54, 43, 90, 28, 43, 67])
print(series)

# apply duplicated() method
print("Output:", series.duplicated(keep='last'))
Output
The output is given below:
0    90
1    54
2    43
3    90
4    28
5    43
6    67
dtype: int64
Output: 0     True
1    False
2     True
3    False
4    False
5    False
6    False
dtype: bool
We have detected the duplicated values of the given Series object, excluding the last occurrence of each. The values at index positions 0 and 2 are marked as True because 90 and 43 appear again later in the Series (at positions 3 and 5). Those last occurrences, along with the values that appear only once, are marked False.
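Because duplicated() returns a boolean Series, the result can also be used as a mask to select values. The sketch below reuses the Series from Example 2 and assumes keep=False, so that all occurrences of a repeated value are selected. The variable names repeated and unique_only are illustrative.

```python
import pandas as pd

# Same data as in Example 2
series = pd.Series([90, 54, 43, 90, 28, 43, 67])

# keep=False marks every occurrence of a repeated value
mask = series.duplicated(keep=False)

# Boolean indexing: rows where the mask is True
repeated = series[mask]

# Inverting the mask with ~ selects values that appear only once
unique_only = series[~mask]

print(repeated.tolist())     # [90, 43, 90, 43]
print(unique_only.tolist())  # [54, 28, 67]
```

The original index labels are preserved in both results, so repeated keeps labels 0, 2, 3 and 5.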