How to remove rows in a Pandas series with duplicate indices?
The duplicated() method of a pandas Index identifies repeated labels in the index of a Series object. Access the index through series.index and call duplicated() on it. The related Series.duplicated() method does the same for the values of a series rather than its labels.
The method returns one boolean flag per label: a NumPy boolean array when called on an index, or a boolean Series when called on a series. False marks a label that is unique or is the occurrence being kept, and True marks a duplicate. With the default keep="first", the first occurrence of each label is flagged False and every later occurrence is flagged True.
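As a small illustration of the flags described above, the following sketch calls duplicated() directly on an index. The labels are the ones used in Example 1, and the variable names are only illustrative.

```python
import pandas as pd

# Index labels with repeats (the same labels as in Example 1)
index = pd.Index([1, 2, 1, 3, 2])

# True marks a label that has already appeared earlier in the index
mask = index.duplicated(keep="first")
print(mask)

# Inverting the mask with ~ flags the rows to keep instead
print(~mask)
```

The inverted mask is what gets passed to the series in square brackets to select the rows that survive.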
Example 1
Here we will see how we can delete the rows of a series object with duplicate indices.
# importing pandas package
import pandas as pd

# create series
series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])
print(series)

# getting the index data
index = series.index

# removing duplicate indices separately
result = series[~index.duplicated(keep="first")]
print(result)
Explanation
First, we created a pandas Series using pd.Series() with the index labels [1, 2, 1, 3, 2]. Then we called duplicated() on the index to flag the repeated labels.
After that, we applied the ~ operator to invert the boolean flags and used the result as a boolean mask on the original series. This returns a new series that keeps only the first row for each label.
Output
The output is mentioned below −
1    a
2    b
1    c
3    d
2    e
dtype: object
1    a
2    b
3    d
dtype: object
In the above output block, we can see the original series object as well as the resultant series object without duplicate labels.
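Example 1 keeps the first row for each label. As a hedged sketch of the alternatives, the keep parameter of duplicated() also accepts "last" and False. The snippet below reuses the series from Example 1; the names first, last, and unique_only are illustrative.

```python
import pandas as pd

series = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 1, 3, 2])

# keep="first": retain the first row for each label (as in Example 1)
first = series[~series.index.duplicated(keep="first")]

# keep="last": retain the last row for each label
last = series[~series.index.duplicated(keep="last")]

# keep=False: drop every row whose label occurs more than once
unique_only = series[~series.index.duplicated(keep=False)]

print(first)
print(last)
print(unique_only)
```

Which option fits depends on whether the earliest row, the latest row, or no row at all should represent a repeated label.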
Example 2
Let’s take another example to remove rows of a series object with duplicate indices.
# importing package
import pandas as pd
import numpy as np

# creating pandas series
series = pd.Series(np.random.randint(1, 100, 10), index=["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"])
print(series)

# getting the index data
index = series.index

# removing duplicate indices separately
result = series[~index.duplicated(keep="first")]
print(result)
Explanation
Here we created a series of ten random integers with string index labels. We then called duplicated() on the index to flag the repeated labels and used the inverted flags to select the rows to keep.
Output
The output is given below −
a    66
b    73
a    83
d    63
c    23
e    56
f    55
c    22
d    26
e    20
dtype: int32
a    66
b    73
d    63
c    23
e    56
f    55
dtype: int32
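The labels a, d, c, and e occur more than once in the initial series object. In the resulting series, only the first row for each of these labels is kept and the later duplicates are removed. Because the values come from np.random.randint(), the numbers will differ from run to run.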
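To make a run like Example 2 repeatable and to confirm that no duplicate labels remain, one possible variation is sketched below. It swaps np.random.randint() for a seeded NumPy generator (the seed 0 is an arbitrary choice, not part of the original example) and checks the result with the index's is_unique attribute.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values repeatable (seed 0 is arbitrary)
rng = np.random.default_rng(0)
labels = ["a", "b", "a", "d", "c", "e", "f", "c", "d", "e"]
series = pd.Series(rng.integers(1, 100, size=10), index=labels)

# Keep only the first row for each label
result = series[~series.index.duplicated(keep="first")]

# Each label now appears once, in order of first appearance
print(result.index.tolist())
print(result.index.is_unique)
```

Checking is_unique after filtering is a cheap safeguard before operations that expect unique labels.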