How to drop duplicate rows in pandas series?


The pandas package is widely used to analyse data in Data Science and Machine Learning applications. Removing duplicate values is a common data cleaning task in that process.

To remove duplicate values from a pandas series object, we can use the drop_duplicates() method. By default, this method does not alter the original series object; it returns a new series with the duplicate rows removed.

By setting the inplace parameter to True (inplace=True), the changes are applied to the original series object instead, and the method returns None.

The other important parameter of the drop_duplicates() method is keep. Its default value is 'first', which drops every duplicate value except the first occurrence. We can also set it to 'last' to keep only the last occurrence, or to False to drop every occurrence of a duplicated value.
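The three keep options can be compared side by side. The following is a minimal sketch; the names series and its values are hypothetical sample data chosen only for illustration, and the default integer index is used so the kept positions are easy to read.

```python
import pandas as pd

# Hypothetical sample data (not from the examples below)
names = pd.Series(['John', 'Gary', 'John', 'Richard', 'Richard'])

# keep='first' (the default): keep the first occurrence of each value
first = names.drop_duplicates(keep='first')

# keep='last': keep the last occurrence of each value
last = names.drop_duplicates(keep='last')

# keep=False: drop every value that occurs more than once
unique_only = names.drop_duplicates(keep=False)

print(first.index.tolist())        # [0, 1, 3]
print(last.index.tolist())         # [1, 2, 4]
print(unique_only.index.tolist())  # [1]
```

Note that with keep=False only 'Gary' survives, because it is the only value that appears exactly once.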

Example 1

In the following example, we create a pandas series from a list of strings and assign index labels through the index parameter.

# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(
   ['John','Garyooo','John','Richard','Peter','Richard','Gary'],
   index=['East','West','North','South','East','West','North'])

print(series)

# delete duplicate values
result = series.drop_duplicates()

print('Output:',result)

Explanation

After creating the series object, we applied the drop_duplicates() method with its default parameters.

The Pandas series is given below −

East       John
West    Garyooo
North      John
South   Richard
East      Peter
West    Richard
North      Gary
dtype: object

Output

The output is as follows −

East       John
West    Garyooo
South   Richard
East      Peter
North      Gary
dtype: object

The drop_duplicates() method returns a new series object with the duplicate rows removed; the original series object is not affected. Here, the second occurrences of 'John' (index North) and 'Richard' (index West) were dropped.
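To confirm that the original series is left untouched, we can compare the lengths of the two objects. The sketch below reuses the series from Example 1 and additionally calls duplicated(), a related Series method that marks the rows drop_duplicates() removes under the same keep setting.

```python
import pandas as pd

series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])

result = series.drop_duplicates()

# The original still holds all 7 rows; the result holds the 5 unique values
print(len(series))  # 7
print(len(result))  # 5

# Boolean mask: True for each row that is a repeat of an earlier value
mask = series.duplicated()
print(mask.tolist())  # [False, False, True, False, False, True, False]
```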

Example 2

In this example, we use the same series but change the inplace parameter from its default value, False, to True.

# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(
   ['John','Garyooo','John','Richard','Peter','Richard','Gary'],
   index=['East','West','North','South','East','West','North'])

print(series)

# delete duplicate values with inplace=True
result = series.drop_duplicates(inplace=True)

print('Output:\n',result)

print(series)

Explanation

By setting the inplace parameter to True, we modify the original series object itself, and the method returns None.

The Pandas series is as follows −

East       John
West    Garyooo
North      John
South   Richard
East      Peter
West    Richard
North      Gary
dtype: object

Output

The output is given below −

Output:
 None

East       John
West    Garyooo
South   Richard
East      Peter
North      Gary
dtype: object

By setting inplace=True, we have updated the original series object with the duplicate rows removed. In the output block above, the value None is the return value of the drop_duplicates() method, and the series printed after it is the modified original.
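Because the method returns None when inplace=True, assigning its result to a variable is easy to get wrong. A common alternative, sketched below as one possible approach rather than the only one, is to keep the default inplace=False and rebind the name to the returned series.

```python
import pandas as pd

series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])

# Rebind the name to the de-duplicated result instead of using inplace=True
series = series.drop_duplicates()

print(series.tolist())  # ['John', 'Garyooo', 'Richard', 'Peter', 'Gary']
```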

raja
Updated on 04-Mar-2022 07:52:04
