How does the keep parameter work in the pandas series.drop_duplicates() method?


The drop_duplicates() method of a pandas Series removes duplicate values from a series object. By default it does not alter the original series; instead, it returns a new series containing only the rows that were kept, with their original index labels.
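As a minimal sketch of this behavior, the snippet below calls drop_duplicates() on a small series and checks that the original is unchanged. The series `s` and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
s = pd.Series(['a', 'b', 'a'])

# drop_duplicates() returns a new Series by default
deduped = s.drop_duplicates()

print(len(s))        # the original still has all 3 rows
print(len(deduped))  # the returned Series has 2 rows
```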

One of the important parameters of the drop_duplicates() method is keep. Its default value is 'first', which keeps the first occurrence of each value and deletes the remaining ones. We can also pass 'last' or False to the keep parameter.

If keep=False, every value that occurs more than once is deleted, including its first occurrence. If keep='last', the duplicate values are deleted except for the last occurrence.
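The three settings can be compared side by side on one small series. This is only a sketch: the series `s` and the variable names are assumptions made for illustration.

```python
import pandas as pd

# hypothetical sample data: 1 and 2 each occur twice, 3 occurs once
s = pd.Series([1, 2, 1, 3, 2])

first = s.drop_duplicates(keep='first')      # keep the first occurrence of each value
last = s.drop_duplicates(keep='last')        # keep the last occurrence of each value
unique_only = s.drop_duplicates(keep=False)  # drop every value that is repeated

# compare which index labels survive in each case
print(first.index.tolist())        # [0, 1, 3]
print(last.index.tolist())         # [2, 3, 4]
print(unique_only.index.tolist())  # [3]
```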

Example 1

In the following example, we first create a pandas Series from a list of strings using pd.Series(). We then apply the drop_duplicates() method with keep='last'.

# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(['Robin', 'John', 'Nori', 'Yi', 'Robin', 'Amal', 'Nori'])
print(series)

# delete duplicate values with keep='last'
result = series.drop_duplicates(keep='last')

print('Output:', result, sep='\n')

Output

The output is given below −

0    Robin
1     John
2     Nori
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object

Output:
1     John
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object

The value “Robin” appears at two index positions, 0 and 4, and the value “Nori” also appears at two positions, 2 and 6.

By setting keep='last', we kept the last occurrence of each value and deleted the earlier ones at index positions 0 and 2.

Example 2

Using the same series, we change the value of the keep parameter from 'last' to 'first'.

# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(['Robin', 'John', 'Nori', 'Yi', 'Robin', 'Amal', 'Nori'])
print(series)

# delete duplicate values with keep='first'
result = series.drop_duplicates(keep='first')

print('Output:', result, sep='\n')

Output

You will get the following output −

0    Robin
1     John
2     Nori
3       Yi
4    Robin
5     Amal
6     Nori
dtype: object

Output:
0    Robin
1     John
2     Nori
3       Yi
5     Amal
dtype: object

In the above output, the duplicate values at index positions 4 and 6 are deleted, because the values “Robin” and “Nori” first occurred at positions 0 and 2.

Example 3

In this example, we will see how the drop_duplicates() method works with keep=False. We first create a series object from a list of integers and then apply the method.

# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series([1,2,1,3,4,2,6,4,5])
print(series)

# delete duplicate values with keep=False
result = series.drop_duplicates(keep=False)

print('Output:', result, sep='\n')

Output

The output is given below −

0    1
1    2
2    1
3    3
4    4
5    2
6    6
7    4
8    5
dtype: int64

Output:
3    3
6    6
8    5
dtype: int64

The series returned by drop_duplicates() has only 3 rows, whereas the original series has 9 rows. This is because keep=False removes every value that occurs more than once (here 1, 2 and 4); it does not keep any occurrence of a duplicated value, so only the values 3, 6 and 5 remain.
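One way to see which rows keep=False removes is to build the same mask with Series.duplicated(), which accepts the same keep argument. The sketch below reuses the series from this example; the names `mask` and `filtered` are introduced only for illustration.

```python
import pandas as pd

series = pd.Series([1, 2, 1, 3, 4, 2, 6, 4, 5])

# True for every row whose value occurs more than once
mask = series.duplicated(keep=False)

# keep only the rows that were not marked as duplicates
filtered = series[~mask]

# both approaches select the same rows
print(filtered.equals(series.drop_duplicates(keep=False)))
```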

raja
Updated on 04-Mar-2022 08:03:29
