Python Pandas – Create a subset and display only the last entry from duplicate values

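To create a subset and display only the last entry from duplicate values, use the drop_duplicates() method. Pass the columns that define a duplicate to the subset parameter and set keep='last'. Rows with identical values in all subset columns form a duplicate group, and drop_duplicates() removes every row in each group except the last one. It returns a new DataFrame and leaves the original unchanged.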

Creating the DataFrame

Let us first create a DataFrame with duplicate entries:

import pandas as pd

# Create DataFrame with duplicate Car-Place combinations
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

print("Original DataFrame:")
print(dataFrame)

Original DataFrame:
           Car       Place  UnitsSold
0          BMW       Delhi         85
1     Mercedes   Hyderabad         70
2  Lamborghini  Chandigarh         80
3          BMW       Delhi         95
4     Mercedes   Hyderabad         55
5      Porsche      Mumbai         90

Using drop_duplicates() with keep='last'

Now we'll remove duplicates based on the Car and Place columns, keeping only the last occurrence of each combination:

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# Remove duplicates and keep last entry
# Using subset parameter to specify columns for duplicate detection
dataFrame2 = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last').reset_index(drop=True)

print("DataFrame after removing duplicates (keeping last):")
print(dataFrame2)

DataFrame after removing duplicates (keeping last):
           Car       Place  UnitsSold
0  Lamborghini  Chandigarh         80
1          BMW       Delhi         95
2     Mercedes   Hyderabad         55
3      Porsche      Mumbai         90
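To see which rows drop_duplicates() discarded, the related DataFrame.duplicated() method can help. It accepts the same subset and keep parameters and returns a boolean Series that is True for each row that would be dropped. The sketch below reuses the sample data and assumes only standard pandas; the names mask, manual and builtin are illustrative.

```python
import pandas as pd

# Same sample data as in the examples above
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# duplicated(keep='last') is True for every row that has a later duplicate
# on the subset columns; the last occurrence in each group is False
mask = dataFrame.duplicated(subset=['Car', 'Place'], keep='last')
print(mask.tolist())  # [True, True, False, False, False, False]

# Keeping only the rows where the mask is False selects the same rows
# that drop_duplicates(subset=['Car', 'Place'], keep='last') returns
manual = dataFrame[~mask]
builtin = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last')
print(manual.equals(builtin))  # True
```

Rows 0 and 3 are both BMW-Delhi, so row 0 is flagged and row 3 survives. Rows 1 and 4 (Mercedes-Hyderabad) follow the same pattern.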

How It Works

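The example combines two parameters of drop_duplicates() with one chained method:

subset=['Car', 'Place']: the columns compared when identifying duplicates. UnitsSold is ignored, so rows 0 and 3 count as duplicates even though their sales differ
keep='last': keeps the last occurrence in each duplicate group and drops the earlier ones
reset_index(drop=True): a separate DataFrame method, not a drop_duplicates() parameter, that renumbers the remaining rows from 0 and discards the old index labels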
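The effect of reset_index(drop=True) is easiest to see by comparing index labels before and after. The sketch below assumes pandas 1.0 or later for the ignore_index keyword of drop_duplicates(), which renumbers the rows in the same call.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# Without reset_index(), the surviving rows keep their original labels
kept = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last')
print(kept.index.tolist())  # [2, 3, 4, 5]

# With the default drop=False, reset_index() moves the old labels
# into a new 'index' column instead of discarding them
print(kept.reset_index().columns.tolist())  # ['index', 'Car', 'Place', 'UnitsSold']

# drop=True renumbers the rows from 0 and throws the old labels away
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Assumes pandas >= 1.0: ignore_index=True renumbers in a single call
direct = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last',
                                   ignore_index=True)
print(direct.equals(renumbered))  # True
```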

Comparison of keep Parameter Values

keep value	Description	BMW-Delhi result
`keep='first'`	Keep the first occurrence (default)	Index 0 (UnitsSold: 85)
`keep='last'`	Keep the last occurrence	Index 3 (UnitsSold: 95)
`keep=False`	Drop every occurrence of a duplicated row	Neither (both removed)
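The table can be checked by running all three keep options on the same data. In this minimal sketch, the results dictionary and the loop exist only for illustration.

```python
import pandas as pd

# Same sample data as in the examples above
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# Illustrative container: surviving index labels for each keep option
results = {}
for keep in ['first', 'last', False]:
    result = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep=keep)
    results[keep] = result.index.tolist()

    # Select whatever is left of the BMW-Delhi group
    bmw_delhi = result[(result['Car'] == 'BMW') & (result['Place'] == 'Delhi')]
    print(f"keep={keep!r}: rows kept {results[keep]}, "
          f"BMW-Delhi UnitsSold {bmw_delhi['UnitsSold'].tolist()}")
```

Only keep=False removes both BMW-Delhi rows, leaving just the rows with no duplicates at all (Lamborghini and Porsche, rows 2 and 5).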

Conclusion

Use drop_duplicates(subset=['columns'], keep='last') to keep only the last occurrence of duplicate values. The subset parameter defines which columns determine duplicates, while keep='last' preserves the final entry from each duplicate group.

AmitDiwan

Updated on: 2026-03-26T13:22:35+05:30