Python Pandas – Create a subset and display only the last entry from duplicate values

To create a subset and display only the last entry from duplicate values, use the drop_duplicates() method with the keep parameter set to 'last'. This method removes duplicate rows based on specified columns and keeps only the last occurrence of each duplicate.

Creating the DataFrame

Let us first create a DataFrame with duplicate entries:

import pandas as pd

# Create DataFrame with duplicate Car-Place combinations
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

print("Original DataFrame:")
print(dataFrame)

Output:

Original DataFrame:
          Car       Place  UnitsSold
0         BMW       Delhi         85
1    Mercedes   Hyderabad         70
2 Lamborghini  Chandigarh         80
3         BMW       Delhi         95
4    Mercedes   Hyderabad         55
5     Porsche      Mumbai         90

Using drop_duplicates() with keep='last'

Now we'll remove duplicates based on the Car and Place columns, keeping only the last occurrence:

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# Remove duplicates and keep last entry
# Using subset parameter to specify columns for duplicate detection
dataFrame2 = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last').reset_index(drop=True)

print("DataFrame after removing duplicates (keeping last):")
print(dataFrame2)

Output:

DataFrame after removing duplicates (keeping last):
          Car       Place  UnitsSold
0 Lamborghini  Chandigarh         80
1         BMW       Delhi         95
2    Mercedes   Hyderabad         55
3     Porsche      Mumbai         90

How It Works

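The statement combines two drop_duplicates() parameters with a chained reset_index() call:

  • subset: Specifies which columns are compared to identify duplicates; rows that match on these columns count as duplicates even if other columns, such as UnitsSold, differ
  • keep='last': Keeps the last occurrence in each group of duplicates and drops the earlier ones
  • reset_index(drop=True): A separate DataFrame method, not a drop_duplicates() parameter; it renumbers the remaining rows from 0 and discards the old index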
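
The effect of reset_index(drop=True) is easier to see by comparing the index labels before and after the call. The sketch below reuses the article's data; the variable names kept, renumbered, and same are illustrative only, and the ignore_index=True shortcut assumes pandas 1.0 or later.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

# Without reset_index(), the surviving rows keep their original labels
kept = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last')
print(kept.index.tolist())        # [2, 3, 4, 5]

# reset_index(drop=True) renumbers the rows and discards the old labels
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Assumption: pandas 1.0+, where drop_duplicates() accepts ignore_index
same = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep='last',
                                 ignore_index=True)
print(same.equals(renumbered))    # True
```

Keeping the original labels can be useful when the result has to be traced back to rows in the source DataFrame; resetting them is mostly a matter of tidy output.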

Comparison of keep Parameter Values

Value          Description                               BMW-Delhi Result
keep='first'   Keep the first occurrence (the default)   Index 0 (UnitsSold: 85)
keep='last'    Keep the last occurrence                  Index 3 (UnitsSold: 95)
keep=False     Drop every row that has a duplicate       Neither (both removed)
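
The three settings can be checked side by side on the same data. This is a minimal sketch; the bmw_units and row_counts dictionaries are illustrative names introduced here, not part of pandas.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [85, 70, 80, 95, 55, 90]
})

bmw_units = {}   # UnitsSold values left for BMW, per keep setting
row_counts = {}  # number of rows left, per keep setting

for keep in ('first', 'last', False):
    result = dataFrame.drop_duplicates(subset=['Car', 'Place'], keep=keep)
    bmw_units[keep] = result.loc[result['Car'] == 'BMW', 'UnitsSold'].tolist()
    row_counts[keep] = len(result)
    print(f"keep={keep!r}: BMW-Delhi UnitsSold={bmw_units[keep]}, "
          f"rows kept={row_counts[keep]}")

# keep='first': BMW-Delhi UnitsSold=[85], rows kept=4
# keep='last': BMW-Delhi UnitsSold=[95], rows kept=4
# keep=False: BMW-Delhi UnitsSold=[], rows kept=2
```

With keep=False only the Lamborghini and Porsche rows remain, because every BMW-Delhi and Mercedes-Hyderabad row has a duplicate.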

Conclusion

Use drop_duplicates(subset=['Car', 'Place'], keep='last'), substituting your own column names, to keep only the last row from each group of duplicates. The subset parameter lists the columns that define a duplicate, while keep='last' preserves the final entry from each group.

Updated on: 2026-03-26T13:22:35+05:30
