Python – Display only non-duplicate values from a DataFrame
In this tutorial, we will learn how to display only non-duplicate values from a Pandas DataFrame. We'll use the duplicated() method combined with the logical NOT operator (~) to filter out repeated entries, keeping one occurrence of each value.
Creating a DataFrame with Duplicates
First, let's create a DataFrame containing duplicate values:
import pandas as pd
# Create DataFrame with duplicate student names
dataFrame = pd.DataFrame({
"Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
"Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
    Student Result
0      Jack   Pass
1     Robin   Fail
2       Ted   Pass
3     Robin   Fail
4  Scarlett   Pass
5       Kat   Pass
6       Ted   Pass
Using duplicated() Method
The duplicated() method returns a boolean Series that is True for every row repeating a value seen in an earlier row. Negating it with the logical NOT operator (~) and using the result as a row filter removes those duplicates:
import pandas as pd
dataFrame = pd.DataFrame({
"Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
"Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})
# Display only non-duplicate values based on 'Student' column
non_duplicates = dataFrame[~dataFrame.duplicated('Student')]
print("DataFrame with non-duplicate students:")
print(non_duplicates)
DataFrame with non-duplicate students:
    Student Result
0      Jack   Pass
1     Robin   Fail
2       Ted   Pass
4  Scarlett   Pass
5       Kat   Pass
How It Works
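The call duplicated('Student') compares only the 'Student' column. With the default keep='first', it marks a row as True when its student name already appeared in an earlier row, so rows 3 (Robin) and 6 (Ted) are flagged. The tilde (~) operator negates this boolean mask, so indexing with it selects only the first occurrence of each unique student name.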
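To see exactly what the tilde inverts, it can help to print the mask on its own. The following is a minimal sketch using the same data; the variable name mask is illustrative only.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
    "Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})

# Boolean Series: True marks a Student name already seen in an earlier row
mask = dataFrame.duplicated('Student')
print(mask.tolist())     # [False, False, False, True, False, False, True]

# The inverted mask is True for the first occurrence of every name
print((~mask).tolist())  # [True, True, True, False, True, True, False]
```

Passing ~mask to dataFrame[...] keeps exactly the rows marked True, which is why rows 3 and 6 disappear from the filtered result.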
Parameters of duplicated()
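The duplicated() method accepts two parameters: subset, the column label (or list of labels) to compare, and keep, which decides which occurrence is not flagged as a duplicate ('first', the default, 'last', or False). Here we keep the last occurrence instead of the first: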
import pandas as pd
dataFrame = pd.DataFrame({
"Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
"Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})
# Keep last occurrence instead of first
last_occurrence = dataFrame[~dataFrame.duplicated('Student', keep='last')]
print("Keeping last occurrence of duplicates:")
print(last_occurrence)
Keeping last occurrence of duplicates:
    Student Result
0      Jack   Pass
3     Robin   Fail
4  Scarlett   Pass
5       Kat   Pass
6       Ted   Pass
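The keep parameter also accepts False, which flags every occurrence of a repeated value. Negating that mask returns only the students whose names appear exactly once, a stricter reading of "non-duplicate". A sketch with the same data (the name unique_only is illustrative):

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
    "Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})

# keep=False marks ALL rows whose Student value occurs more than once
unique_only = dataFrame[~dataFrame.duplicated('Student', keep=False)]
print("Students that appear exactly once:")
print(unique_only)
```

Both Robin rows and both Ted rows are dropped, leaving rows 0 (Jack), 4 (Scarlett) and 5 (Kat).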
Conclusion
Use dataFrame[~dataFrame.duplicated(column)] to display only non-duplicate values. The duplicated() method flags repeated entries, and the ~ operator inverts that mask so only the first occurrence of each value remains; pass keep='last' to retain the last occurrence instead.
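For comparison, pandas also provides drop_duplicates(), which applies the same first-occurrence rule by default. The sketch below checks that both approaches give identical results on this data, and shows that calling duplicated() with no column compares entire rows. The variable names are illustrative only.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Student": ['Jack', 'Robin', 'Ted', 'Robin', 'Scarlett', 'Kat', 'Ted'],
    "Result": ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Pass']
})

# Boolean-mask approach used in this tutorial
via_mask = dataFrame[~dataFrame.duplicated('Student')]

# drop_duplicates() keeps the first occurrence by default and preserves the index
via_drop = dataFrame.drop_duplicates(subset='Student')
print(via_mask.equals(via_drop))  # True

# With no subset, duplicated() compares all columns of each row
whole_rows = dataFrame[~dataFrame.duplicated()]
print(whole_rows.index.tolist())  # [0, 1, 2, 4, 5]
```

Here the whole-row check happens to give the same rows as the 'Student' check, because each repeated name also repeats its Result value; with different data the two filters can differ.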
