Write a program in Python to check if a series contains duplicate elements or not
A Pandas Series is a one-dimensional data structure that can contain duplicate values. To check if a series contains duplicates, we can compare the series length with the number of unique elements.
Sample Series Data
Let's start with a series that has no duplicate elements:
import pandas as pd
import numpy as np
# Series with no duplicates
data = pd.Series([1, 2, 3, 4, 5])
print("Original Series:")
print(data)
Original Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
Method 1: Using len() and np.unique()
Compare the length of the original series with the number of unique elements:
import pandas as pd
import numpy as np
data = pd.Series([1, 2, 3, 4, 5])
# Equal lengths mean every value occurs exactly once
if len(data) == len(np.unique(data)):
    print("No duplicates found")
else:
    print("Duplicates found!")
No duplicates found
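The same length comparison can also be written with pandas alone. The sketch below is one possible variant, not part of the original program: the helper name `has_duplicates_by_count` and the sample series are illustrative assumptions. It relies on `Series.nunique()` and the `Series.is_unique` property, both of which are part of the pandas API.

```python
import pandas as pd


def has_duplicates_by_count(series: pd.Series) -> bool:
    """Hypothetical helper: True if fewer distinct values than elements."""
    # dropna=False counts NaN as a value, so repeated NaNs count as duplicates
    return series.nunique(dropna=False) < len(series)


unique_data = pd.Series([1, 2, 3, 4, 5])
repeated_data = pd.Series([1, 2, 3, 4, 5, 3])

print(has_duplicates_by_count(unique_data))    # False
print(has_duplicates_by_count(repeated_data))  # True

# is_unique is the built-in shortcut for the same question
print(unique_data.is_unique)    # True
print(repeated_data.is_unique)  # False
```

Note that `nunique()` ignores missing values by default, which is why `dropna=False` is passed here.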
Method 2: Using Pandas duplicated()
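The duplicated() method returns a boolean Series that marks every value that has already appeared earlier in the Series. Calling any() on the result tells us whether at least one value is marked: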
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5])
# any() is True if at least one element repeats an earlier value
if data.duplicated().any():
    print("Duplicates found!")
else:
    print("No duplicates found")
No duplicates found
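Which occurrences get marked depends on the `keep` parameter of `duplicated()`. The short sketch below uses a sample series chosen for illustration (it is not from the program above) to show the three accepted values, `"first"` (the default), `"last"`, and `False`.

```python
import pandas as pd

# Illustrative data: the value 3 appears at positions 2 and 5
data = pd.Series([1, 2, 3, 4, 5, 3])

# Default keep="first": the first occurrence is not marked, later ones are
first_kept = data.duplicated()

# keep="last": the last occurrence is not marked, earlier ones are
last_kept = data.duplicated(keep="last")

# keep=False: every occurrence of a repeated value is marked
all_marked = data.duplicated(keep=False)

print(first_kept.tolist())  # [False, False, False, False, False, True]
print(last_kept.tolist())   # [False, False, True, False, False, False]
print(all_marked.tolist())  # [False, False, True, False, False, True]
```

For a plain yes/no check the choice of `keep` does not matter, since `any()` returns the same answer in all three cases.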
Testing with Duplicate Elements
Now let's test both methods with a series containing duplicate values:
import pandas as pd
import numpy as np
# Series with duplicates
data_with_duplicates = pd.Series([1, 2, 3, 4, 5, 3])
print("Series with duplicates:")
print(data_with_duplicates)
# Method 1: Using np.unique()
if len(data_with_duplicates) == len(np.unique(data_with_duplicates)):
    print("Method 1: No duplicates found")
else:
    print("Method 1: Duplicates found!")
# Method 2: Using duplicated()
if data_with_duplicates.duplicated().any():
    print("Method 2: Duplicates found!")
else:
    print("Method 2: No duplicates found")
Series with duplicates:
0    1
1    2
2    3
3    4
4    5
5    3
dtype: int64
Method 1: Duplicates found!
Method 2: Duplicates found!
Comparison
| Method | How It Works | Best For |
|---|---|---|
| len() with np.unique() | Compares the series length with the number of unique values | Simple yes/no check |
| duplicated().any() | Builds a boolean Series marking repeated values | Finding which elements are duplicated |
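To illustrate the last row of the table, here is a minimal sketch of how the boolean mask from `duplicated()` might be used to report which values repeat, with `value_counts()` added to show how often each one occurs. The sample data and variable names are assumptions made for this example.

```python
import pandas as pd

# Illustrative data: 3 appears twice, 5 appears three times
data = pd.Series([1, 2, 3, 4, 5, 3, 5, 5])

# Select the repeated occurrences, then reduce them to distinct values
duplicated_values = data[data.duplicated()].unique().tolist()
print(duplicated_values)  # [3, 5]

# Count every value and keep only those seen more than once
counts = data.value_counts()
repeat_counts = counts[counts > 1].to_dict()
print(repeat_counts)
```

Here `repeat_counts` maps each repeated value to its number of occurrences, so 3 maps to 2 and 5 maps to 3.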
Conclusion
Use duplicated().any() for a pandas-native approach, or compare len(data) with len(np.unique(data)) for a simple length-based check. Both methods effectively identify duplicate elements in a Series.
