Find unique rows in a NumPy array


In data science and machine learning, duplicate rows in a dataset frequently need to be found and removed. NumPy, a popular Python library for numerical computation, offers a number of functions for manipulating arrays, including one that handles this task. In this tutorial, we'll go through how to find the unique rows in a NumPy array.

Installation and Setup

If NumPy is not already installed, install it using pip before using it in Python.

pip install numpy

Once installed, we can import the NumPy library in Python using the following statement −

import numpy as np

Syntax

The NumPy function that we will use to find unique rows in a NumPy array is np.unique(). The syntax of this function is as follows −

np.unique(arr, axis=0)

Here, arr is the NumPy array in which we want to find the unique rows, and axis is the axis along which to perform the uniqueness test. The default is axis=None, in which case the array is flattened and its unique elements are returned, so axis=0 must be passed explicitly to compare entire rows. The result is returned in sorted order.
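
To make the role of axis concrete, the following minimal sketch compares np.unique() called without axis against the same call with axis=0. The variable names unique_values and unique_rows are chosen only for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3]])

# Without axis, the array is flattened and unique scalar values are returned.
unique_values = np.unique(arr)

# With axis=0, each row is treated as a single item to compare.
unique_rows = np.unique(arr, axis=0)

print(unique_values)  # [1 2 3 4 5 6]
print(unique_rows)    # two rows: [1 2 3] and [4 5 6]
```

The first call returns a 1D array of values, while the second returns a 2D array of rows.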

Code Algorithm

  • Import the required library, NumPy.

  • Create a NumPy array with some duplicate rows using np.array().

  • Use the np.unique() function with axis=0 to find the unique rows and assign the result to a variable called unique_rows.

  • Finally, print the unique_rows array using the print() function.

Example

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3]])
unique_rows = np.unique(arr, axis=0)
print(unique_rows)

Output

[[1 2 3]
 [4 5 6]]

We create a NumPy array arr that contains a duplicate row, call np.unique() with axis=0 to find the unique rows, and assign the result to unique_rows. Printing unique_rows shows that the repeated row [1, 2, 3] appears only once.
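
Because np.unique() returns its result in sorted order, the unique rows may come back in a different order than they appear in the input. If the original order matters, one possible approach (a sketch, not the only way) is to use the return_index option and re-select the rows by their first occurrence. The names first_idx and ordered_unique are illustrative.

```python
import numpy as np

# The duplicate row [4, 5, 6] appears first in the input.
arr = np.array([[4, 5, 6], [1, 2, 3], [4, 5, 6]])

# Plain np.unique sorts the rows, so [1, 2, 3] comes first.
sorted_unique = np.unique(arr, axis=0)

# return_index gives the position of the first occurrence of each unique row.
_, first_idx = np.unique(arr, axis=0, return_index=True)

# Sorting those positions restores the order of first appearance.
ordered_unique = arr[np.sort(first_idx)]

print(sorted_unique)   # rows [1 2 3], then [4 5 6]
print(ordered_unique)  # rows [4 5 6], then [1 2 3]
```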

Example 2

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

unique_rows = np.unique(arr, axis=0)

print(unique_rows)

Output

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Here arr has no duplicate rows, so np.unique() with axis=0 returns all three rows. Because the rows were already in sorted order, the output matches the input.
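
The same call can also serve as a quick check for whether an array contains any duplicate rows at all. The sketch below compares the number of unique rows with the number of input rows; has_duplicates is a name introduced here for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

unique_rows = np.unique(arr, axis=0)

# If deduplication dropped any rows, the input contained duplicates.
has_duplicates = unique_rows.shape[0] < arr.shape[0]

print(has_duplicates)  # False
```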

Example 3

Suppose we have a NumPy array representing a dataset with some duplicate rows. We want to find and remove these duplicate rows from the dataset. The dataset is given below −

import numpy as np
dataset = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 3, 4], [9, 10, 11, 12], [5, 6, 7, 8]])
unique_rows = np.unique(dataset, axis=0)
print(unique_rows)

Output

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

We create a NumPy array dataset containing two duplicated rows, call np.unique() with axis=0, and assign the result to unique_rows. The output contains each distinct row exactly once. Note that np.unique() returns a new array and leaves dataset unchanged.
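
When cleaning a dataset it is often useful to know which rows were duplicated and how many rows were dropped. A minimal sketch using the return_counts option of np.unique() follows; the names counts, duplicated_rows, and removed are illustrative.

```python
import numpy as np

dataset = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 3, 4],
                    [9, 10, 11, 12], [5, 6, 7, 8]])

# return_counts reports how many times each unique row occurs.
unique_rows, counts = np.unique(dataset, axis=0, return_counts=True)

# Rows that occurred more than once in the original dataset.
duplicated_rows = unique_rows[counts > 1]

# Number of rows dropped by deduplication.
removed = len(dataset) - len(unique_rows)

print(counts)           # [2 2 1]
print(duplicated_rows)  # rows [1 2 3 4] and [5 6 7 8]
print(removed)          # 2
```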

Applications

  • In data science and machine learning, duplicate rows are frequently removed from a dataset so that repeated records do not skew the model or contribute to overfitting. Finding them by hand in a large array is tedious and error-prone.

  • The np.unique() function with axis=0 makes this simple: it extracts the unique rows from a NumPy array so that you can use them to build a new dataset free of duplicates.

  • Keep in mind that this approach compares whole rows for exact equality and is most natural for 1D and 2D arrays. For higher-dimensional or more complex datasets, np.unique() with axis=0 compares entire sub-arrays rather than individual rows, so you may need a different approach, such as reshaping the data first.
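
As a rough illustration of the higher-dimensional case, the sketch below applies np.unique() to a 3D array in two ways: first along axis=0, which compares whole 2D blocks, and then after reshaping to 2D so that individual rows are compared. The array stack and the other variable names are made up for this example.

```python
import numpy as np

# Three 2x2 blocks; the first and third blocks are identical.
stack = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[1, 2], [3, 4]],
])

# axis=0 on a 3D array compares entire 2D blocks, not individual rows.
unique_blocks = np.unique(stack, axis=0)

# To deduplicate the innermost rows instead, flatten to 2D first.
flat_rows = stack.reshape(-1, stack.shape[-1])
unique_inner_rows = np.unique(flat_rows, axis=0)

print(unique_blocks.shape)      # (2, 2, 2)
print(unique_inner_rows.shape)  # (4, 2)
```

Which of the two is appropriate depends on what counts as a "row" in the dataset at hand.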

Conclusion

This article covered how to find unique rows in a NumPy array using Python. We demonstrated how the np.unique() function with axis=0 locates and removes duplicate rows from a dataset, and illustrated its use with a few examples. NumPy provides many other practical functions for manipulating arrays.

Updated on: 21-Aug-2023
