Python - Pairwise distances of n-dimensional Space Array


Pairwise distance calculation is used in many domains, including data analysis, machine learning and image processing. It means computing the distance between every pair of elements (points) in a dataset. In this article we will look at various methods to calculate pairwise distances in Python for arrays that represent data in multiple dimensions. We will also look at the distance functions in SciPy's scipy.spatial.distance module, such as cdist and pdist.

Pairwise Distance

To calculate pairwise distances in n-dimensional space, we need the distance between each pair of points. You can choose any distance metric, depending on the type of data and the requirements of the problem you want to solve.

Some of the commonly used distance metrics include −

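  • Euclidean distance − the straight-line distance between two points: the square root of the sum of squared differences along each dimension.

  • Manhattan distance − the sum of absolute differences along each dimension.

  • Minkowski distance − a generalization of both Euclidean and Manhattan distances, controlled by an order parameter p: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.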

These metrics measure dissimilarity (or, inversely, similarity) between data points in different ways, so the right choice depends on the problem at hand.
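As a rough illustration of how the three metrics relate, the sketch below implements them for a single pair of points in plain Python. The function names euclidean, manhattan and minkowski are hypothetical helpers for this example, not part of any library.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # p-th root of the sum of |difference| ** p; p=1 is Manhattan, p=2 is Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = [1, 2, 3], [4, 5, 6]
print(euclidean(a, b))     # about 5.196 (the square root of 27)
print(manhattan(a, b))     # 9
print(minkowski(a, b, 1))  # 9.0, same as Manhattan
print(minkowski(a, b, 2))  # about 5.196, same as Euclidean
```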

Let’s see some of the methods to calculate pairwise distances.

Method 1: Manual Calculation

We can calculate pairwise distances manually by implementing the distance formula ourselves. For example, for two points (x1, y1) and (x2, y2), we can calculate the Euclidean distance between them using the following formula −

distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)

We can apply this formula to every pair of points to obtain all pairwise distances. However, this approach becomes computationally expensive and time-consuming for large datasets and higher-dimensional arrays, because the number of pairs grows quadratically with the number of points.
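Extending the formula to n dimensions (sum the squared differences over every coordinate, then take the square root), a manual version can be sketched with two nested loops. The function pairwise_euclidean is a hypothetical helper for illustration. It visits each unordered pair once and mirrors the value, so for n points it performs n(n-1)/2 distance computations.

```python
import math

def pairwise_euclidean(points):
    """Return an n x n list of Euclidean distances between all points."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # n-dimensional form of sqrt((x2 - x1)^2 + (y2 - y1)^2)
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(points[i], points[j])))
            dist[i][j] = d  # distance is symmetric, so fill both halves
            dist[j][i] = d
    return dist

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in pairwise_euclidean(pts):
    print([round(d, 4) for d in row])
# [0.0, 5.1962, 10.3923]
# [5.1962, 0.0, 5.1962]
# [10.3923, 5.1962, 0.0]
```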

Method 2: NumPy and SciPy Libraries

This method uses the NumPy and SciPy libraries, which are popular and efficient tools for scientific computing in Python. They offer optimized functions that calculate pairwise distances efficiently, saving time and simplifying the code.

To use NumPy and SciPy for pairwise distances, we first convert the data into a matrix: a 2-D NumPy array, created with the np.array function, in which each row is a point and each column is a dimension. We can then pass this matrix to SciPy's cdist function, which computes the distance between every row of its first argument and every row of its second argument.

Example

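import numpy as np
from scipy.spatial.distance import cdist

# Each row is one point in 3-dimensional space
pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Distance from every row of pts to every row of pts
dist = cdist(pts, pts, metric='euclidean')
print(dist)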

Output

[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]

In the above example, we create an array pts that holds three points in 3-dimensional space and pass it to cdist as both arguments, using the Euclidean distance metric. The result is a 3×3 array whose entry [i, j] is the distance between point i and point j, so the diagonal is zero and the matrix is symmetric.
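For comparison, the same Euclidean distance matrix can be computed with NumPy alone through broadcasting. This is a minimal sketch, not a replacement for cdist: it builds an intermediate array of shape (n, n, d), so memory use grows quickly for large inputs.

```python
import numpy as np

pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# pts[:, None, :] has shape (3, 1, 3) and pts[None, :, :] has shape (1, 3, 3);
# subtracting them broadcasts to (3, 3, 3): every pairwise coordinate difference
diff = pts[:, None, :] - pts[None, :, :]

# Square, sum over the last axis (the dimensions), and take the square root
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(dist)
```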

Method 3: Scikit-learn Library

Scikit-learn is a Python library for machine learning. It offers a broad range of functionality for data analysis and modeling, including the ability to calculate pairwise distances with a single function call.

For instance, to calculate pairwise distances with the Manhattan distance metric, we can use scikit-learn's pairwise_distances function, which performs the whole computation in one call.

Example

from sklearn.metrics import pairwise_distances

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# 'manhattan' sums the absolute differences along each dimension
dis = pairwise_distances(pts, metric='manhattan')
print(dis)

Output

[[ 0.  9. 18.]
 [ 9.  0.  9.]
 [18.  9.  0.]]

In this example, we pass the list pts, which holds three points in 3-dimensional space, to the pairwise_distances function with metric='manhattan'. The resulting array contains the Manhattan distance between every pair of points.
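pairwise_distances also accepts an optional second array Y. The sketch below, using made-up example points X and Y, computes Manhattan distances from each row of X to each row of Y, which produces a rectangular len(X) × len(Y) matrix instead of a square one.

```python
from sklearn.metrics import pairwise_distances

X = [[1, 2, 3], [4, 5, 6]]  # two query points (illustrative data)
Y = [[0, 0, 0], [7, 8, 9]]  # two reference points (illustrative data)

# Entry [i, j] is the Manhattan distance between X[i] and Y[j]
d = pairwise_distances(X, Y, metric='manhattan')
print(d.shape)  # (2, 2)
print(d)        # values: [[6, 18], [15, 9]] as floats
```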

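Method 4: scipy.spatial.distance Module

In this method we use the scipy.spatial.distance module, which provides many distance metrics and functions for pairwise distance calculation. When no metric is specified, cdist uses the Euclidean metric, so the result below is the same as in Method 2.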

Example

from scipy.spatial.distance import cdist

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# metric defaults to 'euclidean' when it is omitted
dis = cdist(pts, pts)
print(dis)

Output

[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]
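The same module also provides pdist, which computes the distance between each pair of distinct points only once and returns a condensed 1-D array of length n(n-1)/2. Its companion squareform expands that array into the full square matrix. A short sketch:

```python
from scipy.spatial.distance import pdist, squareform

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Condensed form: distances for pairs (0, 1), (0, 2), (1, 2) in that order
condensed = pdist(pts, metric='euclidean')
print(condensed)  # about [5.196, 10.392, 5.196]

# Expand to the symmetric 3 x 3 matrix with zeros on the diagonal
full = squareform(condensed)
print(full)
```

For large datasets the condensed form needs roughly half the memory of the square matrix, because it stores each pair only once.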

Method 5: Using NearestNeighbors Class

In this method we use the NearestNeighbors class from the scikit-learn library. This class finds the nearest neighbors of each point and also reports the distances to them.

Example

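from sklearn.neighbors import NearestNeighbors

pts = [[1, 2], [4, 5], [7, 8]]
# Ask for every point as a neighbor so that all distances are returned
nbrs = NearestNeighbors(n_neighbors=len(pts)).fit(pts)
# kneighbors returns (distances, indices); the indices are discarded here
dis, _ = nbrs.kneighbors(pts)

print(dis)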

Output

[[0.         4.24264069 8.48528137]
 [0.         4.24264069 4.24264069]
 [0.         4.24264069 8.48528137]]

Explanation

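In the above program we create a NearestNeighbors instance with n_neighbors equal to the number of points, fit it on pts, and call the kneighbors method. For every point, it returns the distances to all points sorted from nearest to farthest, together with the indices of those points (discarded here as _). In the first row, the first value, 0, is the distance from (1, 2) to itself, and the second value is the distance from (1, 2) to its nearest neighbor (4, 5). Because each row is sorted by distance rather than by point order, the output is not laid out like a distance matrix: the second row, for (4, 5), shows 4.24264069 twice because (4, 5) is equally far from (1, 2) and (7, 8).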
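To get a matrix in the original point order from kneighbors, one option is to use the returned index array to scatter each distance back into its column. This is a sketch that assumes n_neighbors equals the number of points, so every row contains all indices. The variable names full and rows are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

pts = np.array([[1, 2], [4, 5], [7, 8]])
nbrs = NearestNeighbors(n_neighbors=len(pts)).fit(pts)

# dis[i, k] is the k-th smallest distance from point i; idx[i, k] says which point it is
dis, idx = nbrs.kneighbors(pts)

# Scatter each sorted row back into original column order
full = np.zeros((len(pts), len(pts)))
rows = np.arange(len(pts))[:, None]  # shape (3, 1), broadcasts against idx
full[rows, idx] = dis
print(full)
```

The resulting full matrix has the same layout as the cdist output in the earlier methods, with entry [i, j] holding the distance between point i and point j.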

These are some of the methods to calculate pairwise distances for n-dimensional arrays in Python. You can use whichever one suits your program best. For large datasets, the vectorized library functions are generally much faster than a manual loop.

Updated on: 13-Oct-2023
