Python - Pairwise distances of n-dimensional Space Array
Pairwise distance calculation is used in various domains including data analysis, machine learning, and image processing. Given a dataset, the goal is to compute the distance between every pair of points in it. In this article, we will explore several ways to calculate pairwise distances in Python for arrays that represent points in n-dimensional space.
What is Pairwise Distance?
Pairwise distance refers to calculating the distance between each pair of points in an n-dimensional space. You can choose different distance metrics according to the type of data and problem requirements.
Common distance metrics include:
Euclidean distance: measures the straight-line distance between points
Manhattan distance: sum of absolute differences along each dimension
Minkowski distance: generalizes both Euclidean and Manhattan distances
These metrics help identify similarity or dissimilarity between data points based on your specific problem.
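To make the relationship between these metrics concrete, here is a minimal sketch that computes Minkowski distances with plain NumPy broadcasting. The helper name minkowski_pairwise is hypothetical (it is not part of NumPy or SciPy); with p=1 it should reduce to Manhattan distance and with p=2 to Euclidean distance.

```python
import numpy as np

def minkowski_pairwise(points, p=2):
    """Return the full (n, n) matrix of Minkowski distances of order p.

    Hypothetical helper for illustration only.
    """
    pts = np.asarray(points, dtype=float)
    # Broadcasting (n, 1, d) against (1, n, d) gives an (n, n, d) array
    # holding the absolute coordinate differences for every pair of points.
    diff = np.abs(pts[:, None, :] - pts[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
manhattan = minkowski_pairwise(pts, p=1)  # p=1: Manhattan distance
euclidean = minkowski_pairwise(pts, p=2)  # p=2: Euclidean distance
print(manhattan)
print(euclidean)
```

This approach builds an intermediate array of size n x n x d, so it is only practical for small inputs; the library functions covered in the following sections are the better choice for real workloads.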
Using SciPy's cdist Function
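SciPy's cdist function provides an efficient way to calculate pairwise distances between two collections of points using various metrics. Passing the same array twice gives the distances within a single dataset: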
import numpy as np
from scipy.spatial.distance import cdist
pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dist = cdist(pts, pts, metric='euclidean')
print("Euclidean distances:")
print(dist)
Euclidean distances:
[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]
The result shows a symmetric matrix where diagonal elements are zero (distance from a point to itself) and off-diagonal elements show distances between different points.
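cdist is not limited to comparing a dataset with itself. The sketch below passes two different arrays (the names queries and references and their values are made up for illustration), which yields a rectangular matrix with one row per query point and one column per reference point. Note that SciPy names the Manhattan metric 'cityblock'.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data: 2 query points and 3 reference points in 3 dimensions
queries = np.array([[0, 0, 0], [1, 1, 1]])
references = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# cross_dist[i, j] is the distance from queries[i] to references[j]
cross_dist = cdist(queries, references, metric='cityblock')
print(cross_dist.shape)  # (2, 3)
print(cross_dist)
```

Because the two inputs differ, the result is generally neither square nor symmetric.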
Using Scikit-learn's pairwise_distances
Scikit-learn provides the pairwise_distances function with support for multiple distance metrics:
from sklearn.metrics import pairwise_distances
import numpy as np
pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
manhattan_dist = pairwise_distances(pts, metric='manhattan')
print("Manhattan distances:")
print(manhattan_dist)
Manhattan distances:
[[ 0.  9. 18.]
 [ 9.  0.  9.]
 [18.  9.  0.]]
Using pdist Function
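The pdist function from SciPy returns a condensed distance matrix: a one-dimensional array holding only the upper-triangular entries, so each pair of points is stored once. The squareform function converts it back to a full square matrix: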
from scipy.spatial.distance import pdist, squareform
import numpy as np
pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Get condensed distance matrix
condensed_dist = pdist(pts, metric='euclidean')
print("Condensed distances:")
print(condensed_dist)
# Convert to square matrix
square_dist = squareform(condensed_dist)
print("\nSquare distance matrix:")
print(square_dist)
Condensed distances:
[ 5.19615242 10.39230485  5.19615242]

Square distance matrix:
[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]
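The condensed form stores n * (n - 1) / 2 values for n points, ordered as (0,1), (0,2), ..., (1,2), and so on. The sketch below shows one way to look up a single pair without expanding to the square matrix. The helper condensed_index is hypothetical (not a SciPy function), and the four-point array is illustrative data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
n = len(pts)
condensed = pdist(pts, metric='euclidean')

# n points produce n * (n - 1) / 2 unique pairs
print(len(condensed), n * (n - 1) // 2)

def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in the condensed array.

    Hypothetical helper for illustration only.
    """
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# Read the same distance from the condensed array and the square matrix
i, j = 1, 3
k = condensed_index(i, j, n)
print(condensed[k], squareform(condensed)[i, j])
```

For large datasets this avoids allocating the full n x n matrix, which is where the memory saving of pdist comes from.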
Using NearestNeighbors Class
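The NearestNeighbors class returns, for each query point, the distances to its k nearest neighbors together with their indices. Setting n_neighbors to the number of points yields every pairwise distance, but each row is sorted from nearest to farthest, so the indices array is needed to tell which point each distance refers to: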
from sklearn.neighbors import NearestNeighbors
import numpy as np
pts = np.array([[1, 2], [4, 5], [7, 8]])
nbrs = NearestNeighbors(n_neighbors=len(pts)).fit(pts)
distances, indices = nbrs.kneighbors(pts)
print("Distances to nearest neighbors:")
print(distances)
print("\nIndices of nearest neighbors:")
print(indices)
Distances to nearest neighbors:
[[0.         4.24264069 8.48528137]
 [0.         4.24264069 4.24264069]
 [0.         4.24264069 8.48528137]]

Indices of nearest neighbors:
[[0 1 2]
 [1 0 2]
 [2 1 0]]
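In the output above, the first column is always 0 because each point is its own nearest neighbor. If only the closest other point is wanted, one option is to call kneighbors() with no argument, in which case scikit-learn does not count a fitted point as its own neighbor. The following is a minimal sketch with illustrative data chosen so that no two neighbors are tied.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative data, chosen to avoid ties between neighbors
pts = np.array([[1, 2], [4, 5], [8, 9], [20, 20]])

nbrs = NearestNeighbors(n_neighbors=1, metric='euclidean').fit(pts)

# With no query array, each fitted point is excluded from its own neighbors
distances, indices = nbrs.kneighbors()

print(indices.ravel())    # index of the closest other point, per point
print(distances.ravel())  # distance to that point
```

This keeps the output at one value per point instead of a full matrix, which suits tasks that only need the closest match.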
Comparison of Methods
| Method | Function | Output Format | Best For |
|---|---|---|---|
| SciPy cdist | cdist() | Full matrix | Different point sets |
| SciPy pdist | pdist() | Condensed array | Memory efficiency |
| Scikit-learn | pairwise_distances() | Full matrix | ML workflows |
| NearestNeighbors | kneighbors() | K nearest only | Finding neighbors |
Conclusion
Use cdist() for comparing two different datasets, pdist() for memory-efficient analysis of a single dataset, pairwise_distances() for machine learning workflows, and kneighbors() when only the k nearest points matter. Choose the method that best fits your computational needs and output format requirements.
