 
SciPy - Distance Metrics
What are Distance Metrics?
In SciPy, distance metrics measure the similarity or dissimilarity between two points in a given space. They are widely used in machine learning and data analysis for tasks such as classification, clustering and nearest-neighbor search.
The scipy.spatial.distance module offers a variety of these metrics, including Euclidean, Manhattan, Cosine and Hamming distances, among others. Each metric captures a different notion of closeness, which helps reveal the relationships and structure within a dataset.
Types of Distance Metrics
The scipy.spatial.distance module provides a wide range of distance metrics, each suited to a different kind of data or requirement. The distance metrics covered here are described below −
Euclidean Distance
Euclidean distance is the straight-line distance between two points in Euclidean space. It is commonly used to quantify the similarity between two vectors by calculating the length of the shortest path connecting them.
The scipy.spatial.distance.euclidean() function is used to calculate the Euclidean Distance in SciPy.
Mathematically, it is defined as the square root of the sum of the squared differences between corresponding components of the two vectors. The formula is given as follows −
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Where −
- x = (x1, x2, ....., xn) and y = (y1, y2, ....., yn) − are the vectors representing the points in the space.
- (xi - yi) − is the difference between the i-th components of x and y.
Syntax
Following is the syntax of scipy.spatial.distance.euclidean() function −
scipy.spatial.distance.euclidean(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.euclidean() function −
- u: The first point or vector in n-dimensional space.
- v: The second point or vector in n-dimensional space.
Return Value
This function returns the Euclidean distance between the points u and v.
Example
Following is a simple example showing how to compute the Euclidean distance between two points using SciPy's euclidean() function −
from scipy.spatial.distance import euclidean
# Define two points in 2D space
point1 = [1, 2]
point2 = [4, 6]
# Calculate the Euclidean distance between the two points
distance = euclidean(point1, point2)
print(f"Euclidean Distance: {distance}")
Following is the output of the Euclidean Distance calculated for two points −
Euclidean Distance: 5.0
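The formula above can also be evaluated directly with NumPy. The following is a minimal sketch rather than a description of how SciPy computes the value internally; the variable names manual and via_norm are chosen only for illustration −

```python
import numpy as np
from scipy.spatial.distance import euclidean

point1 = np.array([1, 2])
point2 = np.array([4, 6])

# Square root of the sum of squared component differences
manual = np.sqrt(np.sum((point1 - point2) ** 2))

# The 2-norm of the difference vector is the same quantity
via_norm = np.linalg.norm(point1 - point2)

print(manual, via_norm, euclidean(point1, point2))
```

All three values should agree, since each one is the length of the difference vector (3, 4).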
Manhattan Distance
Manhattan Distance, also known as City-block Distance or the L1 norm, measures the distance between two points along a grid-like path, similar to how one would navigate a city grid.
Unlike Euclidean distance, which measures the straight-line distance, Manhattan distance calculates the total distance traveled along the grid lines.
Mathematically, the formula for calculating the Manhattan Distance is given as follows −
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Where −
- x = (x1, x2, ....., xn) and y = (y1, y2, ....., yn) − are the vectors representing the points.
- |xi - yi| − is the absolute difference between the i-th components of x and y.
Syntax
Following is the syntax of scipy.spatial.distance.cityblock() function −
scipy.spatial.distance.cityblock(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.cityblock() function −
- u: The first point or vector.
- v: The second point or vector.
Return Value
This function returns the City block distance between the vectors u and v.
Example
Here is an example that calculates the Manhattan Distance with the help of SciPy's cityblock() function −
from scipy.spatial.distance import cityblock
# Define two vectors
vector1 = [1, 2, 3]
vector2 = [4, 6, 8]
# Calculate the City Block distance
distance = cityblock(vector1, vector2)
print(f"City Block Distance: {distance}")
Following is the output of the Cityblock Distance calculated for two points −
City Block Distance: 12
Minkowski Distance
Minkowski Distance is a generalization of both Euclidean and Manhattan distances and is used to measure the distance between two points in a normed vector space.
It provides a flexible framework by introducing a parameter p which determines the specific distance metric being used. Mathematically, the formula for calculating the Minkowski Distance is given as follows −
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
Where −
- x = (x1, x2, ....., xn) and y = (y1, y2, ....., yn) − are the vectors representing the points.
- |xi - yi| − is the absolute difference between the i-th components of x and y.
- p − is a parameter that defines the distance metric.
Syntax
Following is the syntax of scipy.spatial.distance.minkowski() function −
scipy.spatial.distance.minkowski(u, v, p=2)
Parameters
Here are the Parameters of the scipy.spatial.distance.minkowski() function −
- u: The first point or vector which is an array of coordinates.
- v: The second point or vector which is an array of coordinates.
- p (float, optional): The power parameter for the Minkowski distance. Default is 2.
Note that,
When p = 1, it calculates the Manhattan Distance.
When p = 2, it calculates the Euclidean Distance.
When p > 2, it calculates a more general Minkowski distance, and as p grows the result approaches the Chebyshev Distance.
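These special cases can be checked numerically. The sketch below compares minkowski() against cityblock(), euclidean() and chebyshev() on one pair of points; the choice of p=50 as a "large" value is an assumption made purely for illustration −

```python
from scipy.spatial.distance import minkowski, cityblock, euclidean, chebyshev

u = [1, 2]
v = [4, 6]

d1 = minkowski(u, v, p=1)        # should match the Manhattan distance
d2 = minkowski(u, v, p=2)        # should match the Euclidean distance
d_large = minkowski(u, v, p=50)  # should be close to the Chebyshev distance

print(d1, cityblock(u, v))
print(d2, euclidean(u, v))
print(d_large, chebyshev(u, v))
```

For large p the biggest coordinate difference dominates the sum, which is why the last pair of values is nearly equal.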
Return Value
This function returns the Minkowski distance between the two points.
Example
Below is the example of finding the Minkowski distance between two points with the help of minkowski() function −
from scipy.spatial.distance import minkowski
# Define two points in 2D space
point1 = [1, 2]
point2 = [4, 6]
# Calculate Minkowski distance with p=3
distance = minkowski(point1, point2, p=3)
print(f"Minkowski Distance (p=3): {distance}")
Following is the output of the Minkowski Distance calculated for two points −
Minkowski Distance (p=3): 4.497941445275415
Chebyshev Distance
Chebyshev Distance, also known as the Maximum Metric or the L∞ norm, is a distance metric used to measure the distance between two points in a grid-like system.
It is defined as the greatest of the absolute differences along any coordinate dimension. Mathematically, the formula for calculating the Chebyshev Distance is given as follows −
$$d(x, y) = \max_{1 \le i \le n} |x_i - y_i|$$
Where −
- x = (x1, x2, ....., xn) and y = (y1, y2, ....., yn) − are the vectors representing the points.
- |xi - yi| − is the absolute difference between the i-th components of x and y.
Syntax
Following is the syntax of scipy.spatial.distance.chebyshev() function −
scipy.spatial.distance.chebyshev(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.chebyshev() function −
- u: An array-like object representing the first point in the space.
- v: An array-like object representing the second point in the space.
Return Value
This function returns the Chebyshev distance between the two points u and v.
Example
Below is an example of finding the Chebyshev distance between two points with the help of the chebyshev() function −
from scipy.spatial.distance import chebyshev
# Define two points
point1 = [1, 2]
point2 = [4, 6]
# Calculate the Chebyshev distance
distance = chebyshev(point1, point2)
print(f"Chebyshev Distance: {distance}")
Following is the output of the Chebyshev Distance calculated for two points −
Chebyshev Distance: 4
Cosine Distance
Cosine Distance is a measure of dissimilarity between two vectors based on the angle between them. It is computed as one minus the cosine of that angle (the cosine similarity), so vectors pointing in the same direction have a distance of 0.
It is often used in text analysis and clustering when the magnitude of the vectors is less important than their orientation. Mathematically, the formula for calculating the Cosine Distance is given as follows −
$$d(u, v) = 1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$
Where −
- u · v − is the dot product of the vectors u and v.
- ||u||2 and ||v||2 − are the Euclidean lengths of u and v.
Syntax
Following is the syntax of scipy.spatial.distance.cosine() function −
scipy.spatial.distance.cosine(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.cosine() function −
- u: An array-like object representing the first vector.
- v: An array-like object representing the second vector.
Return Value
This function returns the Cosine distance between the two points u and v.
Example
Below is an example of finding the Cosine distance between two vectors with the help of the cosine() function −
from scipy.spatial.distance import cosine
# Example vectors
vector1 = [1, 0, 1]
vector2 = [0, 1, 1]
# Compute Cosine distance
distance = cosine(vector1, vector2)
print(f"Cosine Distance: {distance}")
Following is the output of the Cosine Distance calculated for two points −
Cosine Distance: 0.5
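Because Cosine Distance depends only on orientation, rescaling one vector should leave the result unchanged. The following sketch evaluates the formula with NumPy and then repeats the call with one vector multiplied by 10; the scaling factor and the variable names are illustrative assumptions −

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([1.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 1.0])

# One minus the cosine similarity, written out explicitly
manual = 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Stretching u changes its magnitude but not its direction
scaled = cosine(10 * u, v)

print(manual, cosine(u, v), scaled)
```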
Hamming Distance
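Hamming Distance is a measure of dissimilarity between two sequences or binary vectors of equal length. It is based on the number of positions at which the corresponding elements differ; SciPy's hamming() function reports this number as a fraction of the vector length.
It is often used in error detection and correction algorithms as well as in various applications involving binary data.
A Hamming distance of 0 indicates that the vectors are identical, while a distance of 1 indicates that they differ at every position. Mathematically, the formula for calculating the Hamming Distance is given as follows −
$$d(u, v) = \frac{1}{n} \left| \{\, i : u_i \neq v_i \,\} \right|$$
Where −
- n − is the length of the vectors u and v.
- {i : ui ≠ vi} − is the set of positions at which u and v differ.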
Syntax
Following is the syntax of scipy.spatial.distance.hamming() function −
scipy.spatial.distance.hamming(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.hamming() function −
- u: A one-dimensional array-like object or list representing the first vector.
- v: A one-dimensional array-like object or list representing the second vector, with the same length as u.
Return Value
This function returns the Hamming distance between the two points u and v.
Example
In this example the Hamming distance represents the fraction of positions where the two binary vectors differ −
from scipy.spatial.distance import hamming
# Example binary vectors
vector1 = [1, 0, 1, 0, 1]
vector2 = [1, 1, 0, 0, 1]
# Compute Hamming distance
distance = hamming(vector1, vector2)
print(f"Hamming Distance: {distance}")
Below is the output of the Hamming Distance calculated for two points −
Hamming Distance: 0.4
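Since hamming() returns a fraction, the number of differing positions can be recovered by multiplying by the vector length. A minimal sketch, with count and manual_count as illustrative names −

```python
import numpy as np
from scipy.spatial.distance import hamming

u = np.array([1, 0, 1, 0, 1])
v = np.array([1, 1, 0, 0, 1])

fraction = hamming(u, v)

# Convert the fraction back to a count of mismatched positions
count = int(round(fraction * len(u)))

# Count the mismatches directly for comparison
manual_count = int(np.count_nonzero(u != v))

print(fraction, count, manual_count)
```

Here the two vectors differ at two of five positions, so both counts should be 2.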
Jaccard Distance
Jaccard Distance is a measure of dissimilarity between two sets. It is calculated as one minus the Jaccard similarity coefficient, which is the ratio of the size of the intersection of the sets to the size of their union.
Jaccard distance is often used in binary or categorical data analysis, particularly in fields like clustering and classification.
In SciPy, the Jaccard distance can be computed using the scipy.spatial.distance.jaccard() function. Mathematically, the formula for calculating the Jaccard Distance is given as follows −
$$d(u, v) = 1 - \frac{|u \cap v|}{|u \cup v|}$$
Where −
- |u ∩ v|: is the size of the intersection of the two sets.
- |u ∪ v|: is the size of the union of the two sets.
Syntax
Following is the syntax of scipy.spatial.distance.jaccard() function −
scipy.spatial.distance.jaccard(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.jaccard() function −
- u: An array-like object representing the first binary vector or set.
- v: An array-like object representing the second binary vector or set.
Return Value
This function returns the Jaccard distance between the two points u and v.
Example
Following is the example of using the jaccard() function to calculate the Jaccard Distance in SciPy −
from scipy.spatial.distance import jaccard
# Example binary vectors
vector1 = [1, 0, 1, 0, 1, 1]
vector2 = [0, 1, 1, 0, 1, 0]
# Compute Jaccard distance
distance = jaccard(vector1, vector2)
print(f"Jaccard Distance: {distance}")
Following is the output of the Jaccard Distance calculated for two points −
Jaccard Distance: 0.6
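The set interpretation can be made explicit by treating each binary vector as a membership mask. The sketch below uses boolean NumPy arrays, which is an assumption made for clarity, and counts the intersection and union by hand −

```python
import numpy as np
from scipy.spatial.distance import jaccard

u = np.array([1, 0, 1, 0, 1, 1], dtype=bool)
v = np.array([0, 1, 1, 0, 1, 0], dtype=bool)

intersection = np.count_nonzero(u & v)  # positions set in both vectors
union = np.count_nonzero(u | v)         # positions set in at least one vector

manual = 1.0 - intersection / union

print(intersection, union, manual, jaccard(u, v))
```

Positions where both vectors are 0 belong to neither set, so they do not affect the result.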
Canberra Distance
Canberra Distance is a metric that measures the dissimilarity between two points by summing, over all coordinates, the absolute difference between the coordinates divided by the sum of their absolute values.
- It is particularly sensitive to differences when both coordinates are close to zero, because even a tiny absolute difference is large relative to such small values.
- The Canberra distance is often used in various fields such as environmental science and economics where proportional differences are more significant than absolute differences.
Mathematically, the formula for calculating the Canberra Distance is given as follows −
$$d(u, v) = \sum_{i=1}^{n} \frac{|u_i - v_i|}{|u_i| + |v_i|}$$
Where −
- |ui - vi| − is the absolute difference between the i-th coordinates of u and v.
- |ui| + |vi| − is the sum of the absolute values of the i-th coordinates.
Syntax
Following is the syntax of scipy.spatial.distance.canberra() function −
scipy.spatial.distance.canberra(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.canberra() function −
- u: An array-like object representing the first vector.
- v: An array-like object representing the second vector.
Return Value
This function returns the Canberra distance between the two points u and v.
Example
Following is the example of using the canberra() function to calculate the Canberra Distance in SciPy −
from scipy.spatial.distance import canberra
# Example vectors
vector1 = [10, 20, 30]
vector2 = [15, 24, 36]
# Compute Canberra distance
distance = canberra(vector1, vector2)
print(f"Canberra Distance: {distance}")
Below is the output of the Canberra Distance calculated for two points −
Canberra Distance: 0.38181818181818183
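The per-coordinate terms of the Canberra formula can be inspected individually. The following sketch computes them with NumPy and then shows the sensitivity near zero; the second pair of vectors is a made-up example chosen for illustration −

```python
import numpy as np
from scipy.spatial.distance import canberra

u = np.array([10.0, 20.0, 30.0])
v = np.array([15.0, 24.0, 36.0])

# One normalized term per coordinate, then the sum
terms = np.abs(u - v) / (np.abs(u) + np.abs(v))
manual = terms.sum()
print(terms, manual, canberra(u, v))

# A tiny absolute difference between near-zero values still
# contributes about 1/3, while the equal large values contribute 0
small = canberra([0.001, 100.0], [0.002, 100.0])
print(small)
```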
Bray-Curtis Distance
Bray-Curtis Distance is a measure of dissimilarity between two non-negative numerical vectors which is often used in ecology and biology for comparing species abundances.
It quantifies the difference between two samples relative to the total magnitude of their elements: the summed absolute differences are divided by the summed totals, so the result lies between 0 and 1 for non-negative inputs.
In SciPy, the Bray-Curtis distance can be calculated using the scipy.spatial.distance.braycurtis() function.
Mathematically, the formula for calculating the Bray-Curtis Distance is given as follows −
$$d(u, v) = \frac{\sum_{i=1}^{n} |u_i - v_i|}{\sum_{i=1}^{n} |u_i + v_i|}$$
Where −
- |ui - vi| − is the absolute difference between the corresponding elements of vectors u and v.
- |ui + vi| − is the absolute value of the sum of the corresponding elements, which equals ui + vi for non-negative vectors.
Syntax
Following is the syntax of scipy.spatial.distance.braycurtis() function −
scipy.spatial.distance.braycurtis(u, v)
Parameters
Here are the Parameters of the scipy.spatial.distance.braycurtis() function −
- u: An array-like object representing the first vector.
- v: An array-like object representing the second vector.
Return Value
This function returns the Bray-Curtis distance between the two points u and v.
Example
Here is the example of using the braycurtis() function to calculate the Bray-Curtis Distance in SciPy −
from scipy.spatial.distance import braycurtis
# Example vectors
vector1 = [1, 3, 5, 7]
vector2 = [2, 4, 6, 8]
# Compute Bray-Curtis distance
distance = braycurtis(vector1, vector2)
print(f"Bray-Curtis Distance: {distance}")
Below is the output of the Bray-Curtis Distance calculated for two points −
Bray-Curtis Distance: 0.1111111111111111
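The same value can be reproduced from the formula as a ratio of two sums. A minimal sketch with NumPy, where numerator and denominator are illustrative names −

```python
import numpy as np
from scipy.spatial.distance import braycurtis

u = np.array([1, 3, 5, 7])
v = np.array([2, 4, 6, 8])

numerator = np.abs(u - v).sum()    # total absolute difference
denominator = np.abs(u + v).sum()  # total of the element-wise sums

manual = numerator / denominator

print(numerator, denominator, manual, braycurtis(u, v))
```

For these vectors the ratio is 4 / 36, which matches the output shown above.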