SciPy - average() Method



The SciPy average() method performs average linkage clustering (also known as UPGMA) on a condensed distance matrix. In data analysis, this method helps us to create a hierarchy of clusters from data points.

This method defines the distance between two clusters as the average distance between all pairs of data points, where one point is from the first cluster and the other is from the second cluster.
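The definition above can be checked by hand. The following is a minimal sketch, assuming two small made-up clusters (the point coordinates and variable names are illustrative only): it averages every cross-cluster distance and compares the value with the height of the final merge reported by average().

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from scipy.cluster.hierarchy import average

# Two illustrative clusters (hypothetical data, two points each)
cluster_a = np.array([[0.0, 0.0], [0.0, 1.0]])
cluster_b = np.array([[3.0, 0.0], [3.0, 1.0]])

# Distances for every pair (one point from A, one point from B)
pairwise = cdist(cluster_a, cluster_b, metric='euclidean')

# Average linkage distance is the mean over all such pairs
avg_distance = pairwise.mean()

# Cross-check: cluster all four points; the last merge joins A and B
points = np.vstack([cluster_a, cluster_b])
Z = average(pdist(points))

print(avg_distance)   # manual average of the cross-cluster distances
print(Z[-1, 2])       # height of the final merge in the linkage matrix
```

For this data the two printed values should agree, since the points within each cluster are merged first and the final merge joins the two clusters.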

Syntax

Following is the syntax of the SciPy average() method −

average(y)

Parameters

This method accepts a single parameter −

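  • y: The condensed distance matrix, a one-dimensional array holding the upper triangle of the pairwise distance matrix (for example, as returned by pdist()).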
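To see how a condensed distance matrix maps to the full square matrix, here is a small sketch that uses squareform() from scipy.spatial.distance. The six distance values are sample numbers chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Condensed form: d(0,1), d(0,2), d(0,3), d(1,2), d(1,3), d(2,3)
y = np.array([0.6, 0.2, 0.3, 0.5, 0.4, 0.8])

# Expand to the symmetric square matrix with zeros on the diagonal
square = squareform(y)
n = square.shape[0]

print(square)
print(n, len(y))  # n points give n * (n - 1) / 2 condensed entries

# squareform() also converts a square matrix back to condensed form
y_again = squareform(square)
```

A condensed array of length 6 therefore describes 4 observations. If only a square distance matrix is available, it can be converted with squareform() before it is passed to average().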

Return value

This method returns the hierarchical clustering encoded as a linkage matrix (a NumPy array).
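The layout of the returned linkage matrix can be inspected directly. The sketch below assumes a sample condensed matrix for 4 points. Each row describes one merge: the first two columns hold the indices of the merged clusters, the third the distance between them, and the fourth the number of original observations in the new cluster. It also compares the result with linkage(y, method='average'), which performs the same clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import average, linkage

# Sample condensed distance matrix for 4 points (illustrative values)
y = np.array([0.6, 0.2, 0.3, 0.5, 0.4, 0.8])

Z = average(y)

# n observations produce n - 1 merges, one per row
print(Z.shape)  # (3, 4)

# Row layout: [cluster index 1, cluster index 2, distance, observation count]
for idx_a, idx_b, dist, count in Z:
    print(int(idx_a), int(idx_b), dist, int(count))

# The generic linkage() function with method='average' gives the same matrix
same = np.allclose(Z, linkage(y, method='average'))
print(same)
```

Indices below n refer to original observations, while an index of n + i refers to the cluster formed in row i.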

Example 1

Following is a basic example that applies the SciPy average() method to a condensed distance matrix and plots the resulting dendrogram.

import numpy as np
from scipy.cluster.hierarchy import average, dendrogram
import matplotlib.pyplot as plt

# Condensed distance matrix for 4 points (6 pairwise distances)
y = np.array([0.6, 0.2, 0.3, 0.5, 0.4, 0.8])

# Perform average linkage clustering
result = average(y)

# Plot the dendrogram
plt.figure(figsize=(6, 4))
dendrogram(result)
plt.title('Dendrogram - Average Linkage')
plt.xlabel('indexes')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

[Figure: dendrogram of the average linkage clustering]

Example 2

The example below performs average linkage clustering on a random dataset.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import average, dendrogram
import matplotlib.pyplot as plt

# generate random data
data = np.random.rand(4, 2)

# calculate the distance matrix
y = pdist(data, metric='euclidean')

# average linkage clustering
result = average(y)

# plot the dendrogram
plt.figure(figsize=(6, 4))
dendrogram(result)
plt.title('Dendrogram - Average Linkage on Random Data')
plt.xlabel('indexes')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

[Figure: dendrogram of the average linkage clustering on random data]

Example 3

In this example, we compute the distance matrix with the 'cityblock' (Manhattan) metric, apply average linkage clustering, and use dendrogram() to visualize the result.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import average, dendrogram
import matplotlib.pyplot as plt

# sample data
data = np.array([[1, 5], [2, 4], [3, 6], [4, 8]])

# calculate the distance matrix using the cityblock (Manhattan) metric
y = pdist(data, metric='cityblock')

# average linkage clustering
result = average(y)

# Plot the dendrogram
plt.figure(figsize=(6, 4))
dendrogram(result)
plt.title('Dendrogram - Average Linkage with Cityblock Distance')
plt.xlabel('indexes')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

[Figure: dendrogram of the average linkage clustering with cityblock distance]