SciPy - linkage() Method



The SciPy linkage() method performs hierarchical (agglomerative) clustering and returns a linkage matrix. This matrix encodes the hierarchical structure of the clustered data.

Hierarchical clustering organizes data into nested groups based on the distances between observations. It has two main uses in data analysis −

  • Identifying Natural Groupings: It finds groups of similar items that arise from the structure of the data itself.
  • Dendrogram Construction: The linkage matrix can be plotted as a dendrogram, a tree diagram that records the order in which clusters are merged and therefore the hierarchical structure of the data.
This method is commonly used in data analysis to group similar items into clusters, which helps us understand the structure of the data and can support prediction.
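
As a minimal sketch of how natural groupings can be read off the hierarchy, the snippet below cuts a single linkage tree into two flat clusters with fcluster(), a companion function in scipy.cluster.hierarchy that this page does not otherwise cover. The four sample points and the choice of two clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative data: two points close together, two farther away
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8]])

# Build the hierarchy, then ask for at most two flat clusters
result = linkage(data, method='single')
labels = fcluster(result, t=2, criterion='maxclust')

# One integer label per observation; equal labels mean the same cluster
print(labels)
```

Here the first two points should share one label and the last two another, since those pairs are merged first in the hierarchy.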

Syntax

Following is the syntax of the SciPy linkage() method −

linkage(data, method = 'single')
or,
linkage(data, method = 'single', metric = 'type')

Parameters

This method accepts the following parameters −

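  • data: The input to cluster, given either as a 2-D array of observations (one row per observation) or as a 1-D condensed distance matrix such as the one returned by pdist().
  • method = 'single': The linkage algorithm used to compute the distance between clusters. Supported values are 'single', 'complete', 'average', 'weighted', 'centroid', 'median' and 'ward'. The default is 'single'.
  • metric = 'type': The distance metric used when data is a 2-D array of observations, for example 'euclidean' or 'cityblock'. The default type is 'euclidean'.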
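
The two accepted forms of the data parameter can be compared with a short sketch. It assumes a small illustrative dataset and uses pdist() from scipy.spatial.distance to build the condensed distance matrix; the variable names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Illustrative data: four 2-D observations
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8]])

# Form 1: pass the observations and let linkage() compute the distances
from_points = linkage(data, method='average', metric='cityblock')

# Form 2: precompute the condensed distances and pass the 1-D array
condensed = pdist(data, metric='cityblock')
from_distances = linkage(condensed, method='average')

# Both calls should describe the same hierarchy
print(np.allclose(from_points, from_distances))
```

When a condensed distance matrix is passed, the distances are already fixed by the pdist() call, so the metric is chosen there rather than in linkage().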

Return value

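This method returns the linkage matrix as a NumPy array of shape (n-1, 4), where n is the number of observations. Each row describes one merge: the first two columns hold the indices of the clusters being combined, the third holds the distance between them, and the fourth holds the number of original observations in the newly formed cluster.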
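
To make the layout of the returned matrix concrete, the following sketch prints its shape and walks through the rows. The dataset is an illustrative assumption; newly formed clusters receive indices starting at n, so with four observations the first merged cluster has index 4.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Illustrative data: n = 4 observations
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8]])
result = linkage(data, method='single')

# n - 1 merges, four columns per merge
print(result.shape)

# Each row: two cluster indices, merge distance, size of the new cluster
for left, right, distance, size in result:
    print(f"merge {int(left)} and {int(right)} "
          f"at distance {distance:.3f} -> {int(size)} observations")
```

The last row always produces a cluster containing all n observations, which is the root of the dendrogram.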

Example 1

The following example uses the SciPy linkage() method to run single linkage clustering on a custom dataset and plots the dendrogram to visualize the clustering process.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Sample data
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8]])

# Compute the linkage matrix using single linkage
result = linkage(data, method='single')

# Plot the dendrogram
plt.figure(figsize=(8, 4))
dendrogram(result)
plt.title('Dendrogram - Single Linkage')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

scipy_linkage_method_one

Example 2

Here, we demonstrate complete linkage clustering on a dataset with six observations and plot the dendrogram using the Euclidean distance metric.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Sample data
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8], [1, 0], [2, 1]])

# Compute the linkage matrix using complete linkage and Euclidean distance
result = linkage(data, method='complete', metric='euclidean')

# Plot the dendrogram
plt.figure(figsize=(8, 4))
dendrogram(result)
plt.title('Dendrogram - Complete Linkage')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

scipy_linkage_method_two

Example 3

The example below performs average linkage clustering using the Manhattan distance on the same dataset as Example 2. Here, the metric type is set to 'cityblock'.

Note that the Manhattan distance between two points is the sum of the absolute differences of their coordinates, that is, the distance travelled along axes at right angles. It is often used with high-dimensional datasets.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Sample data
data = np.array([[1, 2], [2, 3], [5, 8], [8, 8], [1, 0], [2, 1]])

# Compute the linkage matrix using average linkage and Manhattan distance
result = linkage(data, method='average', metric='cityblock')

# Plot the dendrogram
plt.figure(figsize=(8, 4))
dendrogram(result)
plt.title('Dendrogram - Average Linkage with Manhattan Distance')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

Output

The above code produces the following result −

scipy_linkage_method_three
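
To illustrate the Manhattan distance used in Example 3, the sketch below compares it with the Euclidean distance for two points taken from that dataset. It relies on the cityblock() and euclidean() helpers from scipy.spatial.distance; the variable names are hypothetical.

```python
from scipy.spatial.distance import cityblock, euclidean

# Two points from the Example 3 dataset
p = [5, 8]
q = [1, 2]

manhattan = cityblock(p, q)  # |5 - 1| + |8 - 2|
straight = euclidean(p, q)   # sqrt(4**2 + 6**2)

print(manhattan, straight)
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so dendrogram heights under 'cityblock' are generally at least as large as under 'euclidean'.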