How to give sns.clustermap a precomputed distance matrix in Matplotlib?
To use sns.clustermap with a precomputed distance matrix, convert the distances into a linkage matrix with SciPy and pass that linkage to the row_linkage and col_linkage parameters. clustermap does not accept a distance matrix directly; supplying the linkage lets you provide your own clustering results instead of letting seaborn compute distances from the data.
Creating a Basic Clustermap with Default Distance
First, let's see how clustermap() works when seaborn computes the distances itself:
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams["figure.figsize"] = [10, 6]
sns.set_theme(color_codes=True)
# Load example dataset
iris = sns.load_dataset("iris")
species = iris.pop("species")
# Create basic clustermap
g = sns.clustermap(iris, figsize=(8, 6))
plt.show()
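When no linkage is supplied, seaborn clusters the data itself; its documented defaults are method='average' and metric='euclidean'. The sketch below, which assumes seaborn delegates to SciPy's linkage (it may use fastcluster instead if that package is installed), shows that the one-step call on raw observations gives the same result as computing distances and linkage separately. The random array is a stand-in for a numeric data table.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for a numeric table: 6 observations, 3 features
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))

# One step: linkage computes Euclidean distances from the observations
auto_linkage = linkage(data, method="average", metric="euclidean")

# Two steps: compute the condensed distances first, then cluster them
condensed = pdist(data, metric="euclidean")
manual_linkage = linkage(condensed, method="average")

# Both routes should produce the same (n - 1) x 4 linkage matrix
same = np.allclose(auto_linkage, manual_linkage)
print(same, auto_linkage.shape)
```

Passing manual_linkage as row_linkage should therefore reproduce the default row dendrogram, which makes it a convenient starting point before swapping in another metric or method.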
Using Precomputed Distance Matrix
To use precomputed distances, calculate them first with pdist (which returns a condensed distance vector), then build linkage matrices from them:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
# Load and prepare data
iris = sns.load_dataset("iris")
species = iris.pop("species")
iris_data = iris.values
# Compute distance matrices
row_distances = pdist(iris_data, metric='euclidean')
col_distances = pdist(iris_data.T, metric='euclidean')
# Create linkage matrices from precomputed distances
row_linkage = linkage(row_distances, method='ward')
col_linkage = linkage(col_distances, method='ward')
# Create clustermap with precomputed linkages
g = sns.clustermap(iris,
                   row_linkage=row_linkage,
                   col_linkage=col_linkage,
                   figsize=(8, 6))
plt.show()
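If your distances are already stored as a square, symmetric matrix rather than produced by pdist, they need to be converted to condensed form first. Passing a square matrix straight to linkage would treat each row as an observation vector. A minimal sketch, using a small hypothetical 4 x 4 matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical precomputed square distance matrix for 4 items
square = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Convert to the condensed (upper-triangle) vector that linkage expects;
# checks=True validates symmetry and a zero diagonal
condensed = squareform(square, checks=True)

# Build the linkage from the precomputed distances
link = linkage(condensed, method="average")
print(link)
```

To draw the distance matrix itself as a heatmap, the same link can be passed as both row_linkage and col_linkage, for example sns.clustermap(square, row_linkage=link, col_linkage=link), so rows and columns share one ordering.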
Using Different Distance Metrics
You can cluster with any other distance metric that pdist supports:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
# Load data
iris = sns.load_dataset("iris")
species = iris.pop("species")
# Use Manhattan distance (SciPy's name for this metric is 'cityblock')
row_distances = pdist(iris.values, metric='cityblock')
row_linkage = linkage(row_distances, method='complete')
# Create clustermap with Manhattan distance for the rows
g = sns.clustermap(iris,
                   row_linkage=row_linkage,
                   col_linkage=None,  # Let seaborn compute the column linkage
                   figsize=(8, 6),
                   cmap='viridis')
plt.show()
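Metric names follow SciPy's conventions, and SciPy's pdist calls the Manhattan metric 'cityblock'. As a quick sanity check, the sketch below compares pdist output against distances worked out by hand for three illustrative points (the points are made up for this example).

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three illustrative 2-D points
points = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 6.0]])

# Manhattan distance: sum of absolute coordinate differences
manhattan = pdist(points, metric="cityblock")

# Hand-computed values for the pairs (0,1), (0,2), (1,2)
by_hand = np.array([3.0, 10.0, 7.0])

matches = np.allclose(manhattan, by_hand)
print(manhattan, matches)
```

The condensed vector lists pairs in row-major upper-triangle order, which is the order linkage expects.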
Key Parameters
| Parameter | Description | Example Value |
|---|---|---|
| row_linkage | Precomputed linkage matrix for rows | linkage(distances, 'ward') |
| col_linkage | Precomputed linkage matrix for columns | linkage(distances, 'complete') |
| metric | Distance metric seaborn uses when no linkage is supplied for an axis | 'euclidean', 'cityblock', 'cosine' |
Conclusion
To use precomputed distances with sns.clustermap, build linkage matrices from them and pass those to the row_linkage and col_linkage parameters. This gives you full control over the linkage method and distance metric used.
