How to adjust the branch lengths of a dendrogram in Matplotlib?
In a dendrogram drawn with SciPy and Matplotlib, the height of each branch is the distance at which two clusters were merged. To adjust the branch lengths, you can change the linkage method, change the distance metric, or rescale the distances stored in the linkage matrix.
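To make the link between merge distances and branch heights concrete, here is a minimal sketch. The four points are made up for illustration, and `no_plot=True` is used so that `dendrogram` only returns its drawing coordinates instead of opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Four hypothetical 2-D points: two close pairs that are far from each other
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 3.0]])

# Each row of Z describes one merge: [cluster_a, cluster_b, distance, n_points]
Z = linkage(X, method="single")
merge_heights = Z[:, 2]

# no_plot=True returns the plotting data without drawing anything
info = dendrogram(Z, no_plot=True)

# Every link is stored as four y-values; index 1 is the top of the link
link_tops = sorted(d[1] for d in info["dcoord"])

print(merge_heights)  # merge distances: 1, 3 and 10
print(link_tops)      # the same values, as drawn
```

The third column of `Z` is therefore the quantity to change if the branches should be longer or shorter.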
Basic Dendrogram Creation
First, let's create a simple dendrogram with default settings:
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
# Generate sample data (seeded so the figure is reproducible)
np.random.seed(42)
a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=2)
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=3)
X = np.concatenate((a, b))
# Perform hierarchical clustering (defaults: method='single', metric='euclidean')
Z = linkage(X)
# Create dendrogram
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
dendrogram(Z, ax=ax)
plt.title("Default Dendrogram")
plt.show()
Adjusting Branch Lengths with Different Linkage Methods
Each linkage method defines the distance between two clusters differently, so the same data produces different merge heights and therefore different branch lengths:
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
# Generate consistent sample data
np.random.seed(42)
X = np.random.rand(10, 2) * 10
# Different linkage methods
methods = ['single', 'complete', 'average', 'ward']
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Branch Lengths with Different Linkage Methods')
for i, method in enumerate(methods):
    row, col = i // 2, i % 2
    Z = linkage(X, method=method)
    dendrogram(Z, ax=axes[row, col])
    axes[row, col].set_title(f'{method.capitalize()} Linkage')
plt.tight_layout()
plt.show()
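The difference between the methods can also be checked numerically, without reading it off the plots. The sketch below reuses the same seeded data and prints the first and last merge distance for each method. The `heights` dictionary is only a helper introduced for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Same seeded sample data as in the plot above
np.random.seed(42)
X = np.random.rand(10, 2) * 10

# Collect the merge distances (third column of Z) for each method
heights = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)
    heights[method] = Z[:, 2]
    print(f"{method:>8}: first merge {Z[0, 2]:.2f}, last merge {Z[-1, 2]:.2f}")
```

The first merge always joins the two closest points, so its height is the same for all four methods. The methods diverge in the later merges, where clusters contain several points and the definition of cluster-to-cluster distance starts to matter.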
Scaling Branch Lengths
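The third column of the linkage matrix holds the merge distances, so multiplying it by a constant scales every branch. Because each subplot autoscales its distance axis, the two trees below have the same shape and the change shows up in the axis values: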
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
# Generate sample data
np.random.seed(42)
X = np.random.rand(8, 2) * 10
# Original linkage
Z_original = linkage(X, method='ward')
# Scale the distances (branch lengths)
Z_scaled = Z_original.copy()
Z_scaled[:, 2] *= 2 # Double the distances
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# Original dendrogram
dendrogram(Z_original, ax=ax1)
ax1.set_title('Original Branch Lengths')
# Scaled dendrogram
dendrogram(Z_scaled, ax=ax2)
ax2.set_title('Scaled Branch Lengths (2x)')
plt.tight_layout()
plt.show()
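A constant factor keeps the relative branch lengths unchanged. To change them relative to each other, for example to compress a few very long branches, a nonlinear transform can be applied instead. The sketch below uses `np.log1p`, which is an illustrative choice rather than a recommendation. Any strictly increasing function that maps 0 to 0 should work in the same way, since it preserves the order of the merges.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage, linkage

np.random.seed(42)
X = np.random.rand(8, 2) * 10
Z = linkage(X, method="ward")

# Compress long branches; log1p is strictly increasing, so merge order is kept
Z_log = Z.copy()
Z_log[:, 2] = np.log1p(Z[:, 2])

# The transformed matrix is still a structurally valid linkage matrix
print(is_valid_linkage(Z_log))

# The leaf order is identical, so only the heights of the tree changed
leaves_before = dendrogram(Z, no_plot=True)["leaves"]
leaves_after = dendrogram(Z_log, no_plot=True)["leaves"]
print(leaves_before == leaves_after)
```

After such a transform the axis no longer shows the original distances, so the axis label should say which transform was applied.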
Using Distance Metrics to Control Branch Lengths
The distance metric determines the pairwise distances that feed the clustering, so it changes both the tree and the branch lengths:
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
import numpy as np
# Generate sample data
np.random.seed(42)
X = np.random.rand(8, 2) * 10
# Different distance metrics
metrics = ['euclidean', 'cityblock', 'cosine']  # 'cityblock' is SciPy's name for the Manhattan distance
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
fig.suptitle('Branch Lengths with Different Distance Metrics')
for i, metric in enumerate(metrics):
    distances = pdist(X, metric=metric)
    Z = linkage(distances, method='average')
    dendrogram(Z, ax=axes[i])
    axes[i].set_title(f'{metric.capitalize()} Distance')
plt.tight_layout()
plt.show()
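Calling `pdist` first is not strictly required. `linkage` also accepts the raw observations together with a `metric` argument and computes the pairwise distances itself. The short sketch below checks that both routes give the same linkage matrix for the same seeded data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

np.random.seed(42)
X = np.random.rand(8, 2) * 10

# Route 1: compute the condensed distance matrix explicitly
Z_from_pdist = linkage(pdist(X, metric="cityblock"), method="average")

# Route 2: pass the observations and let linkage call the metric
Z_direct = linkage(X, method="average", metric="cityblock")

print(np.allclose(Z_from_pdist, Z_direct))
```

Note that the SciPy documentation describes the 'ward', 'centroid' and 'median' methods as correctly defined only for Euclidean distances, so a non-Euclidean metric is best combined with 'single', 'complete' or 'average'.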
Key Parameters for Branch Length Control
| Parameter | Effect on Branch Lengths | Usage |
|---|---|---|
| Linkage Method | Changes how the distance between two clusters is defined | 'single', 'complete', 'average', 'ward' |
| Distance Metric | Changes how pairwise distances are calculated | 'euclidean', 'cityblock' (Manhattan), 'cosine' |
| Manual Scaling | Direct manipulation of distances | Multiply Z[:, 2] by scaling factor |
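The options in the table change the distances themselves. If only the drawn length of the branches should change, the range of the distance axis can be adjusted after plotting. The sketch below is one way to do that under the assumption of a default top-oriented dendrogram, where the distance is on the y-axis. The factor of 2 is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

np.random.seed(42)
X = np.random.rand(8, 2) * 10
Z = linkage(X, method="average")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Left: default axis range, chosen by SciPy from the tallest merge
dendrogram(Z, ax=ax1)
ax1.set_title("Default axis range")

# Right: same tree, but the distance axis extends to twice the tallest merge,
# so every branch is drawn at roughly half its default length
dendrogram(Z, ax=ax2)
ax2.set_ylim(0, 2 * Z[:, 2].max())
ax2.set_title("Extended distance axis")

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure. The linkage matrix is left untouched here, so the axis still shows the true merge distances.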
Conclusion
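Branch lengths in dendrograms reflect the distance between clusters and can be adjusted through linkage methods, distance metrics, or direct scaling of the linkage matrix. Changing the linkage method or the metric changes the clustering itself, while scaling the linkage matrix changes only how an existing tree is displayed. Choose the approach that matches your data and what the figure needs to show.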
