Hierarchically-clustered Heatmap in Python with Seaborn Clustermap


In data analysis and visualization, hierarchically-clustered heatmaps provide a powerful tool to reveal patterns and relationships within complex datasets. This article explores how to create a hierarchically-clustered heatmap using Seaborn Clustermap in Python.

To help you follow the process, we will walk through it step by step with code examples. You will see how to cluster and visualize the data, which reveals how the variables relate to one another.

What is a Hierarchically-Clustered Heatmap in Python with Seaborn Clustermap?

A hierarchically-clustered heatmap is a visualization technique used to display a matrix of data in a heatmap format while also incorporating hierarchical clustering. In Python, the Seaborn library provides a useful tool called Clustermap that enables the creation of hierarchically-clustered heatmaps.

Have you ever worked with a large and complex dataset and found it difficult to identify patterns or connections within the data? If so, you're not alone. It can be a daunting task that requires a lot of time and effort. This is where hierarchical clustering comes in. It reorders the rows and columns of a heatmap according to their similarity, which makes the relationships between different parts of the data easier to see.

The outcome is a heatmap that not only looks attractive but also exposes the data's underlying structure. Because similar rows and columns are placed next to each other, and dendrograms along the margins show how they merge, we can see how they cluster into groups or families of similar objects. This facilitates the identification of trends and connections that are not immediately apparent from the raw data.
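To make the reordering idea concrete, here is a minimal sketch that uses SciPy directly on a small made-up matrix. The matrix values and variable names are assumptions chosen for illustration; the linkage method and distance metric are set to average linkage and Euclidean distance, which match Seaborn's documented defaults for clustermap().

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical toy matrix: rows 0 and 2 are similar, rows 1 and 3 are similar.
matrix = np.array([
    [1.0, 1.1, 0.9],
    [8.0, 8.2, 7.9],
    [1.2, 1.0, 1.1],
    [7.8, 8.1, 8.0],
])

# Build the hierarchy over the rows (average linkage, Euclidean distance).
row_linkage = linkage(matrix, method="average", metric="euclidean")

# leaves_list gives the left-to-right order of the rows in the dendrogram.
row_order = leaves_list(row_linkage).tolist()

# Reordering the matrix places similar rows next to each other.
reordered = matrix[row_order]
print(row_order)
```

A clustered heatmap applies the same kind of reordering to both the rows and the columns before drawing the colored cells.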

Plotting Hierarchically-Clustered Heatmap in Python with Seaborn Clustermap

Below are the steps that we will follow to plot a hierarchically-clustered heatmap in Python with Seaborn Clustermap −

  • Import the necessary libraries −

    • Import the Seaborn library using `import seaborn as sns`

    • Optionally, import the Matplotlib library for additional customization using `import matplotlib.pyplot as plt`.

  • Load or prepare the dataset −

    • Load one of Seaborn's example datasets using `sns.load_dataset()`, or prepare your own dataset in a suitable format (for example, a pandas DataFrame).

  • Preprocess the data (if required) −

    • Perform any necessary data preprocessing steps, such as reshaping or aggregating the data, to create a matrix suitable for the heatmap visualization.

  • Create the clustered heatmap −

    • Use the `sns.clustermap()` function, passing the preprocessed data matrix as the input.

    • Specify any additional parameters to customize the appearance, such as the colormap (`cmap` parameter) or clustering method (`method` parameter).

  • Display the heatmap −

    • Use `plt.show()` to display the heatmap if you imported the Matplotlib library in step 1.
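The steps above can be sketched on a small synthetic dataset that needs no download. The column names (`gene`, `sample`, `level`) and the random values are assumptions made up for this illustration; only `pivot()` and `clustermap()` with its `cmap` and `method` parameters come from the steps themselves.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Step 2: prepare a long-format dataset (synthetic values for illustration).
rng = np.random.default_rng(0)
records = pd.DataFrame({
    "gene": np.repeat(["g1", "g2", "g3", "g4", "g5"], 4),
    "sample": np.tile(["s1", "s2", "s3", "s4"], 5),
    "level": rng.normal(size=20),
})

# Step 3: reshape into a matrix with one row per gene and one column per sample.
matrix = records.pivot(index="gene", columns="sample", values="level")

# Step 4: cluster the rows and columns and draw the heatmap.
grid = sns.clustermap(matrix, cmap="YlGnBu", method="average")
```

In a script, importing `matplotlib.pyplot as plt` and calling `plt.show()` afterwards displays the figure (step 5).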

Example

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the inbuilt dataset
data = sns.load_dataset("flights")

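# Data preprocessing: reshape to a month-by-year matrix
# (recent pandas versions require keyword arguments here)
data_pivot = data.pivot(index="month", columns="year", values="passengers")

# Data analysis: total passengers per month and per year
# (yearly_totals is computed for reference and is not used in the plot)
monthly_totals = data.groupby("month", observed=False)["passengers"].sum()
yearly_totals = data.groupby("year")["passengers"].sum()

# Data processing: express each month's row as a share of that month's total
processed_data = data_pivot.div(monthly_totals, axis=0)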

# Create the clustered heatmap using seaborn clustermap
sns.clustermap(processed_data, cmap="YlGnBu")

# Display the heatmap
plt.show()

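Output

The code displays a clustered heatmap of the normalized flights data, with dendrograms drawn along the rows and columns.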
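The example above normalizes each row by hand before clustering. As a possible alternative, clustermap() has built-in `z_score` and `standard_scale` parameters. The sketch below uses `z_score=0` on a made-up matrix whose rows sit on very different scales; note that z-scoring is a different normalization from dividing by row totals, and the `data2d` attribute read at the end is assumed to hold the scaled, reordered matrix, as it does in current Seaborn releases.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic matrix (made-up values) whose rows are on very different scales.
rng = np.random.default_rng(1)
row_scales = np.array([[1], [10], [100], [1000], [10000]])
matrix = pd.DataFrame(
    rng.uniform(1, 10, size=(5, 4)) * row_scales,
    index=["r1", "r2", "r3", "r4", "r5"],
    columns=["c1", "c2", "c3", "c4"],
)

# z_score=0 standardizes each row (subtract the row mean, divide by the row
# standard deviation) before clustering; z_score=1 would work per column.
grid = sns.clustermap(matrix, z_score=0, cmap="YlGnBu")

# The scaled values in the order they were plotted.
scaled = grid.data2d
```

Without some form of row scaling, the rows with the largest raw values would dominate both the color range and the distance calculation.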

Customized Hierarchically-Clustered Heatmap in Python with Seaborn Clustermap

  • We create the hierarchically-clustered heatmap using the clustermap() function from Seaborn, passing the pivot_data matrix as the input.

  • We specify the colormap as "YlGnBu" using the cmap parameter.

  • Additional customization options are provided:

    • linewidths=0.5: Sets the width of the lines that separate the cells of the heatmap.

    • figsize=(8, 6): Sets the overall size of the figure, in inches.

    • dendrogram_ratio=(0.1, 0.2): Sets the fraction of the figure given to the row and column dendrograms, respectively.

Customize the Heatmap

  • We use the ClusterGrid object returned by clustermap() to customize the figure further. Calling plt.title(), plt.xlabel() and plt.ylabel() directly acts on whichever axes Matplotlib treats as current, which is not necessarily the heatmap. In this example, we therefore set the title with suptitle() on the grid's figure and label the x-axis and y-axis through the grid's ax_heatmap attribute.

Example

import seaborn as sns
import matplotlib.pyplot as plt

# Load the inbuilt dataset
data = sns.load_dataset("flights")

# Pivot the data to create a matrix for the heatmap
pivot_data = data.pivot(index="month", columns="year", values="passengers")

# Create the clustered heatmap using seaborn clustermap
g = sns.clustermap(pivot_data, cmap="YlGnBu", linewidths=0.5, figsize=(8, 6), dendrogram_ratio=(0.1, 0.2))

# Customize the heatmap through the returned ClusterGrid
g.figure.suptitle("Hierarchically-clustered Heatmap - Flights Data")
g.ax_heatmap.set_xlabel("Year")
g.ax_heatmap.set_ylabel("Month")

# Display the heatmap
plt.show()

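Output

The code displays the clustered heatmap of the flights data, with thin lines separating the cells and dendrograms along the rows and columns.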
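For time-ordered columns such as years, it can be preferable to keep the original column order and cluster only the rows. The sketch below assumes a small made-up month-by-year table (so it runs without downloading the flights data) and uses the `col_cluster` parameter of clustermap(); the row order is read back from the grid's `dendrogram_row.reordered_ind`.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a month-by-year table (values are made up).
rng = np.random.default_rng(2)
table = pd.DataFrame(
    rng.integers(100, 600, size=(6, 5)),
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    columns=[1949, 1950, 1951, 1952, 1953],
)

# Keep the columns in their original (chronological) order; cluster rows only.
grid = sns.clustermap(table, col_cluster=False, cmap="YlGnBu", figsize=(8, 6))

# Row labels in the order they appear in the clustered heatmap.
row_order = [table.index[i] for i in grid.dendrogram_row.reordered_ind]
print(row_order)
```

The printed list shows which months ended up next to each other, which can be useful when the clustering result feeds into later analysis.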

Conclusion

In conclusion, this article explored the creation of hierarchically-clustered heatmaps in Python using the Seaborn Clustermap. By following the outlined steps, one can easily visualize complex datasets and uncover patterns and relationships within the data.

The Seaborn library's clustermap function offers flexibility and customization options, allowing users to adjust the color scheme, linewidths, figsize, and dendrogram ratio according to their preferences.

Updated on: 12-Jul-2023
