Last modified: Dec 18, 2024 By Alexander Williams

Python Seaborn Clustermap: Create Hierarchical Heatmaps

Seaborn's clustermap() function is a powerful tool for creating hierarchically clustered heatmaps, combining the visualization of data matrices with dendrograms showing the hierarchical relationships.

Understanding Clustermaps

A clustermap is a heatmap whose rows and columns have been reordered by hierarchical clustering, with dendrograms drawn along the margins. Placing similar rows and columns next to each other makes block patterns and group structure in complex datasets easier to spot.

Similar to how Seaborn's relplot helps visualize relationships, clustermaps excel at revealing patterns in correlation matrices and other tabular data.

Basic Clustermap Implementation

Let's start with a basic example using a small matrix of random values:


import seaborn as sns
import pandas as pd
import numpy as np

# Create sample data
np.random.seed(0)
data = np.random.rand(10, 10)
df = pd.DataFrame(data)

# Create clustermap
g = sns.clustermap(df)
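
Since clustermaps are often applied to correlation matrices, here is a minimal sketch of that use case. The column names and the synthetic data are assumptions made for illustration, not part of the example above. It also reads back the row order chosen by the clustering through `g.dendrogram_row.reordered_ind`.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two pairs of strongly related columns plus one noise column
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
features = pd.DataFrame({
    "a1": base[:, 0],
    "a2": base[:, 0] + 0.1 * rng.normal(size=200),
    "b1": base[:, 1],
    "b2": base[:, 1] + 0.1 * rng.normal(size=200),
    "noise": rng.normal(size=200),
})

# Correlation matrix: square, symmetric, values in [-1, 1]
corr = features.corr()

# A diverging colormap with fixed limits suits correlations
g = sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1)

# Row labels in the order the clustering placed them
ordered = [corr.index[i] for i in g.dendrogram_row.reordered_ind]
print(ordered)
```

In this sketch the related columns should end up next to each other in the reordered heatmap, which is the kind of grouping a clustermap is meant to expose.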

Customizing Clustermaps

You can customize your clustermap using various parameters to enhance visualization:


# Create a more customized clustermap
g = sns.clustermap(df,
                   cmap='coolwarm',  # Color scheme
                   annot=True,       # Show values
                   fmt='.2f',        # Format for values
                   figsize=(10, 10), # Figure size
                   dendrogram_ratio=(.1, .2))  # Adjust dendrogram size

Working with Real Data

Let's use a more practical example with the built-in flights dataset, which can help demonstrate patterns in seasonal flight traffic:


# Load flights dataset
flights = sns.load_dataset("flights")
flights_pivot = flights.pivot(index='month', columns='year', values='passengers')

# Create clustermap with customization
g = sns.clustermap(flights_pivot,
                   cmap='YlOrRd',
                   figsize=(12, 8),
                   row_cluster=True,   # Cluster rows (months); True is the default
                   col_cluster=True)   # Cluster columns (years); True is the default

Advanced Clustering Options

You can control the clustering method and distance metric used in your visualization, either through the method and metric parameters or by passing a precomputed linkage matrix:


from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

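# Custom row linkage: Euclidean distances (pdist default) with average linkage
# Note: this is computed on the raw values, not on the scaled ones shown in the heatmap
row_linkage = linkage(pdist(flights_pivot), method='average')

# Create clustermap with custom linkage
g = sns.clustermap(flights_pivot,
                   row_linkage=row_linkage,  # Used as-is for the row dendrogram
                   standard_scale=1,  # Rescale each column to the range [0, 1]
                   z_score=None)      # Default; set z_score or standard_scale, not both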
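
The two ways of controlling the clustering can be sketched side by side. The table below is synthetic and its row and column labels are hypothetical; the `method` and `metric` values are just one possible choice. Option 1 lets clustermap compute the linkage itself, while option 2 precomputes row and column linkages with SciPy.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import linkage

# Hypothetical table: rows r0-r2 sit near 0, rows r3-r5 sit near 10
rng = np.random.default_rng(1)
values = np.vstack([rng.normal(0, 1, size=(3, 5)),
                    rng.normal(10, 1, size=(3, 5))])
table = pd.DataFrame(values,
                     index=["r0", "r1", "r2", "r3", "r4", "r5"],
                     columns=list("ABCDE"))

# Option 1: let clustermap compute the linkage with a chosen method and metric
g1 = sns.clustermap(table, method="ward", metric="euclidean")

# Option 2: precompute linkages for rows and columns and pass them in
row_link = linkage(table.values, method="complete", metric="cityblock")
col_link = linkage(table.values.T, method="complete", metric="cityblock")
g2 = sns.clustermap(table, row_linkage=row_link, col_linkage=col_link)

# Row labels in dendrogram order for the precomputed version
row_order = table.index[g2.dendrogram_row.reordered_ind].tolist()
print(row_order)
```

Precomputing the linkage is mainly useful when the distances should be based on something other than the values being displayed, or when the same ordering needs to be reused across several plots.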

Color Scaling and Normalization

Just like in histogram visualizations, proper scaling can significantly improve the readability of your clustermap:


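# Create clustermap with row-wise z-score normalization
# (center=0 suits z-scores, which are spread around 0; data scaled to [0, 1] would use only half the colormap)
g = sns.clustermap(flights_pivot,
                   cmap='coolwarm',     # Diverging colormap to match the centered scale
                   z_score=0,           # Standardize each row (mean 0, std 1)
                   center=0,            # Center colormap at 0
                   robust=True)         # Set color limits from robust quantiles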
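
To make the two normalization options concrete, the following sketch reproduces in plain pandas what `standard_scale=0` and `z_score=0` are understood to do to each row. The numbers are hypothetical, and the formulas are an approximation of seaborn's behavior rather than a copy of its implementation.

```python
import pandas as pd

# Hypothetical month-by-year table
table = pd.DataFrame(
    {1949: [110.0, 120.0, 130.0],
     1950: [115.0, 125.0, 140.0],
     1951: [145.0, 150.0, 180.0]},
    index=["Jan", "Feb", "Mar"],
)

# Row-wise min-max scaling, as with standard_scale=0: each row spans [0, 1]
row_min = table.min(axis=1)
row_range = table.max(axis=1) - row_min
row_minmax = table.sub(row_min, axis=0).div(row_range, axis=0)

# Row-wise z-scores, as with z_score=0: each row has mean 0 and unit standard deviation
row_z = table.sub(table.mean(axis=1), axis=0).div(table.std(axis=1), axis=0)

print(row_minmax.round(2))
print(row_z.round(2))
```

Min-max scaling keeps all values non-negative, so a sequential colormap fits; z-scores are spread around zero, which is where a diverging colormap and `center=0` become useful.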

Adding Row and Column Colors

You can enhance your visualization by adding color bars to indicate categories:


# Create row colors
seasons = {'Jan': 'winter', 'Feb': 'winter', 'Mar': 'spring',
           'Apr': 'spring', 'May': 'spring', 'Jun': 'summer',
           'Jul': 'summer', 'Aug': 'summer', 'Sep': 'fall',
           'Oct': 'fall', 'Nov': 'fall', 'Dec': 'winter'}

row_colors = pd.Series(seasons).map({'winter': 'blue', 'spring': 'green',
                                    'summer': 'red', 'fall': 'orange'})

# Create clustermap with row colors
g = sns.clustermap(flights_pivot,
                   row_colors=row_colors,
                   figsize=(10, 10))
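
Columns can be annotated in the same way with `col_colors`. The sketch below colors year columns by decade; the table values, the decade grouping, and the color choices are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical month-by-year table with random values
rng = np.random.default_rng(2)
years = list(range(1949, 1961))
table = pd.DataFrame(rng.uniform(100, 600, size=(4, len(years))),
                     index=["Jan", "Apr", "Jul", "Oct"],
                     columns=years)

# Map each year to a color for its decade; the Series index must match the columns
decade_palette = {1940: "purple", 1950: "teal", 1960: "gold"}
col_colors = pd.Series({year: decade_palette[year // 10 * 10] for year in years},
                       name="decade")

# The color bar is drawn between the column dendrogram and the heatmap
g = sns.clustermap(table, col_colors=col_colors, figsize=(8, 4))
```

Because the colors are passed as a pandas Series, seaborn aligns them to the data by label, so they stay attached to the right columns after the clustering reorders them.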

Handling Missing Data

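The mask parameter hides missing cells in the heatmap, but the clustering step itself needs finite values (SciPy's linkage raises an error on NaN). Fill the gaps before clustering and mask them in the plot:


# Create data with missing values (cast to float so NaN can be stored)
flights_missing = flights_pivot.astype(float)
flights_missing.iloc[2:4, 3:5] = np.nan

# Fill gaps with column means, for clustering purposes only
flights_filled = flights_missing.fillna(flights_missing.mean())

# Cluster on the filled data, but mask the originally missing cells
g = sns.clustermap(flights_filled,
                   cmap='YlOrRd',
                   mask=flights_missing.isnull())  # Mask missing values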

Conclusion

Seaborn's clustermap() is a versatile tool for creating hierarchically clustered heatmaps, perfect for exploring patterns in complex datasets and correlation matrices.

Like pairplot visualization, clustermaps provide valuable insights into data relationships, but with the added benefit of hierarchical clustering.

Remember to consider your data structure and visualization goals when customizing parameters to create the most effective and informative clustermaps for your analysis.