Last modified: Dec 18, 2024 By Alexander Williams
Python Seaborn Clustermap: Create Hierarchical Heatmaps
Seaborn's clustermap() function is a powerful tool for creating hierarchically clustered heatmaps, combining the visualization of data matrices with dendrograms showing the hierarchical relationships.
Understanding Clustermaps
A clustermap is a heatmap whose rows and columns have been reordered by hierarchical clustering, so that similar rows and similar columns sit next to each other, with dendrograms drawn along the margins. This makes it well suited to identifying patterns and relationships in complex datasets.
Similar to how Seaborn's relplot helps visualize relationships, clustermaps excel at revealing patterns in correlation matrices and other tabular data.
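To make the reordering idea concrete, here is a minimal sketch of the clustering step using SciPy directly. The toy matrix and the variable names are assumptions chosen for illustration; they are not part of seaborn's API.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy matrix (an assumption for illustration): rows 0 and 2 are similar,
# and rows 1 and 3 are similar.
toy = np.array([
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.1, 1.0, 1.0],
    [5.1, 5.0, 5.0],
])

# Build the hierarchy over rows, then read off the dendrogram leaf order.
row_link = linkage(toy, method="average", metric="euclidean")
row_order = leaves_list(row_link)

# Reordering by the leaf order places similar rows next to each other.
reordered = toy[row_order]
print(row_order)
print(reordered)
```

A clustermap applies the same kind of reordering to both rows and columns before drawing the heatmap.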
Basic Clustermap Implementation
Let's start with a basic example using a small matrix of random values:
import seaborn as sns
import pandas as pd
import numpy as np
# Create sample data
np.random.seed(0)
data = np.random.rand(10, 10)
df = pd.DataFrame(data)
# Create clustermap
g = sns.clustermap(df)
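Since correlation matrices are a common input, the following sketch shows one way to cluster one and then read back the row order that the clustering chose. The synthetic variables (a, b, c, x, y, z) are assumptions made up for this example, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): a, b and c share a common signal; x, y, z are noise
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
samples = pd.DataFrame({
    "a": signal + rng.normal(scale=0.3, size=200),
    "b": signal + rng.normal(scale=0.3, size=200),
    "c": signal + rng.normal(scale=0.3, size=200),
    "x": rng.normal(size=200),
    "y": rng.normal(size=200),
    "z": rng.normal(size=200),
})
corr = samples.corr()

# Correlations lie in [-1, 1], so pin the color limits and center the colormap at 0
g_corr = sns.clustermap(corr, cmap="vlag", vmin=-1, vmax=1, center=0)

# Row order after clustering, given as positions into the original index
row_order = g_corr.dendrogram_row.reordered_ind
clustered_labels = [corr.index[i] for i in row_order]
print(clustered_labels)
```

In the printed order, the strongly correlated variables should appear as one contiguous block, which is the pattern the heatmap makes visible.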
Customizing Clustermaps
You can customize your clustermap using various parameters to enhance visualization:
# Create a more customized clustermap
g = sns.clustermap(df,
                   cmap='coolwarm',            # Color scheme
                   annot=True,                 # Show values
                   fmt='.2f',                  # Format for values
                   figsize=(10, 10),           # Figure size
                   dendrogram_ratio=(.1, .2))  # Adjust dendrogram size
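Further tweaks can be made after plotting, because clustermap() returns a grid object that exposes the heatmap as a regular Matplotlib Axes. The sketch below assumes made-up sample and feature names and uses the Agg backend only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical labelled data for illustration
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    rng.random((8, 5)),
    index=[f"sample_{i}" for i in range(8)],
    columns=[f"feature_{j}" for j in range(5)],
)

g_custom = sns.clustermap(
    demo,
    cmap="coolwarm",
    figsize=(6, 6),
    cbar_kws={"label": "value"},  # label for the colorbar
)

# The heatmap lives on its own Axes, so ordinary Matplotlib calls apply
g_custom.ax_heatmap.set_xlabel("Features")
g_custom.ax_heatmap.set_ylabel("Samples")
g_custom.ax_heatmap.tick_params(axis="x", labelrotation=45)
```

The same grid object also has a savefig() method for writing the finished figure to disk.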
Working with Real Data
Let's use a more practical example with the built-in flights dataset, which can help demonstrate patterns in seasonal flight traffic:
# Load flights dataset
flights = sns.load_dataset("flights")
flights_pivot = flights.pivot(index='month', columns='year', values='passengers')
# Create clustermap with customization
g = sns.clustermap(flights_pivot,
                   cmap='YlOrRd',
                   figsize=(12, 8),
                   row_cluster=True,   # Cluster the months (default)
                   col_cluster=True)   # Cluster the years (default)
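For time-ordered columns such as years, it can be preferable to cluster only the rows and keep the columns chronological. The sketch below uses a synthetic month-by-year table as a stand-in (an assumption, not the real flights data) so that it runs without downloading anything.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a month-by-year table (not the real flights data)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = list(range(1949, 1955))
rng = np.random.default_rng(2)
seasonal = 100 + 30 * np.sin(np.linspace(0, 2 * np.pi, 12, endpoint=False) - np.pi / 2)
growth = 1 + 0.1 * np.arange(len(years))
table = pd.DataFrame(
    seasonal[:, None] * growth[None, :] + rng.normal(scale=2, size=(12, len(years))),
    index=months,
    columns=years,
)

# Cluster months by similarity, but leave the years in their original order
g_time = sns.clustermap(table, cmap="YlOrRd", col_cluster=False, figsize=(8, 6))

# data2d holds the data in plotted order; columns should still be chronological
plotted_years = list(g_time.data2d.columns)
print(plotted_years)
```

With col_cluster=False no column dendrogram is drawn, so any trend across years reads left to right as usual.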
Advanced Clustering Options
You can control the linkage method and distance metric through the method and metric parameters, or compute a linkage matrix yourself and pass it in:
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage
# Custom linkage matrix for the rows (computed on the unscaled data)
row_linkage = linkage(pdist(flights_pivot), method='average')
# Create clustermap with custom linkage
g = sns.clustermap(flights_pivot,
                   row_linkage=row_linkage,
                   standard_scale=1,  # Scale each column to the 0-1 range
                   z_score=None)      # Alternative to standard_scale; use one or the other
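When a precomputed linkage is not needed, the method and metric parameters of clustermap() can be set directly. The following sketch compares two combinations on random data; the data and variable names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
values = pd.DataFrame(rng.random((12, 6)))

# Ward linkage is defined for Euclidean distances
g_ward = sns.clustermap(values, method="ward", metric="euclidean")

# Average linkage on correlation distance groups rows by shape rather than magnitude
g_corr_dist = sns.clustermap(values, method="average", metric="correlation")

# Compare the row orders produced by the two settings
ward_order = g_ward.dendrogram_row.reordered_ind
corr_order = g_corr_dist.dendrogram_row.reordered_ind
print(ward_order)
print(corr_order)
```

Different settings can produce noticeably different orderings, so it is worth trying more than one before interpreting the clusters.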
Color Scaling and Normalization
Just like in histogram visualizations, proper scaling can significantly improve the readability of your clustermap:
# Create clustermap with row-wise normalization
g = sns.clustermap(flights_pivot,
                   cmap='vlag',   # Diverging colormap suits values centered on 0
                   z_score=0,     # Convert each row to z-scores
                   center=0,      # Center colormap at 0
                   robust=True)   # Set color limits from robust quantiles
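The arithmetic behind the two scaling options can be sketched with pandas alone. This follows the documented meaning of standard_scale=0 and z_score=0 (row-wise min-max scaling and row-wise z-scores); the small table is an assumption, and the exact standard deviation convention seaborn uses internally is not verified here.

```python
import numpy as np
import pandas as pd

# Small table (assumption) whose rows have very different magnitudes
table = pd.DataFrame(
    [[10.0, 20.0, 30.0],
     [100.0, 150.0, 200.0],
     [5.0, 5.5, 7.0]],
    index=["r1", "r2", "r3"],
    columns=["c1", "c2", "c3"],
)

# Row-wise min-max scaling: each row is mapped onto the 0-1 range
row_min = table.min(axis=1)
row_range = table.max(axis=1) - row_min
row_minmax = table.sub(row_min, axis=0).div(row_range, axis=0)

# Row-wise z-scores: each row gets mean 0 and unit (sample) standard deviation
row_z = table.sub(table.mean(axis=1), axis=0).div(table.std(axis=1), axis=0)

print(row_minmax)
print(row_z)
```

After either transformation the rows are comparable in color, even though their raw values differ by an order of magnitude.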
Adding Row and Column Colors
You can enhance your visualization by adding color bars to indicate categories:
# Create row colors (the Series index must match the row labels of the data)
seasons = {'Jan': 'winter', 'Feb': 'winter', 'Mar': 'spring',
           'Apr': 'spring', 'May': 'spring', 'Jun': 'summer',
           'Jul': 'summer', 'Aug': 'summer', 'Sep': 'fall',
           'Oct': 'fall', 'Nov': 'fall', 'Dec': 'winter'}
row_colors = pd.Series(seasons).map({'winter': 'blue', 'spring': 'green',
                                     'summer': 'red', 'fall': 'orange'})
# Create clustermap with row colors
g = sns.clustermap(flights_pivot,
                   row_colors=row_colors,
                   figsize=(10, 10))
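The same pattern generalizes to any categorical label. The sketch below builds the colors from a seaborn palette instead of hard-coded names; the group labels and sample names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
sample_names = [f"s{i}" for i in range(9)]
data = pd.DataFrame(rng.random((9, 4)), index=sample_names)

# Hypothetical category label for each row, indexed like the data
groups = pd.Series(
    ["control"] * 3 + ["low"] * 3 + ["high"] * 3,
    index=sample_names,
    name="group",
)

# Assign one palette color per category, then map labels to colors
palette = dict(zip(groups.unique(), sns.color_palette("Set2", groups.nunique())))
group_colors = groups.map(palette)

g_groups = sns.clustermap(data, row_colors=group_colors, figsize=(6, 6))
```

Because the colors are matched by index, they follow their rows when the clustering reorders them.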
Handling Missing Data
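Hierarchical clustering cannot compute distances from NaN values, so passing data with gaps straight to clustermap() raises an error, even when a mask is supplied. One workaround is to compute the linkage on a filled copy (here filled with column means, which is one simple choice) and plot the original data with the missing cells masked:
from scipy.cluster.hierarchy import linkage
# Create data with missing values (cast to float so NaN can be stored)
flights_missing = flights_pivot.astype(float)
flights_missing.iloc[2:4, 3:5] = np.nan
# Build the linkages from a copy with the gaps filled by column means
filled = flights_missing.fillna(flights_missing.mean())
row_linkage = linkage(filled.values, method='average')
col_linkage = linkage(filled.T.values, method='average')
# Plot the original data; masked cells are left blank
g = sns.clustermap(flights_missing,
                   cmap='YlOrRd',
                   row_linkage=row_linkage,
                   col_linkage=col_linkage,
                   mask=flights_missing.isnull())  # Mask missing values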
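A caveat worth checking in isolation: the clustering step itself needs finite distances. The sketch below, on made-up data, shows SciPy's linkage() rejecting a matrix that contains NaN and accepting the same matrix once the gap is filled. Filling with column means is an assumption for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(5)
complete = pd.DataFrame(rng.random((6, 4)))

# Introduce a single missing value
with_gaps = complete.copy()
with_gaps.iloc[1, 2] = np.nan

# Distances involving NaN are not finite, so linkage() raises ValueError
try:
    linkage(with_gaps.values, method="average")
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Filling the gap (here with the column mean) makes clustering possible again
filled = with_gaps.fillna(with_gaps.mean())
row_link = linkage(filled.values, method="average")
print(nan_rejected, row_link.shape)
```

Whichever imputation is used affects the resulting dendrogram, so it is worth stating the choice alongside the figure.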
Conclusion
Seaborn's clustermap() is a versatile tool for creating hierarchically clustered heatmaps, perfect for exploring patterns in complex datasets and correlation matrices.
Like pairplot visualization, clustermaps provide valuable insights into data relationships, but with the added benefit of hierarchical clustering.
Remember to consider your data structure and visualization goals when customizing parameters to create the most effective and informative clustermaps for your analysis.