Last modified: Dec 18, 2024 By Alexander Williams

Python Seaborn Distplot: Create Histograms with KDE

Seaborn's distplot() function visualizes a univariate distribution by drawing a histogram and overlaying a kernel density estimation (KDE) curve, so a single plot shows both the binned data and a smooth estimate of the distribution's shape.

Understanding Distplot Basics

distplot() has been deprecated since Seaborn 0.11 and emits a warning when called. It is still worth understanding because older code relies on it and because its modern replacements, histplot(), kdeplot(), and displot(), are built on the same ideas.

Let's start with a basic example:


import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
data = np.random.normal(0, 1, 1000)

# Create basic distplot
sns.distplot(data)
plt.title('Basic Distribution Plot')
plt.show()
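
When a KDE is drawn, distplot() scales the histogram to a density rather than raw counts, so the bars and the curve share one y-axis. The sketch below illustrates what that scaling means using only the standard library. The helper name density_histogram is hypothetical and this is not Seaborn's implementation, only a minimal illustration with equal-width bins.

```python
import random

def density_histogram(values, bins):
    """Return (heights, bin_width) for an equal-width, density-scaled histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # Density scaling: divide each count by (sample size * bin width)
    heights = [c / (n * width) for c in counts]
    return heights, width

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(1000)]
heights, width = density_histogram(sample, bins=30)

# The bar areas (height * width) add up to 1, like a probability density
total_area = sum(h * width for h in heights)
print(round(total_area, 6))
```

Because the total area is 1, the bar heights can be compared directly with a KDE curve, which also integrates to 1.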

Customizing Histogram Components

You can customize various aspects of the histogram to better suit your visualization needs. Here's how to modify the bins and histogram appearance:


# Customize histogram appearance
sns.distplot(data,
            bins=30,           # Number of bins
            hist=True,         # Show histogram
            kde=True,          # Show KDE plot
            color='blue',      # Color of plot
            hist_kws={'alpha': 0.7},  # Histogram transparency
            kde_kws={'linewidth': 2})  # KDE line width

plt.title('Customized Distribution Plot')
plt.show()

Working with KDE Overlay

The Kernel Density Estimation (KDE) overlay provides a smooth estimate of the probability density function. For more detailed density visualization techniques, check out our Python Seaborn KDEplot Tutorial.


# Create distribution plot with different KDE bandwidth
# Note: seaborn >= 0.11 deprecates 'bw' in favor of 'bw_method' and 'bw_adjust'
sns.distplot(data,
            kde=True,
            kde_kws={'bw': 0.5},  # Adjust bandwidth
            color='green')
plt.title('Distribution Plot with Modified KDE')
plt.show()
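
To see why the bandwidth matters, it can help to compute a KDE by hand. The following is a simplified sketch of a Gaussian KDE with a fixed bandwidth, written with the standard library only. It is not Seaborn's implementation, and the helper names gaussian_kde and count_peaks are hypothetical. It counts the local peaks in the estimated curve as a rough measure of how bumpy the curve is.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, each with width `bandwidth`
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density

def count_peaks(values):
    """Count strict local maxima in a sequence of curve heights."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(200)]
grid = [i / 20 for i in range(-80, 81)]  # -4.0 to 4.0 in steps of 0.05

narrow_kde = gaussian_kde(sample, 0.1)
wide_kde = gaussian_kde(sample, 1.0)
narrow = [narrow_kde(x) for x in grid]
wide = [wide_kde(x) for x in grid]

print("peaks with bandwidth 0.1:", count_peaks(narrow))
print("peaks with bandwidth 1.0:", count_peaks(wide))
```

A small bandwidth follows individual observations and produces a wiggly curve, while a large bandwidth smooths over them and can hide real structure.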

Comparing Multiple Distributions

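To compare distributions, draw them on the same axes by calling distplot() once per dataset and giving each call its own label and color. If you want to explore relationships between variables rather than compare their distributions, our Python Seaborn Pairplot guide covers multivariate analysis.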


# Generate two different distributions
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(2, 1.5, 1000)

# Plot multiple distributions
plt.figure(figsize=(10, 6))
sns.distplot(data1, label='Distribution 1', color='blue')
sns.distplot(data2, label='Distribution 2', color='red')
plt.legend()
plt.title('Comparing Multiple Distributions')
plt.show()

Advanced Customization Options

You can style each component separately: hist_kws controls the histogram bars, kde_kws controls the KDE line, and rug=True together with rug_kws adds a rug plot that marks each observation along the x-axis:


# Advanced customization
plt.figure(figsize=(12, 6))
sns.distplot(data,
            bins=25,
            hist_kws={'alpha': 0.8,
                     'color': 'skyblue',
                     'edgecolor': 'black'},
            kde_kws={'color': 'darkblue',
                    'linewidth': 2,
                    'label': 'KDE'},
            rug=True,  # Add rug plot
            rug_kws={'color': 'red'})

plt.legend()  # Display the 'KDE' label set in kde_kws
plt.title('Advanced Distribution Plot')
plt.xlabel('Values')
plt.ylabel('Density')
plt.show()

Modern Alternatives to Distplot

Since distplot() is deprecated, use histplot() and kdeplot() instead, either separately or layered on the same axes. For more details, visit our Python Seaborn Histplot Tutorial.


# Modern approach using histplot and kdeplot
plt.figure(figsize=(10, 6))
sns.histplot(data, stat='density', alpha=0.5)
sns.kdeplot(data, color='red')
plt.title('Modern Alternative to Distplot')
plt.show()

Conclusion

Although distplot() is deprecated, understanding how it combines a histogram with a KDE makes the move to histplot() and kdeplot() straightforward. The histogram shows where the observations fall, and the KDE adds a smooth estimate of the distribution's shape.

Remember to consider your specific visualization needs when choosing between different plot types and customization options. The key is to create clear, informative, and visually appealing statistical graphics.