How to plot the difference of two distributions in Matplotlib?

To plot the difference between two distributions in Matplotlib, we use kernel density estimation (KDE) to create smooth probability density functions from our data, then visualize both distributions and their difference.

Understanding Kernel Density Estimation

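Kernel density estimation builds a continuous estimate of the probability density function from a finite sample. It places a smooth kernel on each data point (a Gaussian, in the case of scipy.stats.gaussian_kde()) and averages them. Because each estimate is a smooth curve that can be evaluated at any point, two distributions can be compared on a common grid.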
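To make the idea concrete, here is a minimal sketch that rebuilds a one-dimensional Gaussian KDE by hand and compares it with scipy.stats.gaussian_kde(). The variable names (sample, h, bumps, manual_density) and the fixed seed are assumptions made for this illustration; the kernel width is read from the fitted object's covariance attribute rather than derived independently.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
sample = rng.gumbel(50, 28, 100)
grid = np.linspace(0, 200, 200)

kde = gaussian_kde(sample)

# Standard deviation of each Gaussian kernel, as chosen by gaussian_kde
h = np.sqrt(kde.covariance[0, 0])

# One Gaussian bump per data point, evaluated at every grid point
z = (grid[:, None] - sample[None, :]) / h
bumps = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))

# The density estimate is the average of the bumps
manual_density = bumps.mean(axis=1)

print(np.allclose(manual_density, kde(grid)))
```

The broadcasting step builds a 200 x 100 array, which is fine for small samples; for larger data the library routine is the more practical choice.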

Complete Example

Here's how to plot two distributions and their difference:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

# Set figure size
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True

# Create two different datasets
a = np.random.gumbel(50, 28, 100)
b = np.random.gumbel(60, 37, 100)

# Create kernel density estimates
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

# Create a shared evaluation grid so both KDEs can be subtracted point by point
grid = np.linspace(0, 200, 200)

# Plot the distributions and their difference
plt.plot(grid, kdea(grid), label="Distribution A", linewidth=2)
plt.plot(grid, kdeb(grid), label="Distribution B", linewidth=2)
plt.plot(grid, kdea(grid) - kdeb(grid), label="Difference (A - B)", linewidth=2, linestyle='--')

# Add horizontal line at y=0 for reference
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)

# Customize the plot
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Comparison of Two Distributions')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)

plt.show()

[A plot showing two KDE curves and their difference, with Distribution A and B as smooth curves and their difference as a dashed line crossing the zero reference line]

Key Steps Breakdown

Generate Data: Create two different datasets using np.random.gumbel()
Create KDE Objects: Use scipy.stats.gaussian_kde() to estimate probability density
Define Grid: Create evaluation points using np.linspace()
Calculate Difference: Subtract one KDE from another: kdea(grid) - kdeb(grid)
Visualize: Plot all three curves with proper labels and styling
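The steps above can also be wrapped in a small helper. This is a sketch under stated assumptions, not part of the original example: the function name density_difference and its num and pad parameters are hypothetical, and the grid is derived from the data (instead of the fixed 0 to 200 range) so that neither sample's tail is cut off.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_difference(a, b, num=200, pad=0.1):
    """Evaluate the KDEs of two samples, and their difference, on one grid.

    Hypothetical helper for illustration; `num` is the number of grid
    points and `pad` widens the grid by a fraction of the data range.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Span both samples, with a margin on each side
    low = min(a.min(), b.min())
    high = max(a.max(), b.max())
    margin = pad * (high - low)
    grid = np.linspace(low - margin, high + margin, num)

    dens_a = gaussian_kde(a)(grid)
    dens_b = gaussian_kde(b)(grid)
    return grid, dens_a, dens_b, dens_a - dens_b


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
a = rng.gumbel(50, 28, 100)
b = rng.gumbel(60, 37, 100)

grid, dens_a, dens_b, diff = density_difference(a, b)
print(grid.min(), grid.max())
```

The returned arrays can be passed straight to plt.plot(), one call per curve, in the same way as in the complete example.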

Interpreting the Results

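The difference plot shows where one distribution has higher estimated density than the other. Where the curve is positive, Distribution A has the higher density; where it is negative, Distribution B does. Because each density integrates to 1, the positive and negative areas of the difference curve cancel out over the full range of values.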
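The sketch below illustrates this reading of the difference curve: it shades the positive and negative regions, approximates the net area with a simple Riemann sum, and locates the grid points where the sign changes. The wide grid limits (-100 to 400), the seed, and the names net_area and crossings are assumptions made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
a = rng.gumbel(50, 28, 100)
b = rng.gumbel(60, 37, 100)

# A wide grid so that almost all of both densities is covered
grid = np.linspace(-100, 400, 1000)
diff = gaussian_kde(a)(grid) - gaussian_kde(b)(grid)

# Riemann-sum approximation of the net area under the difference curve
dx = grid[1] - grid[0]
net_area = diff.sum() * dx

# Grid values just before each sign change of the difference
crossings = grid[:-1][np.diff(np.sign(diff)) != 0]

fig, ax = plt.subplots()
ax.plot(grid, diff, color="black", linestyle="--", label="Difference (A - B)")
ax.fill_between(grid, diff, 0, where=diff > 0, alpha=0.3,
                label="A has higher density")
ax.fill_between(grid, diff, 0, where=diff < 0, alpha=0.3,
                label="B has higher density")
ax.axhline(y=0, color="black", alpha=0.3)
ax.set_xlabel("Value")
ax.set_ylabel("Density difference")
ax.legend(loc="upper right")

print("Net area:", net_area)
print("Sign changes near:", crossings)
```

Call plt.show() (or fig.savefig() with a file name of your choice) afterwards to view the figure. The net area should be close to zero, with any remainder coming from the finite grid.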

Conclusion

Use scipy.stats.gaussian_kde() to create smooth density estimates from your data, evaluate both on the same grid, and plot the two curves together with their difference. The difference curve highlights the regions where values are more likely under one distribution than under the other.

Rishikesh Kumar Rishi

Updated on: 2026-03-26T00:25:17+05:30
