How to plot the difference of two distributions in Matplotlib?
To plot the difference between two distributions in Matplotlib, we use kernel density estimation (KDE) to create smooth probability density functions from our data, then visualize both distributions and their difference.
Understanding Kernel Density Estimation
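Kernel density estimation (KDE) builds a smooth, continuous estimate of a probability density function from a finite sample by centering a kernel on each data point and averaging the kernels. scipy.stats.gaussian_kde() uses Gaussian kernels and, by default, picks the bandwidth with Scott's rule. Because both estimates are continuous curves, they can be evaluated on a shared grid and subtracted point by point.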
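To make the idea concrete, here is a minimal sketch that builds a one-dimensional Gaussian KDE by hand and compares it with scipy.stats.gaussian_kde(). The fixed seed, the grid limits, and the variable names are assumptions chosen for illustration. The bandwidth is taken from the fitted object as its factor attribute times the sample standard deviation, which is how the 1-D case works out.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)          # fixed seed (assumption, for reproducibility)
sample = rng.gumbel(50, 28, 100)

kde = gaussian_kde(sample)              # Scott's rule bandwidth by default
h = kde.factor * sample.std(ddof=1)     # 1-D bandwidth used by the fitted KDE

grid = np.linspace(-50, 250, 300)       # illustrative evaluation grid

# Average of Gaussian kernels centered on each data point, scaled by 1/h
manual = norm.pdf((grid[:, None] - sample[None, :]) / h).mean(axis=1) / h
library = kde(grid)

print(np.allclose(manual, library))
```

The hand-built curve and the library curve should agree to floating-point precision, which shows that the KDE is nothing more than an average of bell curves placed on the data.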
Complete Example
The following example plots two distributions and their difference:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
# Set figure size
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Create two different datasets
a = np.random.gumbel(50, 28, 100)
b = np.random.gumbel(60, 37, 100)
# Create kernel density estimates
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
# Create evaluation grid
grid = np.linspace(0, 200, 200)
# Plot the distributions and their difference
plt.plot(grid, kdea(grid), label="Distribution A", linewidth=2)
plt.plot(grid, kdeb(grid), label="Distribution B", linewidth=2)
plt.plot(grid, kdea(grid) - kdeb(grid), label="Difference (A - B)", linewidth=2, linestyle='--')
# Add horizontal line at y=0 for reference
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
# Customize the plot
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Comparison of Two Distributions')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
plt.show()
[A plot showing two KDE curves and their difference, with Distribution A and B as smooth curves and their difference as a dashed line crossing the zero reference line]
Key Steps Breakdown
- Generate Data: Create two different datasets using np.random.gumbel()
- Create KDE Objects: Use scipy.stats.gaussian_kde() to estimate the probability density of each dataset
- Define Grid: Create evaluation points using np.linspace()
- Calculate Difference: Subtract one KDE from the other: kdea(grid) - kdeb(grid)
- Visualize: Plot all three curves with proper labels and styling
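The steps above can be wrapped in a small helper. This is only a sketch: density_difference, num, and pad are hypothetical names, not part of SciPy or Matplotlib. It differs from the example in two ways: the grid is derived from the data, so long tails are not cut off by fixed limits, and each KDE is evaluated only once.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_difference(a, b, num=200, pad=3.0):
    """Return (grid, density_a, density_b, density_a - density_b).

    `num` and `pad` are illustrative parameters: the grid spans the
    pooled data range, widened on each side by `pad` times the larger
    of the two KDE bandwidths.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    kde_a, kde_b = gaussian_kde(a), gaussian_kde(b)

    # 1-D bandwidth of gaussian_kde: scaling factor times sample std
    h = max(kde_a.factor * a.std(ddof=1), kde_b.factor * b.std(ddof=1))
    low = min(a.min(), b.min()) - pad * h
    high = max(a.max(), b.max()) + pad * h

    grid = np.linspace(low, high, num)
    dens_a, dens_b = kde_a(grid), kde_b(grid)  # evaluate each KDE once
    return grid, dens_a, dens_b, dens_a - dens_b


rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
grid, dens_a, dens_b, diff = density_difference(
    rng.gumbel(50, 28, 100), rng.gumbel(60, 37, 100)
)
print(grid[0], grid[-1], diff.shape)
```

The returned arrays can be passed straight to plt.plot(), for example plt.plot(grid, diff, linestyle='--') for the difference curve.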
Interpreting the Results
The difference curve is positive where Distribution A's estimated density exceeds Distribution B's, and negative where B's density is higher. Because each density integrates to 1, the difference integrates to 0 over the full range, so the positive and negative areas balance each other.
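One way to make this reading easier is to shade the two signs of the difference and to locate where the curve crosses zero. The sketch below assumes a fixed seed and a deliberately wide grid, both chosen for illustration. It uses Matplotlib's fill_between() with its where argument for the shading and a simple trapezoidal sum to check that the difference integrates to roughly zero.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
a = rng.gumbel(50, 28, 100)
b = rng.gumbel(60, 37, 100)

# A wide grid so the tails of both estimates are almost fully covered
grid = np.linspace(-200, 500, 1400)
diff = gaussian_kde(a)(grid) - gaussian_kde(b)(grid)

# Trapezoidal integral of the difference over the grid
area = np.sum((diff[1:] + diff[:-1]) / 2 * np.diff(grid))

# Grid positions just before each sign change of the difference
crossings = grid[:-1][np.diff(np.sign(diff)) != 0]

fig, ax = plt.subplots()
ax.plot(grid, diff, color="black", linewidth=1)
ax.axhline(0, color="black", alpha=0.3)
ax.fill_between(grid, diff, 0, where=diff > 0, interpolate=True,
                alpha=0.4, label="A denser")
ax.fill_between(grid, diff, 0, where=diff < 0, interpolate=True,
                alpha=0.4, label="B denser")
ax.set_xlabel("Value")
ax.set_ylabel("Density difference (A - B)")
ax.legend()

print(f"area under difference: {area:.2e}")
print("sign changes near:", crossings)
```

Call plt.show() or fig.savefig() afterwards to display or save the figure. Sign changes reported far out in the tails involve tiny density values and can usually be ignored.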
Conclusion
Use scipy.stats.gaussian_kde() to create smooth density estimates from your data. Plot the two estimated densities together with their difference to see where they diverge. The difference curve highlights the regions where values are more likely under one distribution than under the other.
