How to plot cdf in Matplotlib in Python?
A Cumulative Distribution Function (CDF) shows the probability that a random variable takes a value less than or equal to a given value. In matplotlib, we can plot a CDF by calculating cumulative probabilities from histogram data.
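To make the definition concrete, the sketch below estimates the probability that a value is less than or equal to x as the fraction of samples at or below x. The helper name `empirical_cdf_at` and the sample values are illustrative assumptions, not part of NumPy or matplotlib.

```python
import numpy as np

def empirical_cdf_at(samples, x):
    """Estimate P(X <= x) as the fraction of samples <= x (hypothetical helper)."""
    samples = np.asarray(samples)
    # The comparison yields booleans; their mean is the fraction that are True
    return float(np.mean(samples <= x))

# Small, hand-checkable sample (illustrative values)
samples = [1, 2, 2, 3, 5]

p_at_2 = empirical_cdf_at(samples, 2)  # 3 of the 5 values are <= 2
p_at_5 = empirical_cdf_at(samples, 5)  # all values are <= 5
print(p_at_2, p_at_5)
```

Evaluating this fraction at every data point is what produces the step-shaped curve that a CDF plot shows.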
Steps to Plot CDF
To plot a CDF in matplotlib, follow these steps:
Generate or prepare your sample data
Create a histogram to get frequency counts and bin edges
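Normalize the counts so they sum to 1, giving the probability of each bin (strictly a probability mass per bin rather than a density, since bin width is not divided out)
Compute the CDF as the cumulative sum of those per-bin probabilities
Plot the CDF using matplotlib's plot() method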
Example
Here's how to plot a CDF from normally distributed random data:
import numpy as np
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 5))
# Generate sample data
N = 500
data = np.random.randn(N)
# Create histogram
count, bins_count = np.histogram(data, bins=20)
# Normalize counts to per-bin probabilities (they sum to 1; bin width is not divided out)
pdf = count / count.sum()
# Calculate CDF (cumulative distribution function)
cdf = np.cumsum(pdf)
# Plot CDF
plt.plot(bins_count[1:], cdf, marker='o', linestyle='-', label="CDF")
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.title('Cumulative Distribution Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()
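As a shorter variant of the example above, matplotlib's `hist()` can do the accumulation itself when called with `density=True` and `cumulative=True`. The following is a sketch rather than a replacement for the manual calculation; the fixed seed, the `Agg` backend, and the output file name `cdf_hist.png` are assumptions chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Fixed seed (assumption) so the figure is reproducible
rng = np.random.default_rng(0)
data = rng.standard_normal(500)

fig, ax = plt.subplots(figsize=(8, 5))

# density=True normalizes the histogram and cumulative=True accumulates it,
# so the height of the last bin is 1
heights, edges, _ = ax.hist(
    data, bins=20, density=True, cumulative=True,
    histtype="step", label="CDF",
)

ax.set_xlabel("Value")
ax.set_ylabel("Cumulative Probability")
ax.set_title("Cumulative Histogram")
ax.grid(True, alpha=0.3)
ax.legend()
fig.savefig("cdf_hist.png")  # hypothetical output file name
```

The returned `heights` array holds the same cumulative probabilities that `np.cumsum()` produces in the manual version, so either approach can be used depending on whether you need the values or only the picture.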
Alternative Method Using Sorted Data (Empirical CDF)
For a CDF that does not depend on a choice of bins, you can sort the data and plot the empirical CDF directly with NumPy:
import numpy as np
import matplotlib.pyplot as plt
# Generate sample data
data = np.random.normal(0, 1, 1000)
# Sort data for plotting
sorted_data = np.sort(data)
y_values = np.arange(1, len(sorted_data) + 1) / len(sorted_data)
# Plot empirical CDF
plt.figure(figsize=(8, 5))
plt.plot(sorted_data, y_values, label='Empirical CDF', linewidth=2)
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.title('Empirical Cumulative Distribution Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()
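Because the sample in this example is drawn from a standard normal distribution, the empirical curve can be compared with the theoretical CDF. The sketch below uses `scipy.stats.norm.cdf` for the comparison; the fixed seed, the `Agg` backend, and the output file name `ecdf_vs_normal.png` are assumptions made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Fixed seed (assumption) so the comparison is reproducible
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 1000)

# Empirical CDF: the i-th smallest value has cumulative probability i / n
sorted_data = np.sort(data)
ecdf = np.arange(1, len(sorted_data) + 1) / len(sorted_data)

# Theoretical CDF of the standard normal, evaluated at the same points
theoretical = stats.norm.cdf(sorted_data, loc=0, scale=1)

# Largest vertical gap between the two curves at the sample points
max_gap = np.max(np.abs(ecdf - theoretical))
print("Largest gap:", max_gap)

fig, ax = plt.subplots(figsize=(8, 5))
ax.step(sorted_data, ecdf, where="post", label="Empirical CDF")
ax.plot(sorted_data, theoretical, linestyle="--", label="Normal CDF (scipy.stats.norm)")
ax.set_xlabel("Value")
ax.set_ylabel("Cumulative Probability")
ax.set_title("Empirical vs. Theoretical CDF")
ax.grid(True, alpha=0.3)
ax.legend()
fig.savefig("ecdf_vs_normal.png")  # hypothetical output file name
```

With 1000 samples the two curves should lie close together; a large gap would suggest the data does not follow the assumed distribution.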
Key Points
CDF values range from 0 to 1
The CDF is always non-decreasing
More bins in the histogram give a finer-grained CDF curve; plotting the sorted data directly avoids binning altogether
Use np.cumsum() to calculate cumulative probabilities
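These points can be checked numerically. The sketch below builds the histogram-based CDF for two bin counts (10 and 100, chosen arbitrarily) and confirms that each curve is non-decreasing and ends at 1. The helper name `histogram_cdf` and the fixed seed are assumptions for illustration.

```python
import numpy as np

def histogram_cdf(data, bins):
    """Return (right bin edges, cumulative probabilities); hypothetical helper."""
    count, edges = np.histogram(data, bins=bins)
    # Normalize counts to per-bin probabilities, then accumulate them
    cdf = np.cumsum(count / count.sum())
    return edges[1:], cdf

# Fixed seed (assumption) for reproducibility
rng = np.random.default_rng(0)
data = rng.standard_normal(500)

coarse_x, coarse_cdf = histogram_cdf(data, bins=10)
fine_x, fine_cdf = histogram_cdf(data, bins=100)

for name, cdf in [("10 bins", coarse_cdf), ("100 bins", fine_cdf)]:
    non_decreasing = bool(np.all(np.diff(cdf) >= 0))
    print(name, "| points:", len(cdf),
          "| non-decreasing:", non_decreasing,
          "| last value:", round(float(cdf[-1]), 6))
```

The 100-bin curve has ten times as many points across the same range, which is why it looks smoother when plotted.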
Conclusion
Plotting a CDF in matplotlib involves creating a histogram, normalizing the counts into per-bin probabilities, and then computing their cumulative sum. The resulting plot shows how your data is distributed and lets you read off the probability of a value falling at or below a given threshold.
