How to normalize a histogram in Python?
To normalize a histogram in Python, pass density=True to matplotlib's hist() function. In a normalized histogram, the total area of the bars equals 1, so the plot can be read as an estimate of a probability density and compared directly with histograms of other datasets.
What is Histogram Normalization?
Histogram normalization scales the bars so that the total area under the histogram equals 1. This converts frequency counts into probability densities, making it easier to compare datasets of different sizes.
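The scaling described above can be checked by hand. The sketch below assumes the usual density definition (count divided by the number of samples times the bin width) and compares it with NumPy's np.histogram, which computes the same quantity when density=True is passed. The variable names are illustrative only.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Raw counts per bin and the bin edges (5 equal-width bins over [1, 5])
counts, edges = np.histogram(data, bins=5)
widths = np.diff(edges)

# Density = count / (total number of samples * bin width)
manual_density = counts / (counts.sum() * widths)

# NumPy's own density normalization, for comparison
numpy_density, _ = np.histogram(data, bins=5, density=True)

print(manual_density)
print(np.allclose(manual_density, numpy_density))
print(np.sum(manual_density * widths))
```

Each bar's area (height times width) is the fraction of samples that fall in that bin, which is why the areas add up to 1.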
Basic Normalization Example
Here's how to create a normalized histogram using matplotlib:
import matplotlib.pyplot as plt
import numpy as np
# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Create normalized histogram
plt.figure(figsize=(8, 5))
n, bins, patches = plt.hist(data, bins=5, density=True, alpha=0.7, color='skyblue')
plt.title('Normalized Histogram')
plt.xlabel('Values')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)
plt.show()
# Print the area under the histogram (sum of bar height * bin width)
total_area = np.sum(n * np.diff(bins))
print(f"Total area under histogram: {total_area:.2f}")
Total area under histogram: 1.00
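One point that often causes confusion: the bar heights of a normalized histogram are densities, not probabilities, so they can be larger than 1 when the data span a narrow range. The following sketch uses made-up uniform data on [0, 0.1) to illustrate this; the data and variable names are assumptions for the example.

```python
import numpy as np

# Hypothetical data confined to a narrow range [0, 0.1)
rng = np.random.default_rng(0)
narrow = rng.uniform(0.0, 0.1, size=500)

density, edges = np.histogram(narrow, bins=5, density=True)
area = np.sum(density * np.diff(edges))

print(density.max() > 1)        # True: bar heights exceed 1
print(abs(area - 1.0) < 1e-9)   # True: the total area is still 1
```

Because the total width of the bins is at most 0.1, the heights have to average at least 10 for the area to reach 1.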
Comparing Normalized vs Non-Normalized
Let's compare normalized and non-normalized histograms side by side:
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.normal(50, 15, 1000)
# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# Non-normalized histogram
ax1.hist(data, bins=20, alpha=0.7, color='lightcoral')
ax1.set_title('Non-Normalized Histogram')
ax1.set_xlabel('Values')
ax1.set_ylabel('Frequency')
# Normalized histogram
ax2.hist(data, bins=20, density=True, alpha=0.7, color='lightgreen')
ax2.set_title('Normalized Histogram')
ax2.set_xlabel('Values')
ax2.set_ylabel('Density')
plt.tight_layout()
plt.show()
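Since the normalized histogram is on a density scale, it can be compared with a probability density function directly. As a rough sketch, the code below evaluates the normal PDF with the same mean and standard deviation used to generate the data (50 and 15) at the bin centers and measures how far the bar heights are from it. The PDF formula is written out with NumPy; mu, sigma, centers, and max_gap are names chosen for this example.

```python
import numpy as np

np.random.seed(42)
mu, sigma = 50, 15
data = np.random.normal(mu, sigma, 1000)

# Density-normalized bar heights and the centers of the 20 bins
density, edges = np.histogram(data, bins=20, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Normal PDF evaluated at the bin centers
pdf = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

max_gap = np.max(np.abs(density - pdf))
print(f"Peak of PDF: {pdf.max():.4f}")
print(f"Largest gap between histogram and PDF: {max_gap:.4f}")
```

The same centers and pdf arrays could be drawn over the normalized panel with ax2.plot(centers, pdf). Doing this on the non-normalized panel would not work, because raw counts are on a different vertical scale.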
Multiple Datasets Comparison
Normalization is particularly useful when comparing datasets of different sizes:
import matplotlib.pyplot as plt
import numpy as np
# Two datasets of different sizes
np.random.seed(42)
small_dataset = np.random.normal(50, 10, 100)
large_dataset = np.random.normal(52, 12, 1000)
plt.figure(figsize=(10, 6))
# Plot both normalized histograms
plt.hist(small_dataset, bins=15, density=True, alpha=0.6,
         label='Small Dataset (n=100)', color='blue')
plt.hist(large_dataset, bins=15, density=True, alpha=0.6,
         label='Large Dataset (n=1000)', color='red')
plt.title('Comparison of Normalized Histograms')
plt.xlabel('Values')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
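In the example above, bins=15 is applied to each dataset separately, so the two histograms get different bin edges. One possible refinement, sketched here as a suggestion rather than part of the original example, is to compute a single set of edges from the combined data with np.histogram_bin_edges and reuse it for both datasets. The names combined and shared_edges are illustrative.

```python
import numpy as np

np.random.seed(42)
small_dataset = np.random.normal(50, 10, 100)
large_dataset = np.random.normal(52, 12, 1000)

# One set of 15 bins (16 edges) spanning both datasets
combined = np.concatenate([small_dataset, large_dataset])
shared_edges = np.histogram_bin_edges(combined, bins=15)

# Normalize each dataset on the shared bins
small_density, _ = np.histogram(small_dataset, bins=shared_edges, density=True)
large_density, _ = np.histogram(large_dataset, bins=shared_edges, density=True)

widths = np.diff(shared_edges)
print(np.sum(small_density * widths), np.sum(large_density * widths))
```

plt.hist() also accepts a sequence of edges for bins, so passing bins=shared_edges to both calls lines the bars up bin for bin.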
Key Parameters
| Parameter | Description | Effect |
|---|---|---|
| density=True | Normalizes the histogram | Area under curve = 1 |
| bins | Number of bins | Controls histogram resolution |
| alpha | Transparency (0-1) | Useful for overlapping plots |
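density=True makes the area equal 1, not the sum of the bar heights. If the goal is instead for the bar heights themselves to sum to 1 (each bar showing the fraction of samples in its bin), a common alternative is the weights parameter. The sketch below assumes that goal and uses np.histogram to show the numbers; it is a different normalization from the one used in the rest of this article.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Each sample contributes 1/N, so bar heights are fractions of the data
weights = np.ones(len(data)) / len(data)
fractions, edges = np.histogram(data, bins=5, weights=weights)

print(fractions)
print(fractions.sum())
```

plt.hist() takes the same weights argument, so plt.hist(data, bins=5, weights=weights) would draw these fractions as bars.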
Conclusion
Use density=True in plt.hist() to normalize a histogram. The total area of the bars then equals 1, so the histogram can be read as a probability density and datasets of different sizes can be compared on the same scale.
