How to make two histograms have the same bin width in Matplotlib?
When comparing data distributions using histograms in Matplotlib, both histograms should share the same bin edges, which also gives them the same bin width. Otherwise the bar heights are counted over different intervals and cannot be compared directly.
Why Same Bin Width Matters
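Different bin widths can lead to misleading comparisons between datasets. By default, plt.hist() derives its bin edges from the range of the data it is given, so two datasets plotted separately usually end up with different bin widths. Using one set of bins ensures that both histograms partition the data identically.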
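To see the problem concretely, the sketch below bins two samples independently and compares the resulting bin widths. The sample arrays, their names, and the seed are illustrative assumptions, not data from this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
narrow = rng.uniform(0.0, 0.5, size=100)  # values spread over [0, 0.5)
wide = rng.normal(1.0, 0.5, size=100)     # values spread widely around 1.0

# Binning each dataset on its own yields edges fitted to that dataset only
edges_narrow = np.histogram_bin_edges(narrow, bins=15)
edges_wide = np.histogram_bin_edges(wide, bins=15)

width_narrow = edges_narrow[1] - edges_narrow[0]
width_wide = edges_wide[1] - edges_wide[0]

print(f"narrow bin width: {width_narrow:.4f}")
print(f"wide bin width:   {width_wide:.4f}")
```

Both calls ask for 15 bins, yet the widths differ because each set of edges spans only its own data range.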
Method: Using np.histogram() to Define Common Bins
A straightforward approach is to compute the bin edges from the combined range of both datasets using np.histogram(). It returns both the counts and the edges, and only the edges are needed here:
import numpy as np
import matplotlib.pyplot as plt
# Set figure parameters for better visualization
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Create two different datasets
dataset_a = np.random.random(100) * 0.5
dataset_b = 1 - np.random.normal(size=100) * 0.1
# Define number of bins
num_bins = 15
# Calculate common bin edges from the combined data range
combined_data = np.hstack((dataset_a, dataset_b))
# np.histogram() returns (counts, edges); index [1] selects the edges
bins = np.histogram(combined_data, bins=num_bins)[1]
# Create histograms with same bins
plt.figure(figsize=(10, 6))
plt.hist(dataset_a, bins=bins, alpha=0.7, label='Dataset A', color='blue', edgecolor='black')
plt.hist(dataset_b, bins=bins, alpha=0.7, label='Dataset B', color='red', edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Two Histograms with Same Bin Width')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
This creates two overlapping histograms with identical bin boundaries, making comparison straightforward.
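Since only the edges are used, NumPy 1.15 and later also offer np.histogram_bin_edges(), which returns the edges without counting anything. The following is a minimal sketch, with assumed sample data and a fixed seed, that checks both calls give the same edges and that each dataset is fully counted against them:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is an arbitrary choice for reproducibility
dataset_a = rng.random(100) * 0.5
dataset_b = 1 - rng.normal(size=100) * 0.1
combined = np.concatenate((dataset_a, dataset_b))

# Two ways to obtain the same 15-bin edges over the combined range
edges_from_histogram = np.histogram(combined, bins=15)[1]
edges_direct = np.histogram_bin_edges(combined, bins=15)
same_edges = np.allclose(edges_from_histogram, edges_direct)

# Count each dataset against the shared edges
counts_a, _ = np.histogram(dataset_a, bins=edges_direct)
counts_b, _ = np.histogram(dataset_b, bins=edges_direct)

# Every bin has the same width because the edges are evenly spaced
widths = np.diff(edges_direct)
uniform_width = np.allclose(widths, widths[0])

print(same_edges, uniform_width)
```

Because the edges span the combined range, no value from either dataset falls outside the bins.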
Alternative Method: Manual Bin Range
You can also define the bin edges manually when you know the data boundaries:
import numpy as np
import matplotlib.pyplot as plt
# Generate sample data
data1 = np.random.normal(50, 10, 1000)
data2 = np.random.normal(55, 8, 1000)
# Define manual bin edges (20 edges give 19 equal-width bins)
min_val = min(data1.min(), data2.min())
max_val = max(data1.max(), data2.max())
bins = np.linspace(min_val, max_val, 20)
# Create side-by-side subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.hist(data1, bins=bins, alpha=0.7, color='green', edgecolor='black')
ax1.set_title('Dataset 1')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')
ax2.hist(data2, bins=bins, alpha=0.7, color='orange', edgecolor='black')
ax2.set_title('Dataset 2')
ax2.set_xlabel('Value')
plt.tight_layout()
plt.show()
This approach gives you precise control over the bin boundaries and is useful when comparing datasets in separate subplots.
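If the goal is a specific bin width rather than a specific number of bins, the edges can be built with np.arange() instead of np.linspace(). This is a sketch under stated assumptions: the value of bin_width and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
data1 = rng.normal(50, 10, 1000)
data2 = rng.normal(55, 8, 1000)

bin_width = 2.5  # hypothetical width chosen for illustration

# Snap the limits outward to multiples of bin_width so edges land on round values
low = np.floor(min(data1.min(), data2.min()) / bin_width) * bin_width
high = np.ceil(max(data1.max(), data2.max()) / bin_width) * bin_width

# Extend the stop by half a width so the final edge (high) is included
bins = np.arange(low, high + bin_width / 2, bin_width)

print(f"{len(bins) - 1} bins of width {bin_width} from {low} to {high}")
```

The resulting bins array can be passed to both hist() calls exactly as in the example above.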
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| bins | Array of bin edges | np.linspace(0, 1, 10) |
| alpha | Transparency (0-1) | 0.7 |
| edgecolor | Bin border color | 'black' |
| label | Legend label | 'Dataset A' |
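As a side note on the bins parameter: when hist() receives a list of datasets together with an integer bins, Matplotlib computes a single set of edges over all of them. The sketch below relies on that behavior. The sample data and seed are assumptions for illustration, and the bars are drawn side by side rather than overlapping.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility
dataset_a = rng.random(100) * 0.5
dataset_b = 1 - rng.normal(size=100) * 0.1

fig, ax = plt.subplots()
# With the default histtype='bar', multiple datasets are drawn side by side
counts, edges, _ = ax.hist(
    [dataset_a, dataset_b],
    bins=15,
    label=['Dataset A', 'Dataset B'],
    edgecolor='black',
)
ax.legend()

# edges is one array shared by both datasets: 15 bins means 16 edges
print(len(edges))
# Call plt.show() to display the figure
```

The returned counts holds one array per dataset, each counted against the same edges.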
Conclusion
Compute one set of bin edges from the combined data, using np.histogram() or np.linspace(), and pass it to every hist() call. Both datasets are then partitioned identically, so their histograms can be compared bar for bar.
