Plotting histograms against classes in Pandas / Matplotlib
To plot histograms against classes in Pandas/Matplotlib, use the DataFrame.hist() method. It draws one histogram per numeric column (class) of a DataFrame, each in its own subplot, which makes it easy to compare data distributions side by side.
Basic Histogram Plotting
Here's how to create histograms for multiple columns in a DataFrame:
import matplotlib.pyplot as plt
import pandas as pd
# Set figure size for better visualization
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Create a sample DataFrame with different classes
df = pd.DataFrame({
    'Class_A': [1, 2, 2, 3, 4, 2, 3, 1, 4, 2],
    'Class_B': [2, 3, 1, 4, 2, 3, 1, 4, 2, 3],
    'Class_C': [1, 1, 3, 3, 4, 4, 2, 2, 3, 1],
    'Class_D': [3, 2, 4, 1, 3, 2, 4, 1, 2, 3]
})
# Plot histograms for all columns
df.hist(bins=4, alpha=0.7)
plt.suptitle('Histograms for Different Classes')
plt.show()
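To see what bins=4 actually computes, the counts behind the Class_A panel can be reproduced numerically. This is a small sketch that assumes NumPy is installed; np.histogram applies the same equal-width binning that Matplotlib's hist() uses when bins is an integer.

```python
import numpy as np

# Same values as the Class_A column above
class_a = [1, 2, 2, 3, 4, 2, 3, 1, 4, 2]

# Four equal-width bins between the minimum (1) and the maximum (4)
counts, edges = np.histogram(class_a, bins=4)

print(edges)   # bin edges: 1.0, 1.75, 2.5, 3.25, 4.0
print(counts)  # values per bin: 2, 4, 2, 2
```

Each bar height in the histogram is one of these counts, and the last bin includes its right edge, so the two 4s land in the final bar.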
Customizing Histogram Appearance
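You can customize the histogram appearance with parameters such as bins, figsize, and color. Matplotlib's hist() accepts only one color per dataset, so passing a list of colors to df.hist() raises a ValueError. To give each column its own color, plot the columns one at a time:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create sample data with more variation
np.random.seed(42)
df = pd.DataFrame({
    'Score_Math': np.random.normal(75, 15, 100),
    'Score_Science': np.random.normal(80, 12, 100),
    'Score_English': np.random.normal(70, 18, 100)
})
# Plot each column on its own axes with its own color
colors = ['skyblue', 'lightgreen', 'lightcoral']
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, column, color in zip(axes, df.columns, colors):
    df[column].hist(bins=15, ax=ax, color=color)
    ax.set_title(column)
fig.suptitle('Student Scores Distribution by Subject', fontsize=16)
plt.tight_layout()
plt.show()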
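When the goal is to compare the subjects directly, it can help to overlay the distributions on a single set of axes instead of separate subplots. The following is a minimal sketch using DataFrame.plot.hist(); the variable name scores and the axis labels are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
scores = pd.DataFrame({
    'Score_Math': np.random.normal(75, 15, 100),
    'Score_Science': np.random.normal(80, 12, 100),
    'Score_English': np.random.normal(70, 18, 100)
})

# Draw every column on one Axes; alpha keeps overlapping bars visible
ax = scores.plot.hist(bins=15, alpha=0.5, figsize=(10, 6))
ax.set_xlabel('Score')
ax.set_title('Student Scores, Overlaid')
# Call plt.show() to display the figure in an interactive session
```

Here alpha matters more than in the subplot version, because the bars of the three subjects overlap.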
Plotting Histograms by Categorical Classes
When you have categorical data, you can group by classes and plot histograms:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create DataFrame with categorical classes
np.random.seed(123)
data = {
    'values': np.concatenate([
        np.random.normal(50, 10, 50),  # Group A
        np.random.normal(70, 15, 50),  # Group B
        np.random.normal(60, 12, 50)   # Group C
    ]),
    'category': ['A']*50 + ['B']*50 + ['C']*50
}
df = pd.DataFrame(data)
# Plot histogram for each category
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, category in enumerate(['A', 'B', 'C']):
    subset = df[df['category'] == category]
    axes[i].hist(subset['values'], bins=10, alpha=0.7,
                 color=['red', 'green', 'blue'][i])
    axes[i].set_title(f'Category {category}')
    axes[i].set_xlabel('Values')
    axes[i].set_ylabel('Frequency')
plt.tight_layout()
plt.show()
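The manual loop above can often be shortened with the by argument of DataFrame.hist(), which splits the data on a grouping column and draws one subplot per group. This is a sketch under the same sample data; the layout and sharing options are optional choices made here for readability.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(50, 10, 50),  # Group A
        np.random.normal(70, 15, 50),  # Group B
        np.random.normal(60, 12, 50)   # Group C
    ]),
    'category': ['A'] * 50 + ['B'] * 50 + ['C'] * 50
})

# One subplot per category; shared axes make the panels directly comparable
axes = df.hist(column='values', by='category', bins=10,
               layout=(1, 3), figsize=(15, 5), sharex=True, sharey=True)
axes = np.ravel(axes)  # flatten in case a 2-D array of Axes is returned
# Call plt.show() to display the figure in an interactive session
```

The explicit loop remains the better choice when each panel needs its own color, title, or axis labels.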
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| bins | Number of histogram bins | bins=10 |
| alpha | Transparency level (0-1) | alpha=0.7 |
| figsize | Figure dimensions (width, height) in inches | figsize=(10, 6) |
| color | Bar color (one color per hist() call) | color='skyblue' |
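Besides an integer, bins also accepts an explicit sequence of bin edges. That is useful when comparing classes, because an integer bins value gives each class its own edges based on its own range. The sketch below assumes the three-category sample data from the previous section and computes one shared set of edges with np.histogram_bin_edges; the names groups, edges, and counts are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
groups = {
    'A': np.random.normal(50, 10, 50),
    'B': np.random.normal(70, 15, 50),
    'C': np.random.normal(60, 12, 50),
}

# One set of 10 equal-width bins spanning all groups combined
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=10)

fig, ax = plt.subplots(figsize=(10, 6))
counts = {}
for name, values in groups.items():
    # Passing the edge array as bins puts every group on identical bins
    counts[name], _, _ = ax.hist(values, bins=edges, alpha=0.5,
                                 label=f'Category {name}')
ax.legend()
# Call plt.show() to display the figure in an interactive session
```

With identical edges, bar heights at the same x position count values from the same interval, so the categories can be compared bar by bar.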
Conclusion
Use df.hist() to quickly create one histogram per numeric DataFrame column. For categorical data, filter or group by class first and plot a separate histogram for each group. Adjust parameters like bins, alpha, and color to make the distributions easier to compare.
