Boxplot stratified by column in Python Pandas
A boxplot stratified by column in Pandas allows you to create separate boxplots for different groups within your data. This is useful for comparing distributions across categorical variables.
Basic Boxplot by Column
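The boxplot() method's by parameter groups the data before plotting. Start with a DataFrame that has a numeric column to plot and a categorical column to group by: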
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

print("Sample Data:")
print(df)
```
```
Sample Data:
   values category
0      23        A
1      25        B
2      28        A
3      32        B
4      35        A
5      18        A
6      22        B
7      26        A
8      30        B
9      33        A
```
Creating Stratified Boxplot
The by parameter draws a separate box for each unique value in the specified column, side by side on one set of axes:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Create boxplot stratified by category
ax = df.boxplot(column='values', by='category', figsize=(8, 5))
plt.suptitle('Boxplot Stratified by Category')  # replaces pandas' default "Boxplot grouped by category"
plt.title('')  # remove the automatic per-axes title ('values')
plt.show()
```
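Each box summarizes the quartiles of one group. As a rough cross-check, the same numbers can be computed directly with groupby(). This sketch assumes Matplotlib's default settings, under which the box spans Q1 to Q3, the inner line marks the median, and the whiskers reach the furthest points within 1.5 times the IQR of the box; pandas' default linear-interpolation quantiles should agree with the quartiles Matplotlib draws.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Q1, median and Q3 per group: the box edges and center line of each box
quartiles = df.groupby('category')['values'].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ['Q1', 'median', 'Q3']

# Interquartile range; default whiskers extend at most 1.5 * IQR beyond the box
quartiles['IQR'] = quartiles['Q3'] - quartiles['Q1']
print(quartiles)
```

Here the two medians (27.0 for A, 27.5 for B) are nearly equal, while group A has the wider box.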
Stratifying by Multiple Columns
Pass a list of column names to by to draw one box for each combination of their values:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create more complex data
df = pd.DataFrame({
    'score': [85, 90, 78, 88, 92, 76, 89, 91, 83, 87, 79, 94],
    'subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Math',
                'Science', 'Science', 'Math', 'Science', 'Math', 'Science'],
    'grade': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Boxplot by subject and grade
ax = df.boxplot(column='score', by=['subject', 'grade'], figsize=(10, 6))
plt.suptitle('Student Scores by Subject and Grade')
plt.show()
```
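With several grouping columns, each box covers only the rows in one combination, so some boxes may rest on very few points. A quick check is to count the rows per combination with groupby(). With plain string columns like these, only combinations that actually occur in the data get a box, so the student example has no (Science, B) box.

```python
import pandas as pd

# Same student data as the multi-column example above
df = pd.DataFrame({
    'score': [85, 90, 78, 88, 92, 76, 89, 91, 83, 87, 79, 94],
    'subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Math',
                'Science', 'Science', 'Math', 'Science', 'Math', 'Science'],
    'grade': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Number of scores behind each (subject, grade) box
sizes = df.groupby(['subject', 'grade'])['score'].size()
print(sizes)
```

The (Math, A) box is drawn from only two scores, so its spread says little on its own.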
Customizing Stratified Boxplots
Extra keyword arguments such as patch_artist and boxprops are passed through to Matplotlib's boxplot(), so you can style the boxes and relabel the axes:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample sales data
df = pd.DataFrame({
    'sales': [120, 150, 130, 160, 140, 110, 145, 155, 135, 165],
    'region': ['North', 'South', 'North', 'South', 'North',
               'North', 'South', 'South', 'North', 'South']
})

# Create customized boxplot
fig, ax = plt.subplots(figsize=(8, 6))
df.boxplot(column='sales', by='region', ax=ax,
           patch_artist=True,
           boxprops=dict(facecolor='lightblue', alpha=0.7))

plt.suptitle('Sales Distribution by Region', fontsize=14, fontweight='bold')
plt.xlabel('Region', fontsize=12)
plt.ylabel('Sales ($000)', fontsize=12)
plt.title('')  # Remove automatic subplot title
plt.grid(True, alpha=0.3)
plt.show()
```
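boxprops applies one style to every box. To give each group its own fill color, one option is to ask pandas for the underlying Matplotlib artists with return_type='dict' and color the box patches one by one. This is a sketch that assumes patch_artist=True (so the boxes are fillable patches); the color list is an arbitrary illustrative choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sales data as the previous example
df = pd.DataFrame({
    'sales': [120, 150, 130, 160, 140, 110, 145, 155, 135, 165],
    'region': ['North', 'South', 'North', 'South', 'North',
               'North', 'South', 'South', 'North', 'South']
})

fig, ax = plt.subplots(figsize=(8, 6))

# With by=..., return_type='dict' returns a Series mapping each plotted
# column to Matplotlib's dict of artists ('boxes', 'medians', 'whiskers', ...)
result = df.boxplot(column='sales', by='region', ax=ax,
                    patch_artist=True, return_type='dict')
boxes = result['sales']['boxes']

# Illustrative colors, applied in group order (groups are sorted: North, South)
colors = ['lightblue', 'lightgreen']
for box, color in zip(boxes, colors):
    box.set_facecolor(color)

fig.suptitle('Sales Distribution by Region')
ax.set_title('')  # clear the per-axes 'sales' title that pandas adds
plt.show()
```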
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| column | Column (or list of columns) to plot | 'values' |
| by | Column(s) to group by | 'category' or ['col1', 'col2'] |
| figsize | Figure size in inches, as (width, height) | (8, 6) |
| patch_artist | Draw boxes as filled patches so their colors can be styled (passed to Matplotlib) | True |
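The column parameter also accepts a list of names, in which case pandas draws one subplot per column, all stratified by the same by column. The DataFrame and its column names below are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: two numeric columns and one grouping column
df = pd.DataFrame({
    'math': [85, 78, 92, 76, 83, 79],
    'science': [90, 88, 89, 91, 87, 94],
    'group': ['X', 'Y', 'X', 'Y', 'X', 'Y']
})

# A list passed to `column` draws one panel per column,
# each stratified by the same `by` column
axes = df.boxplot(column=['math', 'science'], by='group', figsize=(10, 5))

# With the default return_type, the result is a NumPy array of Axes
for ax in axes.flatten():
    ax.set_xlabel('Group')

plt.show()
```

Each panel is titled with its column name, and the panels share a y-axis by default, which makes the two distributions directly comparable.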
Conclusion
Stratified boxplots in Pandas use the by parameter of DataFrame.boxplot() to draw one box per group, which makes it easy to compare medians, spread, and outliers across the categories in your dataset.
