Boxplot stratified by column in Python Pandas

A boxplot stratified by column draws a separate box for each group in your data, where the groups come from the values of one or more columns. This makes it easy to compare distributions across a categorical variable.

Basic Boxplot by Column

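Pandas draws grouped boxplots with the DataFrame.boxplot() method and its by parameter. Start with a small sample DataFrame that has one numeric column and one categorical column: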

import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

print("Sample Data:")
print(df)

Output:

Sample Data:
   values category
0      23        A
1      25        B
2      28        A
3      32        B
4      35        A
5      18        A
6      22        B
7      26        A
8      30        B
9      33        A
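
Before plotting, it can help to look at the numbers each box will summarize. The following sketch is an addition to the original example: it applies groupby() and describe() to the same sample data to get the count, quartiles, and extremes per category. The variable names summary and medians are illustrative only.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Per-category summary: count, mean, std, min, quartiles, max
summary = df.groupby('category')['values'].describe()
print(summary)

# The median of each group is the line drawn inside its box
medians = df.groupby('category')['values'].median()
print(medians)
```

Category A has six observations and category B has four, so the two boxes are built from samples of different sizes.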

Creating Stratified Boxplot

The by parameter draws one box for each unique value in the specified column, side by side on the same axes:

import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Create boxplot stratified by category
ax = df.boxplot(column='values', by='category', figsize=(8, 5))
plt.suptitle('Boxplot Stratified by Category')  # Replaces the default "Boxplot grouped by category"
plt.title('')  # Clear the axes title, which pandas sets to the column name
plt.show()
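
To make the grouping step concrete, here is a rough hand-written equivalent that splits the column with groupby() and passes the pieces to matplotlib's Axes.boxplot(). It is a sketch for illustration, not how pandas implements by internally, and the name groups is an assumption of this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'values': [23, 25, 28, 32, 35, 18, 22, 26, 30, 33],
    'category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Split the numeric column into one array per category (keys come back sorted)
groups = {key: grp['values'].to_numpy() for key, grp in df.groupby('category')}

fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot(list(groups.values()))                 # one box per group, at positions 1..n
ax.set_xticks(list(range(1, len(groups) + 1)))
ax.set_xticklabels(list(groups.keys()))           # label each box with its category
ax.set_xlabel('category')
ax.set_title('Boxplot Stratified by Category')
# plt.show()  # uncomment to display the figure
```

In everyday use the single df.boxplot(..., by=...) call is shorter. The manual version is mainly useful when you need full control over the matplotlib call.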

Multiple Columns Stratification

Pass a list of column names to by to draw one box for each combination of values that occurs in the data:

import pandas as pd
import matplotlib.pyplot as plt

# Create more complex data
df = pd.DataFrame({
    'score': [85, 90, 78, 88, 92, 76, 89, 91, 83, 87, 79, 94],
    'subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Math', 
               'Science', 'Science', 'Math', 'Science', 'Math', 'Science'],
    'grade': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Boxplot by subject and grade
ax = df.boxplot(column='score', by=['subject', 'grade'], figsize=(10, 6))
plt.suptitle('Student Scores by Subject and Grade')
plt.show()
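
The number of boxes depends on which combinations are present, not on every possible pairing. As a quick check (an addition to the original example, with counts as an illustrative name), you can count the rows in each group before plotting:

```python
import pandas as pd

df = pd.DataFrame({
    'score': [85, 90, 78, 88, 92, 76, 89, 91, 83, 87, 79, 94],
    'subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Math',
                'Science', 'Science', 'Math', 'Science', 'Math', 'Science'],
    'grade': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'B', 'A']
})

# Count rows for each (subject, grade) pair that occurs in the data
counts = df.groupby(['subject', 'grade']).size()
print(counts)
```

In this sample there are no Science rows with grade B, so the plot contains three boxes: Math/A (2 rows), Math/B (4 rows), and Science/A (6 rows).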

Customizing Stratified Boxplots

Pass your own matplotlib Axes and styling options, then add labels, to make the plot easier to read:

import pandas as pd
import matplotlib.pyplot as plt

# Sample sales data
df = pd.DataFrame({
    'sales': [120, 150, 130, 160, 140, 110, 145, 155, 135, 165],
    'region': ['North', 'South', 'North', 'South', 'North', 
               'North', 'South', 'South', 'North', 'South']
})

# Create customized boxplot
fig, ax = plt.subplots(figsize=(8, 6))
df.boxplot(column='sales', by='region', ax=ax, 
           patch_artist=True, 
           boxprops=dict(facecolor='lightblue', alpha=0.7))

plt.suptitle('Sales Distribution by Region', fontsize=14, fontweight='bold')
plt.xlabel('Region', fontsize=12)
plt.ylabel('Sales ($000)', fontsize=12)
plt.title('')  # Remove automatic subplot title
plt.grid(True, alpha=0.3)
plt.show()
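
The boxprops argument applies one style to every box. If you want a different fill per group, one possible approach is to ask boxplot() for the drawn artists with return_type='dict' and color each patch yourself. This is a sketch under that assumption; the color list and the names artists, boxes, and region_colors are hypothetical choices for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'sales': [120, 150, 130, 160, 140, 110, 145, 155, 135, 165],
    'region': ['North', 'South', 'North', 'South', 'North',
               'North', 'South', 'South', 'North', 'South']
})

fig, ax = plt.subplots(figsize=(8, 6))

# With by=..., return_type='dict' gives a Series keyed by column name;
# each entry is the dict of artists that matplotlib's boxplot() returns
artists = df.boxplot(column='sales', by='region', ax=ax,
                     patch_artist=True, return_type='dict')
boxes = artists['sales']['boxes']  # one patch per region, in sorted key order

# Hypothetical colors, one per region (North, South)
region_colors = ['lightblue', 'lightgreen']
for patch, color in zip(boxes, region_colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

fig.suptitle('Sales Distribution by Region')
ax.set_title('')  # clear the axes title that pandas sets to the column name
# plt.show()  # uncomment to display the figure
```

Note that patch_artist=True is required here, because only filled patches have a face color to set.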

Key Parameters

Parameter      Description                                          Example
column         Column (or list of columns) to plot                  'values'
by             Column(s) to group by                                'category' or ['col1', 'col2']
figsize        Figure size as (width, height) in inches             (8, 6)
patch_artist   Draw boxes as filled patches so they can be colored  True

Conclusion

The by parameter of DataFrame.boxplot() draws one box per group, which makes it straightforward to compare distributions across one or more categorical columns. Combine it with matplotlib options such as patch_artist and boxprops to style the result.

Updated on: 2026-03-26T19:04:27+05:30
