Adding a scatter of points to a boxplot using Matplotlib

To add a scatter of points to a boxplot using Matplotlib, we can combine the boxplot() method with scatter() to overlay individual data points. This technique helps visualize both the distribution summary and actual data points.

Steps

  • Set the figure size and adjust the padding between and around the subplots.

  • Create a pandas DataFrame with one column of sample data for each box.

  • Generate boxplots from the DataFrame using boxplot() method.

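  • Enumerate the DataFrame columns to get the coordinates for the scatter points: the column values are the y coordinates, and the box for column i (counting from 0) is drawn at x = i + 1.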

  • Add scatter points with slight horizontal jitter for better visibility.

  • Display the figure using show() method.

Example

Here's how to create a boxplot with overlaid scatter points:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Set figure parameters
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create sample data
np.random.seed(42)  # For reproducible results
data = pd.DataFrame({
    "Box1": np.random.rand(10), 
    "Box2": np.random.rand(10)
})

# Create boxplot
data.boxplot()

# Add scatter points with jitter
for i, column in enumerate(data):
    y = data[column]
    # Add slight horizontal jitter to avoid overlapping points
    x = np.random.normal(i + 1, 0.04, len(y))
    plt.scatter(x, y, alpha=0.7, s=30)

plt.title("Boxplot with Scatter Points")
plt.show()

The output shows boxplots with individual data points scattered around each box:

A plot displaying two boxplots (Box1 and Box2) with scattered data points overlaid on top, showing both the statistical summary and individual observations.
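One detail worth noting: by default the boxplot draws its own markers for outliers, so once every observation is overlaid, any outlier appears twice. The following is a minimal sketch of one way to avoid that, assuming it is acceptable to hide the boxplot's markers with showfliers=False. The function name boxplot_with_points and its jitter and seed parameters are illustrative choices, not part of pandas or Matplotlib.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def boxplot_with_points(data, jitter=0.04, seed=0):
    """Draw one box per column and overlay every observation as a jittered point."""
    rng = np.random.default_rng(seed)  # local generator, global random state is untouched
    fig, ax = plt.subplots(figsize=(7.5, 3.5))
    # showfliers=False hides the boxplot's own outlier markers so that
    # outliers are not drawn twice once the scatter points are added.
    data.boxplot(ax=ax, showfliers=False)
    for i, column in enumerate(data.columns):
        y = data[column].dropna()
        # Boxes are centred at x = 1, 2, 3, ... by default
        x = rng.normal(i + 1, jitter, len(y))
        ax.scatter(x, y, alpha=0.7, s=30, zorder=3)
    return fig, ax


# Hypothetical sample data, similar in shape to the example above
sample = pd.DataFrame({
    "Box1": np.random.default_rng(42).random(10),
    "Box2": np.random.default_rng(43).random(10),
})
fig, ax = boxplot_with_points(sample)
```

Call plt.show() afterwards to display the figure. Working on an explicit Axes object (ax) rather than the global pyplot state also makes it easier to place the plot inside a larger figure.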

Customizing the Scatter Points

You can customize the appearance of the scatter points, for example by giving each category its own color:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Create sample data
np.random.seed(42)
data = pd.DataFrame({
    "Category A": np.random.normal(5, 1.5, 15),
    "Category B": np.random.normal(7, 2, 15),
    "Category C": np.random.normal(6, 1, 15)
})

# Create boxplot (patch_artist=True draws the boxes as filled patches)
data.boxplot(patch_artist=True)

# Colors for each category
colors = ['red', 'blue', 'green']

# Add colored scatter points
for i, column in enumerate(data):
    y = data[column]
    x = np.random.normal(i + 1, 0.08, len(y))
    plt.scatter(x, y, alpha=0.6, s=40, c=colors[i], label=f'Data {column}')

plt.title("Enhanced Boxplot with Colored Scatter Points")
plt.legend()
plt.show()

Each category's points are drawn in their own color, and the legend maps each color to its column.
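If you also want the boxes themselves to match the point colors, one option is to call Matplotlib's Axes.boxplot() directly, which returns a dictionary of the drawn artists. The sketch below assumes the data is held in a plain dictionary of arrays rather than a DataFrame; the names groups and positions are illustrative.

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical sample data: one array of observations per category
rng = np.random.default_rng(42)
groups = {
    "Category A": rng.normal(5, 1.5, 15),
    "Category B": rng.normal(7, 2, 15),
    "Category C": rng.normal(6, 1, 15),
}
colors = ["red", "blue", "green"]

fig, ax = plt.subplots(figsize=(7.5, 3.5))
# patch_artist=True makes each box a fillable patch;
# showfliers=False avoids drawing outliers twice.
bp = ax.boxplot(list(groups.values()), patch_artist=True, showfliers=False)

# Boxes sit at x = 1, 2, 3, ... so label those positions with the category names
positions = list(range(1, len(groups) + 1))
ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))

for box, position, values, color in zip(bp["boxes"], positions, groups.values(), colors):
    box.set_facecolor(color)
    box.set_alpha(0.2)  # keep the fill faint so the points stay readable
    x = rng.normal(position, 0.08, len(values))  # horizontal jitter
    ax.scatter(x, values, color=color, alpha=0.6, s=40, zorder=3)

ax.set_title("Boxplot with Matching Box and Point Colors")
```

Call plt.show() to display the result. The faint fill is a design choice: a fully opaque box would compete visually with the points drawn on top of it.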

Key Benefits

  • Data Transparency: Shows actual data points behind the statistical summary

  • Outlier Identification: Makes individual outliers more visible

  • Sample Size Awareness: Reveals the number of observations in each category

  • Distribution Details: Shows clustering patterns within each box

Conclusion

Adding scatter points to a boxplot shows the statistical summary and the individual observations in a single plot. Draw each column's x coordinates with np.random.normal(), centered on the box position with a small standard deviation, to add horizontal jitter so that points with similar values do not overlap.
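Normally distributed jitter has no hard limit, so an occasional point can land noticeably far from the center of its box. As an alternative, here is a small sketch that uses uniform jitter, which keeps every point within a fixed distance of the box position. The helper uniform_jitter and its half_width parameter are hypothetical names used for illustration.

```python
import numpy as np


def uniform_jitter(position, n_points, half_width=0.1, seed=None):
    """Return n_points x coordinates drawn uniformly within half_width of position."""
    rng = np.random.default_rng(seed)
    return position + rng.uniform(-half_width, half_width, n_points)


# x coordinates for 15 points belonging to the first box (drawn at x = 1)
x = uniform_jitter(position=1, n_points=15, half_width=0.1, seed=0)
```

The returned array can be passed to plt.scatter(x, y) in place of the np.random.normal() call used in the examples above.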

---
Updated on: 2026-03-25T21:09:40+05:30
