Adding a scatter of points to a boxplot using Matplotlib
To add a scatter of points to a boxplot using Matplotlib, draw the boxplot with the boxplot() method and then overlay the individual data points with scatter(). The resulting plot shows both the distribution summary and the actual observations.
Steps
Set the figure size and adjust the padding between and around the subplots.
Create a DataFrame with sample data columns using the DataFrame class.
Generate boxplots from the DataFrame using the boxplot() method.
Enumerate the DataFrame columns to get the x and y coordinates for the scatter points.
Add scatter points with slight horizontal jitter for better visibility.
Display the figure using the show() method.
Example
Here's how to create a boxplot with overlaid scatter points:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Set figure parameters
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
# Create sample data
np.random.seed(42) # For reproducible results
data = pd.DataFrame({
"Box1": np.random.rand(10),
"Box2": np.random.rand(10)
})
# Create boxplot
data.boxplot()
# Add scatter points with jitter
for i, column in enumerate(data):
    y = data[column]
    # Add slight horizontal jitter to avoid overlapping points
    x = np.random.normal(i + 1, 0.04, len(y))
    plt.scatter(x, y, alpha=0.7, s=30)
plt.title("Boxplot with Scatter Points")
plt.show()
The output shows boxplots with individual data points scattered around each box:
A plot displaying two boxplots (Box1 and Box2) with scattered data points overlaid on top, showing both the statistical summary and individual observations.
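The jitter step can be isolated into a small helper to make it easier to see what it does. The sketch below is only an illustration: the function name `jitter_x` and its `spread` and `seed` parameters are hypothetical, not part of Matplotlib or pandas. It draws x-coordinates from a normal distribution centered on the box position, as the example above does, but uses a seeded NumPy generator so the same point layout can be reproduced.

```python
import numpy as np


def jitter_x(position, n_points, spread=0.04, seed=None):
    """Return n_points x-coordinates scattered around a box position.

    position: x-location of the box (boxes are drawn at 1, 2, 3, ...).
    spread:   standard deviation of the horizontal jitter (hypothetical default).
    seed:     optional seed so the jitter is reproducible.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=position, scale=spread, size=n_points)


# Jittered x-coordinates for ten points on the first and second boxes
x_first_box = jitter_x(position=1, n_points=10, seed=0)
x_second_box = jitter_x(position=2, n_points=10, seed=0)
```

A smaller `spread` keeps the points close to the center line of the box, while a larger one spreads them out and may push points past the box edges.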
Customizing the Scatter Points
You can customize the appearance of scatter points using different colors and markers:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Create sample data
np.random.seed(42)
data = pd.DataFrame({
"Category A": np.random.normal(5, 1.5, 15),
"Category B": np.random.normal(7, 2, 15),
"Category C": np.random.normal(6, 1, 15)
})
# Create boxplot
bp = data.boxplot(patch_artist=True)
# Colors for each category
colors = ['red', 'blue', 'green']
# Add colored scatter points
for i, column in enumerate(data):
    y = data[column]
    x = np.random.normal(i + 1, 0.08, len(y))
    plt.scatter(x, y, alpha=0.6, s=40, c=colors[i], label=f'Data {column}')
plt.title("Enhanced Boxplot with Colored Scatter Points")
plt.legend()
plt.show()
This creates a more visually appealing plot with colored scatter points for each category.
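One detail worth considering when overlaying every observation: by default, Matplotlib's boxplot also draws its own markers for outliers, so those points can appear twice. The sketch below shows one possible way to avoid that by passing `showfliers=False`. It uses Matplotlib's `Axes.boxplot()` directly with a plain dictionary of arrays instead of a DataFrame; the variable names and the `Agg` backend line are assumptions made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import numpy as np
from matplotlib import pyplot as plt

# Sample data (same distributions as the example above, different generator)
rng = np.random.default_rng(42)
samples = {
    "Category A": rng.normal(5, 1.5, 15),
    "Category B": rng.normal(7, 2, 15),
    "Category C": rng.normal(6, 1, 15),
}

fig, ax = plt.subplots(figsize=(7.5, 3.5))

# showfliers=False hides the boxplot's own outlier markers,
# since the scatter overlay already shows every observation
box_artists = ax.boxplot(list(samples.values()), showfliers=False)
ax.set_xticklabels(list(samples.keys()))

# Overlay the observations with horizontal jitter around positions 1, 2, 3
for i, values in enumerate(samples.values()):
    x = rng.normal(i + 1, 0.08, len(values))
    ax.scatter(x, values, alpha=0.6, s=40)

ax.set_title("Boxplot without flier markers, with scatter overlay")
# plt.show()  # uncomment when running interactively without the Agg line
```

Whether to hide the flier markers is a design choice: keeping them makes outliers stand out, while hiding them avoids drawing the same observation twice.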
Key Benefits
Data Transparency: Shows actual data points behind the statistical summary
Outlier Identification: Makes individual outliers more visible
Sample Size Awareness: Reveals the number of observations in each category
Distribution Details: Shows clustering patterns within each box
Conclusion
Adding scatter points to boxplots gives a fuller view of your data by combining statistical summaries with individual observations. Use np.random.normal() to add horizontal jitter so that points with similar values do not overlap.
