How to Make Stripplot with Jitter in Altair Python?
A stripplot with jitter shows the distribution of a continuous variable, usually split by category, by drawing every observation as a point. In Altair, mark_circle() draws the points and transform_calculate() adds a random horizontal offset (the jitter), so points with equal or similar values no longer sit on top of each other.
Syntax
The basic syntax for creating a stripplot with jitter in Altair involves creating a chart with circular markers and adding calculated jitter:
import altair as alt
# Basic stripplot with jitter syntax
alt.Chart(data).mark_circle(size=50).encode(
    x=alt.X('jitter:Q', title=None,
            axis=alt.Axis(ticks=False, grid=False, labels=False)),
    y=alt.Y('continuous_variable:Q'),
    color=alt.Color('category:N')
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
)
Creating a Stripplot with Custom Data
Let's create a stripplot using randomly generated data to demonstrate the jitter effect:
import altair as alt
import pandas as pd
import numpy as np
# Create sample data
np.random.seed(42)
data = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(10, 2, 50),    # Group A
        np.random.normal(15, 3, 50),    # Group B
        np.random.normal(12, 1.5, 50)   # Group C
    ]),
    'category': ['A'] * 50 + ['B'] * 50 + ['C'] * 50
})
# Create stripplot with jitter
chart = alt.Chart(data).mark_circle(size=50, opacity=0.7).encode(
    x=alt.X('jitter:Q',
            title=None,
            axis=alt.Axis(ticks=False, grid=False, labels=False),
            scale=alt.Scale(range=[0, 100])),
    y=alt.Y('values:Q', title='Values'),
    color=alt.Color('category:N',
                    scale=alt.Scale(range=['#1f77b4', '#ff7f0e', '#2ca02c']))
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).properties(
    width=200,
    height=300,
    title='Stripplot with Jitter'
)
# In a notebook, put `chart` on the last line of a cell to display it;
# in a script, chart.save('stripplot.html') writes a viewable file.
print("Stripplot created successfully!")
print(f"Data shape: {data.shape}")
print(f"Categories: {data['category'].unique()}")
Stripplot created successfully!
Data shape: (150, 2)
Categories: ['A' 'B' 'C']
Using a Real Dataset: Iris Example
Here's how to create a stripplot with the well-known Iris dataset:
import altair as alt
from vega_datasets import data
# Load the Iris dataset
iris = data.iris()
# Create stripplot with jitter for petal width by species
stripplot = alt.Chart(iris).mark_circle(size=60, opacity=0.8).encode(
    x=alt.X('jitter:Q',
            title=None,
            axis=alt.Axis(ticks=False, grid=False, labels=False),
            scale=alt.Scale(range=[0, 80])),
    y=alt.Y('petalWidth:Q',
            title='Petal Width (cm)',
            scale=alt.Scale(zero=False)),
    color=alt.Color('species:N',
                    title='Species',
                    scale=alt.Scale(range=['#e41a1c', '#377eb8', '#4daf4a']))
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).properties(
    width=250,
    height=350,
    title='Iris Petal Width Distribution by Species'
)
print("Iris stripplot created!")
print(f"Dataset contains {len(iris)} samples")
print(f"Species: {iris['species'].unique()}")
Iris stripplot created!
Dataset contains 150 samples
Species: ['setosa' 'versicolor' 'virginica']
Understanding the Jitter Formula
The jitter calculation uses the Box-Muller transformation to generate normally distributed random values:
import numpy as np
# Demonstrate the jitter formula
np.random.seed(42)
n_points = 1000
# Box-Muller transformation: two uniform samples give one standard normal sample
u1 = np.random.random(n_points)
u2 = np.random.random(n_points)
jitter_values = np.sqrt(-2 * np.log(u1)) * np.cos(2 * np.pi * u2)
print("Jitter statistics:")
print(f"Mean: {np.mean(jitter_values):.3f}")
print(f"Standard deviation: {np.std(jitter_values):.3f}")
print(f"Min value: {np.min(jitter_values):.3f}")
print(f"Max value: {np.max(jitter_values):.3f}")
Jitter statistics:
Mean: 0.016
Standard deviation: 1.006
Min value: -3.040
Max value: 3.722
Comparison: With vs Without Jitter
| Aspect | Without Jitter | With Jitter |
|---|---|---|
| Point Overlap | High overlap, hard to see individual points | Points spread out, so far fewer hide behind one another |
| Distribution Clarity | Difficult to assess density | Density is easier to judge |
| Best Use Case | Small datasets with no overlap | Datasets with repeated or closely spaced values |
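The overlap row of the table can be checked with a small count. The sketch below uses only the standard library and made-up measurements rounded to one decimal place, and it counts exact coincidences only: without jitter every point has x = 0, so equal values land on the same spot, while the Box-Muller offset gives each point its own x position. The function name `normal_jitter` is chosen here for illustration.

```python
import math
import random
from collections import Counter

rng = random.Random(0)

# Hypothetical measurements rounded to one decimal, so many values repeat
values = [round(rng.gauss(1.2, 0.4), 1) for _ in range(150)]

def normal_jitter(rng):
    """One standard normal sample via the Box-Muller formula."""
    u1 = 1.0 - rng.random()   # shift to (0, 1] so log(u1) is always finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

# Without jitter: every point is drawn at x = 0
plain = Counter((0.0, v) for v in values)
# With jitter: each point gets its own random x offset
jittered = Counter((normal_jitter(rng), v) for v in values)

distinct_plain = len(plain)
distinct_jittered = len(jittered)
print(f"Points: {len(values)}")
print(f"Distinct positions without jitter: {distinct_plain}")
print(f"Distinct positions with jitter: {distinct_jittered}")
```

Distinct coordinates do not guarantee that markers never touch on screen, because each circle has a visible size, but the count shows how many points would otherwise be drawn at identical positions.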
Key Parameters
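- size: Sets the area of each circle marker in square pixels (default: 30)
- opacity: Sets marker transparency, from 0 (fully transparent) to 1 (fully opaque)
- scale range: Sets the pixel range of the x scale, which determines how wide the jittered strip is drawn
- jitter formula: Generates standard normal random offsets using the Box-Muller transformation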
Conclusion
Stripplots with jitter in Altair reveal distribution patterns that overlapping points would otherwise hide. The transform_calculate() method with the Box-Muller transformation produces normally distributed offsets, so most points stay near the center of the strip and only a few spread further out.
