How to Make a Stripplot with Jitter in Altair Python?

A stripplot with jitter is an effective way to visualize the distribution of a continuous variable across different categories. In Altair, we use mark_circle() to draw the points and transform_calculate() to compute a random jitter value for each point. Plotting that value on the x-axis spreads overlapping points horizontally so that individual observations stay visible.

Syntax

The basic syntax for creating a stripplot with jitter in Altair involves creating a chart with circular markers and adding a calculated jitter field:

import altair as alt

# Basic stripplot with jitter syntax
alt.Chart(data).mark_circle(size=50).encode(
    x=alt.X('jitter:Q', title=None, 
            axis=alt.Axis(ticks=False, grid=False, labels=False)),
    y=alt.Y('continuous_variable:Q'),
    color=alt.Color('category:N')
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
)

Creating a Stripplot with Custom Data

Let's create a stripplot using randomly generated data to demonstrate the jitter effect:

import altair as alt
import pandas as pd
import numpy as np


# Create sample data
np.random.seed(42)
data = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(10, 2, 50),  # Group A
        np.random.normal(15, 3, 50),  # Group B  
        np.random.normal(12, 1.5, 50) # Group C
    ]),
    'category': ['A'] * 50 + ['B'] * 50 + ['C'] * 50
})

# Create stripplot with jitter
chart = alt.Chart(data).mark_circle(size=50, opacity=0.7).encode(
    x=alt.X('jitter:Q', 
            title=None,
            axis=alt.Axis(ticks=False, grid=False, labels=False),
            scale=alt.Scale(range=[0, 100])),
    y=alt.Y('values:Q', title='Values'),
    color=alt.Color('category:N', 
                   scale=alt.Scale(range=['#1f77b4', '#ff7f0e', '#2ca02c']))
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).properties(
    width=200,
    height=300,
    title='Stripplot with Jitter'
)

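# In a notebook, make `chart` the last expression in the cell to display it;
# in a script, write it to a file with chart.save('stripplot.html').
print("Stripplot created successfully!")
print(f"Data shape: {data.shape}")
print(f"Categories: {data['category'].unique()}")

Output:

Stripplot created successfully!
Data shape: (150, 2)
Categories: ['A' 'B' 'C']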

Using a Real Dataset: Iris Example

Here's how to create a stripplot with the famous Iris dataset:

import altair as alt
from vega_datasets import data

# Load the Iris dataset
iris = data.iris()

# Create stripplot with jitter for petal width by species
stripplot = alt.Chart(iris).mark_circle(size=60, opacity=0.8).encode(
    x=alt.X('jitter:Q', 
            title=None,
            axis=alt.Axis(ticks=False, grid=False, labels=False),
            scale=alt.Scale(range=[0, 80])),
    y=alt.Y('petalWidth:Q', 
            title='Petal Width (cm)',
            scale=alt.Scale(zero=False)),
    color=alt.Color('species:N', 
                   title='Species',
                   scale=alt.Scale(range=['#e41a1c', '#377eb8', '#4daf4a']))
).transform_calculate(
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).properties(
    width=250,
    height=350,
    title='Iris Petal Width Distribution by Species'
)

print("Iris stripplot created!")
print(f"Dataset contains {len(iris)} samples")
print(f"Species: {iris['species'].unique()}")

Output:

Iris stripplot created!
Dataset contains 150 samples
Species: ['setosa' 'versicolor' 'virginica']

Understanding the Jitter Formula

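The jitter expression is the Box-Muller transform: it combines two independent uniform random numbers into one value drawn from a standard normal distribution (mean 0, standard deviation 1). The NumPy code below reproduces the same formula and checks its statistics: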

import numpy as np

# Demonstrate the jitter formula
np.random.seed(42)
n_points = 1000

# Box-Muller transformation for normal distribution
u1 = np.random.random(n_points)
u2 = np.random.random(n_points)
jitter_values = np.sqrt(-2 * np.log(u1)) * np.cos(2 * np.pi * u2)

print("Jitter statistics:")
print(f"Mean: {np.mean(jitter_values):.3f}")
print(f"Standard deviation: {np.std(jitter_values):.3f}")
print(f"Min value: {np.min(jitter_values):.3f}")
print(f"Max value: {np.max(jitter_values):.3f}")

Output:

Jitter statistics:
Mean: 0.016
Standard deviation: 1.006
Min value: -3.040
Max value: 3.722
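
The same formula can also be written with only the Python standard library. The sketch below is an illustration rather than anything Altair itself provides: the function name `box_muller_jitter` and its `scale` parameter are hypothetical, and the `1 - random()` step is an added guard so that the logarithm never receives 0.

```python
import math
import random


def box_muller_jitter(rng, scale=1.0):
    """Return one normally distributed offset with standard deviation `scale`."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and log() is always defined.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    return scale * math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


# Draw many offsets and check that they behave like N(0, scale^2)
rng = random.Random(42)
samples = [box_muller_jitter(rng, scale=0.5) for _ in range(10_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.2f}, std={std:.2f}")
```

Multiplying by `scale` narrows or widens the spread of the offsets. Values computed this way could be stored as an ordinary data column and plotted directly, which keeps the jitter identical every time the chart is rendered.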

Comparison: With vs Without Jitter

Aspect               | Without Jitter                              | With Jitter
---------------------|---------------------------------------------|-----------------------------------------------------
Point Overlap        | High overlap, hard to see individual points | Points spread out, all visible
Distribution Clarity | Difficult to assess density                 | Clear density visualization
Best Use Case        | Small datasets with no overlap              | Any dataset size, especially with overlapping values

Key Parameters

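  • size: Sets the circle marker area in square pixels (default: 30)
  • opacity: Sets transparency (0-1 range); values below 1 help show where points overlap
  • scale range: Sets the pixel range on the x-axis across which the jittered points are spread
  • jitter formula: Generates normally distributed random offsets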

Conclusion

Stripplots with jitter in Altair reveal distribution patterns that overlapping points would otherwise hide. The transform_calculate() method with the Box-Muller transform generates normally distributed jitter, so points cluster near the center of the strip and thin out toward its edges.

Updated on: 2026-03-27T06:34:35+05:30
