Python Pandas - Draw swarms of observations on top of a violin plot with Seaborn
A violin plot shows the estimated density of a numeric variable within each category, while a swarm plot draws every observation as a point, nudged sideways so that points do not overlap. Overlaying the swarm on the violin shows both the shape of each distribution and the individual observations behind it.
Creating Sample Data
Let's create sample cricket data to demonstrate this visualization:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample cricket data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +  # Batsmen
        list(np.random.normal(38, 8, 20)) +   # Bowlers
        list(np.random.normal(42, 10, 15))    # All-rounders
    )
}
# Truncate to whole numbers, with a floor of 10 matches
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)
print(df.head(10))
      Role  Matches
0  Batsman       50
1  Batsman       43
2  Batsman       52
3  Batsman       63
4  Batsman       42
5  Batsman       42
6  Batsman       63
7  Batsman       54
8  Batsman       39
9  Batsman       51
Creating Violin Plot with Swarm Overlay
Combine violin and swarm plots to show distribution and individual points:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +
        list(np.random.normal(38, 8, 20)) +
        list(np.random.normal(42, 10, 15))
    )
}
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)
# Set theme and create the plot
sb.set_theme(style="whitegrid")
plt.figure(figsize=(10, 6))
# Draw violin plot first (background); inner=None hides the default box
# so it does not clash with the points. alpha takes effect in seaborn 0.13+,
# where extra keywords are passed to the violin patches.
sb.violinplot(x="Role", y="Matches", data=df, inner=None, alpha=0.7)
# Draw swarm plot on top; both calls draw on the current Axes
sb.swarmplot(x="Role", y="Matches", data=df, color="white", size=5, edgecolor="black", linewidth=0.5)
plt.title("Cricket Matches by Player Role")
plt.xlabel("Player Role")
plt.ylabel("Number of Matches")
plt.show()
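The overlay works because both calls draw on the same Axes. The sketch below makes that explicit by passing one Axes object and one category order to both functions. The helper name violin_with_swarm, its parameters, and the generated demo data are assumptions made for illustration; they are not part of seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sb


def violin_with_swarm(df, x, y, order=None, ax=None):
    """Overlay a swarm plot on a violin plot (hypothetical helper, not a seaborn API)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    if order is None:
        # Keep categories in order of first appearance for both layers
        order = list(pd.unique(df[x]))
    # Background layer: violins without the inner box so the points stay readable
    sb.violinplot(data=df, x=x, y=y, order=order, inner=None, color="lightgray", ax=ax)
    # Foreground layer: one point per observation, drawn on the same Axes
    sb.swarmplot(data=df, x=x, y=y, order=order, color="black", size=4, ax=ax)
    return ax


# Illustrative data, generated only for this sketch
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Role": ["Batsman"] * 20 + ["Bowler"] * 20 + ["All-rounder"] * 15,
    "Matches": np.concatenate([
        rng.normal(45, 12, 20),
        rng.normal(38, 8, 20),
        rng.normal(42, 10, 15),
    ]).round(),
})

ax = violin_with_swarm(demo, "Role", "Matches")
ax.set_title("Cricket Matches by Player Role")
```

Call plt.show() afterwards to display the figure. Passing ax explicitly also makes it possible to place the combined plot in one panel of a plt.subplots() grid.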
Customizing the Visualization
Enhance the plot with colors and styling options:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +
        list(np.random.normal(38, 8, 20)) +
        list(np.random.normal(42, 10, 15))
    )
}
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)
# Create enhanced visualization
plt.figure(figsize=(12, 7))
# Custom color palette, one color per role
colors = ["lightblue", "lightgreen", "lightcoral"]
# Draw violin plot with custom colors. Assigning hue to the x variable
# avoids the palette-without-hue deprecation warning in seaborn 0.13+.
sb.violinplot(x="Role", y="Matches", data=df, hue="Role", legend=False,
              palette=colors, inner=None, alpha=0.6)
# Draw swarm plot with dark points
sb.swarmplot(x="Role", y="Matches", data=df,
             color="black", size=4, alpha=0.8)
plt.title("Distribution of Matches Played by Cricket Player Roles",
          fontsize=14, fontweight='bold')
plt.xlabel("Player Role", fontsize=12)
plt.ylabel("Number of Matches", fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()
Key Features
| Component | Purpose | Parameters / Notes |
|---|---|---|
| violinplot() | Shows distribution shape | x, y, data, palette, inner |
| swarmplot() | Shows individual points | x, y, data, color, size |
| Overlay | Combines both visualizations | Plot violin first, then swarm |
Benefits of Combined Plot
This combination provides several advantages:
- Distribution Shape: Violin plot shows data density and spread
- Individual Points: Swarm plot reveals actual data points
- Outlier Detection: Easy to spot unusual values
- Sample Size: Number of points indicates group size
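The group sizes and unusual values that the plot shows visually can also be checked numerically. The following is a minimal sketch using pandas only. The helper summarize_groups, the hand-written sample values, and the 1.5 x IQR fence are assumptions chosen for illustration; seaborn does not apply this rule when drawing the swarm.

```python
import pandas as pd

# Small hand-written sample; the values are illustrative only
df = pd.DataFrame({
    "Role": ["Batsman"] * 6 + ["Bowler"] * 5,
    "Matches": [40, 42, 45, 47, 50, 95, 30, 33, 35, 36, 38],
})


def summarize_groups(frame, group_col, value_col, k=1.5):
    """Return per-group counts and rows outside the k * IQR fences (hypothetical helper)."""
    grouped = frame.groupby(group_col)[value_col]
    # Per-group quartiles, mapped back onto each row through its group label
    q1 = frame[group_col].map(grouped.quantile(0.25))
    q3 = frame[group_col].map(grouped.quantile(0.75))
    iqr = q3 - q1
    values = frame[value_col]
    is_outlier = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return grouped.size(), frame[is_outlier]


counts, outliers = summarize_groups(df, "Role", "Matches")
print(counts)    # number of observations per role
print(outliers)  # rows flagged by the IQR rule
```

With this sample, the counts are 6 for Batsman and 5 for Bowler, and the only flagged row is the Batsman with 95 matches, which is the point that would sit far above the rest of the swarm.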
Conclusion
Combining violin and swarm plots shows both the distribution of each group and the individual observations behind it. Draw violinplot() first with inner=None as the background, then call swarmplot() on the same axes so that every observation is visible on top.
