Python Pandas - Draw a boxplot and display the datapoints on top of boxes by plotting Swarm plot with Seaborn
A box plot shows the distribution of data through quartiles, while a swarm plot displays individual data points without overlap. Combining both creates a comprehensive visualization that shows both statistical summaries and actual data points.
Required Libraries
First, import the necessary libraries for data manipulation and visualization:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
Creating Sample Data
Let's create sample cricket player data to demonstrate the visualization:
# Create sample cricket data
np.random.seed(42)
roles = ['Batsman'] * 15 + ['Bowler'] * 12 + ['All-rounder'] * 10
ages_batsman = np.random.normal(28, 4, 15).astype(int)
ages_bowler = np.random.normal(26, 3, 12).astype(int)
ages_allrounder = np.random.normal(30, 5, 10).astype(int)
ages = np.concatenate([ages_batsman, ages_bowler, ages_allrounder])
# Create DataFrame
data = pd.DataFrame({
    'Role': roles,
    'Age': ages
})
print(data.head(10))
The first ten rows of the DataFrame:
      Role  Age
0  Batsman   29
1  Batsman   27
2  Batsman   30
3  Batsman   34
4  Batsman   27
5  Batsman   27
6  Batsman   34
7  Batsman   31
8  Batsman   26
9  Batsman   30
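Before plotting, it can help to look at the numbers the boxes will summarize. The following is a minimal sketch, assuming the same sample data as above; the `quartiles` table and its column names are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this block runs on its own
np.random.seed(42)
roles = ['Batsman'] * 15 + ['Bowler'] * 12 + ['All-rounder'] * 10
ages = np.concatenate([
    np.random.normal(28, 4, 15).astype(int),
    np.random.normal(26, 3, 12).astype(int),
    np.random.normal(30, 5, 10).astype(int),
])
data = pd.DataFrame({'Role': roles, 'Age': ages})

# First quartile, median, and third quartile of Age for each role
quartiles = data.groupby('Role')['Age'].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ['Q1', 'Median', 'Q3']

# Interquartile range: the height of each box in the box plot
quartiles['IQR'] = quartiles['Q3'] - quartiles['Q1']
print(quartiles)
```

Each row corresponds to one box: `Q1` and `Q3` are the box edges and `Median` is the line inside it.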
Creating Box Plot with Swarm Plot Overlay
Plot the box plot first, then overlay the swarm plot with the same x and y parameters:
# Set up the plot
plt.figure(figsize=(10, 6))
# Create box plot
sns.boxplot(x='Role', y='Age', data=data, palette='Set2')
# Overlay swarm plot on top of box plot
sns.swarmplot(x='Role', y='Age', data=data, color='black', alpha=0.7, size=4)
# Customize the plot
plt.title('Age Distribution by Cricket Player Role', fontsize=16, fontweight='bold')
plt.xlabel('Player Role', fontsize=12)
plt.ylabel('Age (years)', fontsize=12)
plt.grid(True, alpha=0.3)
# Display the plot
plt.tight_layout()
plt.show()
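The same overlay can also be drawn on an explicit Axes object, which makes it clear that both plots share one set of axes. This is a sketch under a few assumptions: the helper name `draw_box_with_swarm` is hypothetical, and `showfliers=False` is a choice made here so that outliers are not drawn twice (once by the box plot and once by the swarm plot).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def draw_box_with_swarm(data, ax):
    """Illustrative helper: box plot with a swarm plot overlaid on the same Axes."""
    # Hide the box plot's outlier markers; the swarm plot already shows every point
    sns.boxplot(x='Role', y='Age', data=data, color='lightgray',
                showfliers=False, ax=ax)
    # Drawn second, so the points sit on top of the boxes
    sns.swarmplot(x='Role', y='Age', data=data, color='black',
                  size=4, alpha=0.7, ax=ax)
    return ax


# Rebuild the sample data so this block runs on its own
np.random.seed(42)
roles = ['Batsman'] * 15 + ['Bowler'] * 12 + ['All-rounder'] * 10
ages = np.concatenate([
    np.random.normal(28, 4, 15).astype(int),
    np.random.normal(26, 3, 12).astype(int),
    np.random.normal(30, 5, 10).astype(int),
])
data = pd.DataFrame({'Role': roles, 'Age': ages})

fig, ax = plt.subplots(figsize=(10, 6))
draw_box_with_swarm(data, ax)
ax.set_title('Age Distribution by Cricket Player Role')
fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file.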
Enhanced Visualization with Colors
Create a more polished version in which the data points use the same palette as the boxes:
# Create enhanced visualization
plt.figure(figsize=(12, 7))
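# Create box plot with palette; box transparency is set through boxprops,
# because alpha is not a keyword that boxplot accepts directly
sns.boxplot(x='Role', y='Age', data=data, palette='viridis', boxprops=dict(alpha=0.7))
# Overlay swarm plot with matching colors; a thin dark edge keeps the points
# visible against boxes of the same color
sns.swarmplot(x='Role', y='Age', data=data, palette='viridis', size=5, alpha=0.8,
              edgecolor='black', linewidth=0.5)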
# Customize appearance
plt.title('Cricket Player Age Distribution by Role\n(Box Plot with Individual Data Points)',
          fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Player Role', fontsize=14, fontweight='bold')
plt.ylabel('Age (years)', fontsize=14, fontweight='bold')
# Add grid for better readability
plt.grid(True, alpha=0.3, linestyle='--')
# Improve layout
plt.tight_layout()
plt.show()
Key Benefits of Combined Visualization
| Visualization | Information Provided | Best For |
|---|---|---|
| Box Plot | Quartiles, median, outliers | Statistical summary |
| Swarm Plot | Individual data points | Data distribution pattern |
| Combined | Both statistical summary and raw data | Comprehensive analysis |
Customization Options
You can customize various aspects of the combined plot:
# Customized version with different styling
plt.figure(figsize=(10, 6))
# Box plot with custom styling
box_plot = sns.boxplot(x='Role', y='Age', data=data,
                       palette='pastel',
                       boxprops=dict(alpha=0.7),
                       whiskerprops=dict(color='gray'),
                       capprops=dict(color='gray'),
                       medianprops=dict(color='red', linewidth=2))
# Swarm plot with custom styling
swarm_plot = sns.swarmplot(x='Role', y='Age', data=data,
                           color='darkblue',
                           size=3,
                           alpha=0.6)
plt.title('Customized Box Plot with Swarm Plot Overlay')
plt.show()
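Another option worth considering is the `order` parameter, which both `sns.boxplot()` and `sns.swarmplot()` accept. The sketch below sorts the roles by median age and passes the same list to both calls so the points stay aligned with their boxes; sorting by median is an assumption made for illustration, not something the original example does.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Rebuild the sample data so this block runs on its own
np.random.seed(42)
roles = ['Batsman'] * 15 + ['Bowler'] * 12 + ['All-rounder'] * 10
ages = np.concatenate([
    np.random.normal(28, 4, 15).astype(int),
    np.random.normal(26, 3, 12).astype(int),
    np.random.normal(30, 5, 10).astype(int),
])
data = pd.DataFrame({'Role': roles, 'Age': ages})

# Category order: roles sorted from lowest to highest median age
order = data.groupby('Role')['Age'].median().sort_values().index.tolist()

fig, ax = plt.subplots(figsize=(10, 6))
# Pass the same order to both calls so each swarm lands on its own box
sns.boxplot(x='Role', y='Age', data=data, order=order,
            color='lightblue', showfliers=False, ax=ax)
sns.swarmplot(x='Role', y='Age', data=data, order=order,
              color='darkblue', size=3, alpha=0.6, ax=ax)
ax.set_title('Roles Ordered by Median Age')
fig.tight_layout()
```

If the two calls were given different orders, the points for one role would be drawn over the box of another, so it is safest to define the list once and reuse it.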
Conclusion
Combining a box plot with a swarm plot shows the statistical summary and every individual data point in one figure. Call sns.boxplot() first, then call sns.swarmplot() with the same x, y, and data arguments so the points are drawn on top of the matching boxes.
