Python Pandas - Group the swarms by two categorical variables with Seaborn
A swarm plot in Seaborn is a categorical scatterplot in which the points are adjusted so that they do not overlap. It is drawn with the seaborn.swarmplot() function. To group the swarms by two categorical variables, map one of them to a positional parameter (x or y) and the other to hue; the remaining positional parameter takes the numeric variable.
Sample Dataset
We'll create a sample cricket dataset to demonstrate grouping by two categorical variables:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
# Create sample cricket data
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)
print(dataFrame)
           Role  Matches  Academy
0       Batsman       45        A
1       Batsman       32        B
2        Bowler       28        A
3        Bowler       41        B
4   All-rounder       38        A
5   All-rounder       29        B
6       Batsman       52        A
7        Bowler       35        B
8   All-rounder       44        A
9       Batsman       39        B
10       Bowler       31        A
11  All-rounder       47        B
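Before plotting, it can help to check how many rows fall into each combination of the two categorical variables. The sketch below uses pandas' crosstab() on the same sample data; the variable name counts is only illustrative and is not part of the original example.

```python
import pandas as pd

# Same sample cricket data as above
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)

# Count the players in every (Role, Academy) combination
counts = pd.crosstab(dataFrame["Role"], dataFrame["Academy"])
print(counts)
```

In this dataset every Role and Academy pair holds two players, so each group drawn in the plots that follow contains only a couple of points.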
Basic Swarm Plot with Two Categorical Variables
Pass the first categorical variable as x, the numeric variable as y, and the second categorical variable as hue:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)
# Set the theme
sb.set_theme(style="whitegrid")
# Create swarm plot grouped by Role (x-axis) and Academy (hue)
plt.figure(figsize=(8, 6))
sb.swarmplot(x="Role", y="Matches", hue="Academy", data=dataFrame)
plt.title("Cricket Matches by Player Role and Academy")
plt.show()
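By default, hue only colors the points, so both academies share a single swarm per role. If separate, side-by-side swarms per academy are preferred, swarmplot() accepts dodge=True. A minimal sketch, assuming the same sample data (the fig and ax names are illustrative choices, not part of the original example):

```python
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt

# Same sample cricket data as above
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)

sb.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(8, 6))

# dodge=True gives each Academy its own swarm within every Role
sb.swarmplot(x="Role", y="Matches", hue="Academy", data=dataFrame,
             dodge=True, ax=ax)
ax.set_title("Cricket Matches by Player Role and Academy (dodged)")
```

Call plt.show() afterwards to display the figure in an interactive session.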
Customizing the Swarm Plot
You can customize colors, size, and other properties of the swarm plot:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)
# Create customized swarm plot
plt.figure(figsize=(10, 6))
sb.swarmplot(x="Role", y="Matches", hue="Academy", data=dataFrame,
             palette="Set2", size=8, alpha=0.8)
plt.title("Player Performance by Role and Academy")
plt.xlabel("Player Role")
plt.ylabel("Number of Matches")
plt.legend(title="Academy")
plt.show()
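The order of the categories on the axis and in the legend can be controlled as well. The following sketch assumes the same sample data and uses the order and hue_order parameters of swarmplot(); the particular orderings chosen here are arbitrary and only serve as an illustration.

```python
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt

# Same sample cricket data as above
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)

fig, ax = plt.subplots(figsize=(10, 6))

# order fixes the left-to-right sequence of roles on the x-axis;
# hue_order fixes the sequence of academies in the color mapping and legend
sb.swarmplot(x="Role", y="Matches", hue="Academy", data=dataFrame,
             order=["Bowler", "Batsman", "All-rounder"],
             hue_order=["B", "A"],
             palette="Set2", size=8, ax=ax)
ax.set_title("Player Performance by Role and Academy (custom order)")
```

Call plt.show() afterwards to display the figure.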
Alternative Grouping Approach
You can also swap the categorical variables to see different perspectives:
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)
# Group by Academy (x-axis) and Role (hue)
plt.figure(figsize=(8, 6))
sb.swarmplot(x="Academy", y="Matches", hue="Role", data=dataFrame)
plt.title("Matches by Academy and Player Role")
plt.show()
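The swarms can also run horizontally by placing the categorical variable on y and the numeric variable on x. A minimal sketch under the same assumptions as the earlier examples (same sample data, illustrative fig and ax names):

```python
import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt

# Same sample cricket data as above
data = {
    'Role': ['Batsman', 'Batsman', 'Bowler', 'Bowler', 'All-rounder', 'All-rounder',
             'Batsman', 'Bowler', 'All-rounder', 'Batsman', 'Bowler', 'All-rounder'],
    'Matches': [45, 32, 28, 41, 38, 29, 52, 35, 44, 39, 31, 47],
    'Academy': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
}
dataFrame = pd.DataFrame(data)

fig, ax = plt.subplots(figsize=(8, 6))

# Numeric variable on x, categorical variable on y: one horizontal swarm per Role
sb.swarmplot(x="Matches", y="Role", hue="Academy", data=dataFrame, ax=ax)
ax.set_title("Matches by Player Role and Academy (horizontal)")
```

Call plt.show() afterwards to display the figure.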
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| x | First categorical variable | "Role" |
| hue | Second categorical variable | "Academy" |
| palette | Color scheme | "Set1", "viridis" |
| size | Point size | 5, 8, 10 |
Conclusion
Seaborn's swarmplot() groups data by two categorical variables when one is mapped to x (or y) and the other to hue. The resulting plot shows how a numeric variable is distributed across both categories, with the points adjusted so that they do not overlap.
