How to sort a boxplot by the median values in Pandas?
To sort a boxplot by the median values in Pandas, calculate the median of each group, sort the medians, and reorder the DataFrame columns to match. Ordering the boxes by central tendency makes it easier to compare groups at a glance.
Steps
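Create a DataFrame in which each column holds the values of one group
If the data is in long format instead, group it by the categorical variable and reshape it to one column per group
Calculate the median of each column
Sort the medians in ascending or descending order
Reorder the DataFrame columns to match the sorted medians
Create the boxplot from the reordered DataFrame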
Example
Start by creating a sample DataFrame with one column per group:
import pandas as pd
import matplotlib.pyplot as plt
# Set figure size
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df.head())
Original DataFrame:
   Group_A  Group_B  Group_C  Group_D
0       10       25        5       40
1       15       30        8       45
2       12       28        6       42
3       18       32        9       48
4       20       35       11       50
Sorting by Median Values
Calculate the medians and sort the DataFrame columns accordingly:
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}
df = pd.DataFrame(data)
# Calculate median for each group
medians = df.median()
print("Medians for each group:")
print(medians)
# Sort medians in ascending order
sorted_medians = medians.sort_values()
print("\nSorted medians:")
print(sorted_medians)
# Reorder DataFrame columns based on sorted medians
df_sorted = df[sorted_medians.index]
# Create boxplot
df_sorted.boxplot(figsize=(10, 6))
plt.title('Boxplot Sorted by Median Values (Ascending)')
plt.ylabel('Values')
plt.show()
Medians for each group:
Group_A    15.0
Group_B    30.0
Group_C     8.0
Group_D    45.0
dtype: float64

Sorted medians:
Group_C     8.0
Group_A    15.0
Group_B    30.0
Group_D    45.0
dtype: float64
Sorting in Descending Order
You can also sort the boxplots in descending order of median values:
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}
df = pd.DataFrame(data)
# Calculate and sort medians in descending order
medians = df.median().sort_values(ascending=False)
print("Medians sorted in descending order:")
print(medians)
# Reorder DataFrame and create boxplot
df_desc_sorted = df[medians.index]
df_desc_sorted.boxplot(figsize=(10, 6))
plt.title('Boxplot Sorted by Median Values (Descending)')
plt.ylabel('Values')
plt.show()
Medians sorted in descending order:
Group_D    45.0
Group_B    30.0
Group_A    15.0
Group_C     8.0
dtype: float64
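The examples so far assume one column per group. When the data instead arrives in long format (one value column plus a categorical column), the grouping step can be sketched roughly as follows. The column names 'group' and 'value' and the sample numbers are assumptions made for illustration; this is one possible approach rather than the only one.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data: one row per observation
long_df = pd.DataFrame({
    "group": ["A", "B", "C"] * 3,
    "value": [10, 25, 5, 15, 30, 8, 12, 28, 6],
})

# Median per group, sorted ascending; the index gives the plotting order
order = long_df.groupby("group")["value"].median().sort_values().index

# Reshape to one column per group, then reorder the columns by median
wide = pd.DataFrame({
    name: grp["value"].reset_index(drop=True)
    for name, grp in long_df.groupby("group")
})[order]

# Plot the reordered columns; boxes appear left to right in median order
ax = wide.boxplot(figsize=(10, 6))
ax.set_title("Boxplot Sorted by Median Values (long-format input)")
ax.set_ylabel("Values")
```

Call plt.show() afterwards to display the figure. Resetting the index of each group lets groups of unequal size share one DataFrame, with the shorter columns padded by NaN.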
Key Points
Use df.median() to calculate the median value of each column
Use sort_values() to order the medians in ascending or descending order
Reorder the DataFrame columns using df[sorted_index]
The boxplot will display groups from left to right based on sorted median values
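These points can be folded into a small reusable helper. The function name order_by_median below is hypothetical, and passing numeric_only=True is an assumption that any non-numeric columns should simply be left out of the plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

def order_by_median(df, ascending=True):
    """Return df with its numeric columns reordered by column median.

    Hypothetical helper for illustration; non-numeric columns are dropped.
    """
    medians = df.median(numeric_only=True).sort_values(ascending=ascending)
    return df[medians.index]

# Same sample data as in the examples above
df = pd.DataFrame({
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
})

# Reorder once, then plot; flip `ascending` to reverse the order
df_desc = order_by_median(df, ascending=False)
ax = df_desc.boxplot(figsize=(10, 6))
ax.set_title('Boxplot Sorted by Median Values (Descending)')
ax.set_ylabel('Values')
```

Keeping the reordering separate from the plotting call makes it possible to inspect the column order before drawing anything. Call plt.show() to display the figure.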
Conclusion
Sorting boxplots by median values makes groups easier to compare, because the boxes appear in order of central tendency. Calculate the medians with df.median(), sort them with sort_values(), and reorder the DataFrame columns accordingly before plotting.
