How to sort a boxplot by the median values in Pandas?

To sort a boxplot by median values in pandas, calculate the median of each group, sort those medians, and reorder the data to match. The boxes then appear in order of central tendency, which makes the groups easier to compare.

Steps

  • Create a DataFrame that holds the values for each group

  • Group the data by the categorical variable (in the examples here, each group is already its own column)

  • Calculate the median for each group

  • Sort the medians in the desired order

  • Reorder the DataFrame columns based on sorted medians

  • Create the boxplot with sorted data
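
The examples in this article store each group in its own column. When the data is instead in long format, with one row per observation, the grouping step can be sketched roughly as follows. The column names `group` and `value`, the helper column `obs`, and the sample numbers are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation, with the group
# label in its own column (names and values are illustrative only).
long_df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "value": [10, 15, 12, 25, 30, 28, 5, 8, 6],
})

# Median of each group, sorted from smallest to largest.
group_medians = long_df.groupby("group")["value"].median().sort_values()
order = group_medians.index.tolist()

# Number the observations within each group, pivot to one column per group,
# then arrange the columns in median order.
wide = long_df.assign(obs=long_df.groupby("group").cumcount()).pivot(
    index="obs", columns="group", values="value"
)
wide_sorted = wide[order]
print(order)
```

Calling `wide_sorted.boxplot()` followed by `plt.show()` should then draw the boxes in that order, in the same way as the column-based examples in the rest of this article.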

Example

Here's how to create a boxplot sorted by median values:

import pandas as pd
import matplotlib.pyplot as plt

# Set the default figure size and enable automatic layout
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True

# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df.head())

Output:

Original DataFrame:
   Group_A  Group_B  Group_C  Group_D
0       10       25        5       40
1       15       30        8       45
2       12       28        6       42
3       18       32        9       48
4       20       35       11       50

Sorting by Median Values

Calculate the medians and sort the DataFrame columns accordingly:

import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}

df = pd.DataFrame(data)

# Calculate median for each group
medians = df.median()
print("Medians for each group:")
print(medians)

# Sort medians in ascending order
sorted_medians = medians.sort_values()
print("\nSorted medians:")
print(sorted_medians)

# Reorder DataFrame columns based on sorted medians
df_sorted = df[sorted_medians.index]

# Create boxplot
df_sorted.boxplot(figsize=(10, 6))
plt.title('Boxplot Sorted by Median Values (Ascending)')
plt.ylabel('Values')
plt.show()

Output:

Medians for each group:
Group_A    15.0
Group_B    30.0
Group_C     8.0
Group_D    45.0
dtype: float64

Sorted medians:
Group_C     8.0
Group_A    15.0
Group_B    30.0
Group_D    45.0
dtype: float64
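
If the same reordering is needed in several places, it could be wrapped in a small helper. The function name `sort_columns_by_median` is hypothetical and not part of pandas; this is only a sketch that assumes each group is a numeric column, as in the example above.

```python
import pandas as pd


def sort_columns_by_median(frame, ascending=True):
    """Return the numeric columns of `frame`, ordered by their median."""
    # numeric_only=True skips any non-numeric columns when computing medians.
    medians = frame.median(numeric_only=True)
    order = medians.sort_values(ascending=ascending).index
    return frame[order]


# Same sample data as in the example above.
df = pd.DataFrame({
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
})

ascending_df = sort_columns_by_median(df)
descending_df = sort_columns_by_median(df, ascending=False)
print(list(ascending_df.columns))
print(list(descending_df.columns))
```

Either result can be passed straight to `.boxplot()`; the `ascending` flag is simply forwarded to `sort_values()`.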

Sorting in Descending Order

You can also sort the boxplots in descending order of median values:

import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
data = {
    'Group_A': [10, 15, 12, 18, 20, 14, 16],
    'Group_B': [25, 30, 28, 32, 35, 29, 31],
    'Group_C': [5, 8, 6, 9, 11, 7, 10],
    'Group_D': [40, 45, 42, 48, 50, 44, 46]
}

df = pd.DataFrame(data)

# Calculate and sort medians in descending order
medians = df.median().sort_values(ascending=False)
print("Medians sorted in descending order:")
print(medians)

# Reorder DataFrame and create boxplot
df_desc_sorted = df[medians.index]
df_desc_sorted.boxplot(figsize=(10, 6))
plt.title('Boxplot Sorted by Median Values (Descending)')
plt.ylabel('Values')
plt.show()

Output:

Medians sorted in descending order:
Group_D    45.0
Group_B    30.0
Group_A    15.0
Group_C     8.0
dtype: float64

Key Points

  • Use df.median() to calculate median values for each column

  • Use sort_values() to order medians in ascending or descending order

  • Reorder DataFrame columns using df[sorted_index]

  • The boxplot will display groups from left to right based on sorted median values
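
The points above assume that every group has the same number of values. A minimal sketch of the unequal-size case is shown below, with hypothetical group names and values. It relies on `df.median()` skipping missing values by default (`skipna=True`), so the padded `NaN` entries should not affect the ordering.

```python
import pandas as pd

# Hypothetical groups of different sizes (illustrative values only).
groups = {
    "Group_X": [10, 15, 12, 18],
    "Group_Y": [25, 30, 28],
    "Group_Z": [5, 8, 6, 9, 11],
}

# Building from Series pads the shorter groups with NaN.
df_uneven = pd.DataFrame({name: pd.Series(vals) for name, vals in groups.items()})

# median() ignores NaN by default, so each median uses only real values.
medians = df_uneven.median()
df_uneven_sorted = df_uneven[medians.sort_values().index]
print(medians.sort_values())
```

The reordered `df_uneven_sorted` can then be plotted with `.boxplot()` in the same way as the equal-size examples.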

Conclusion

Sorting boxplots by median values makes differences between groups easier to see and compare. Calculate the medians with df.median(), sort them with sort_values(), and reorder the DataFrame columns to match before plotting.

Updated on: 2026-03-26T15:02:38+05:30
