Box plot with min, max, average and standard deviation in Matplotlib

A box plot summarizes a set of values through its median, quartiles, and outliers. This article computes the minimum, maximum, average, and standard deviation of each column in a dataset, then uses Matplotlib with Pandas to draw a box plot of those statistics.

Creating Sample Data and Statistics

First, let's create random data to work with:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Set figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create random dataset of 5x5 dimension
data = np.random.randn(5, 5)
print("Sample data shape:", data.shape)
print("First few rows:")
print(data[:3])

Output (the data is random, so values differ on each run):

Sample data shape: (5, 5)
First few rows:
[[-0.12345678  1.23456789 -0.98765432  0.11111111  2.22222222]
 [ 0.33333333 -1.44444444  0.55555556 -0.66666667  1.77777778]
 [-2.11111111  0.88888889 -0.22222222  1.55555556 -0.99999999]]
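
Because np.random.randn draws new numbers on every call, the three code listings in this article each work on different data. If repeatable output is wanted, one option is a seeded generator. The sketch below assumes NumPy's Generator API (np.random.default_rng); the seed value 42 is an arbitrary choice and not part of the original example.

```python
import numpy as np

# Seeded generator: the same seed always yields the same numbers
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed
data = rng.standard_normal((5, 5))

# Re-creating the generator with the same seed reproduces the array
data_again = np.random.default_rng(42).standard_normal((5, 5))
print("Same data on both draws:", np.array_equal(data, data_again))
```

With a fixed seed, the statistics and the resulting plot stay the same between runs, which makes the output easier to compare and debug.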

Calculating Statistical Measures

Calculate min, max, average, and standard deviation for each column:

import numpy as np

# Create sample data
data = np.random.randn(5, 5)

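# Calculate statistics along columns (axis=0 gives one value per column)
minimum = data.min(axis=0)
maximum = data.max(axis=0)
average = data.mean(axis=0)
std_dev = data.std(axis=0)  # population standard deviation (NumPy default, ddof=0)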

print("Statistics for each column:")
print("Min:", minimum)
print("Max:", maximum) 
print("Avg:", average)
print("Std:", std_dev)

Output (the data is random, so values differ on each run):

Statistics for each column:
Min: [-1.23456789 -0.98765432 -2.11111111 -0.66666667 -0.99999999]
Max: [ 0.33333333  1.23456789  0.55555556  1.55555556  2.22222222]
Avg: [-0.42857143  0.13793103 -0.46296296  0.26666667  0.40000000]
Std: [ 0.65789012  0.91234567  1.07345678  0.88888889  1.30000000]
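
One detail worth knowing about the standard deviation: NumPy's std divides by n by default (ddof=0), while the Pandas std method divides by n - 1 (ddof=1). The small array below is made up for illustration and shows the difference.

```python
import numpy as np
import pandas as pd

# Hand-picked values so the results are easy to verify
values = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

population_std = values.std(axis=0)       # NumPy default: ddof=0, divide by n
sample_std = values.std(axis=0, ddof=1)   # divide by n - 1 instead
pandas_std = pd.DataFrame(values).std()   # Pandas default: ddof=1

print("NumPy default :", population_std)
print("NumPy ddof=1  :", sample_std)
print("Pandas default:", pandas_std.to_numpy())
```

If the statistics are later recomputed with Pandas, pass ddof=0 to its std method (or ddof=1 to NumPy's) so that both libraries report the same numbers.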

Creating the Box Plot

Create a DataFrame from the statistics and generate the box plot:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Set figure parameters
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create sample data
data = np.random.randn(5, 5)

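# Calculate statistics (axis=0 gives one value per column)
minimum = data.min(axis=0)
maximum = data.max(axis=0)
average = data.mean(axis=0)
std_dev = data.std(axis=0)  # population standard deviation (NumPy default, ddof=0)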

# Create DataFrame with statistics
df = pd.DataFrame({
    'min': minimum,
    'max': maximum,
    'avg': average,
    'std': std_dev
})

print("Statistics DataFrame:")
print(df)

# Create box plot
df.boxplot()
plt.title('Box Plot of Statistical Measures')
plt.ylabel('Values')
plt.show()

Output (the data is random, so values differ on each run):

Statistics DataFrame:
        min       max       avg       std
0 -1.234568  0.333333 -0.428571  0.657890
1 -0.987654  1.234568  0.137931  0.912346
2 -2.111111  0.555556 -0.462963  1.073457
3 -0.666667  1.555556  0.266667  0.888889
4 -0.999999  2.222222  0.400000  1.300000
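
The plot above treats each statistic as a sample of five values. If the goal is instead one box per data column, drawn directly from that column's min, max, mean, and standard deviation, Matplotlib's Axes.bxp accepts precomputed statistics. The following is a sketch of that alternative, not the method used in this article. Mapping mean ± std to the box edges and the mean to the center line is an illustrative assumption, not a standard box-plot convention, and the seed is arbitrary.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed for repeatable output
data = rng.standard_normal((5, 5))

# Build one statistics dictionary per column in the format bxp expects
box_stats = []
for i in range(data.shape[1]):
    col = data[:, i]
    mean, std = col.mean(), col.std()
    box_stats.append({
        "label": f"col {i}",
        "med": mean,          # center line drawn at the mean (assumption)
        "q1": mean - std,     # lower box edge at mean - std (assumption)
        "q3": mean + std,     # upper box edge at mean + std (assumption)
        "whislo": col.min(),  # lower whisker ends at the minimum
        "whishi": col.max(),  # upper whisker ends at the maximum
        "fliers": [],         # no outlier points
    })

fig, ax = plt.subplots(figsize=(7.5, 3.5))
artists = ax.bxp(box_stats, showfliers=False)
ax.set_title("Mean ± std boxes with min/max whiskers")
ax.set_ylabel("Values")
plt.show()
```

For strongly skewed data, mean - std can fall below the minimum (or mean + std above the maximum), in which case the box extends past its whisker. Since these boxes no longer show quartiles, it is worth labeling the plot clearly.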

Understanding the Box Plot

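The box plot displays how each statistical measure is distributed across the dataset columns. Each box summarizes five values (one per data column) for a single statistic (min, max, avg, or std): the box spans the first to the third quartile, the line inside it marks the median, and points beyond the whiskers are drawn as potential outliers.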

[Figure: Box Plot Components. From top to bottom: Maximum, Q3 (75th percentile), Median (Q2), Q1 (25th percentile), Minimum]
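
The components named above can also be computed numerically. The sketch below rebuilds the statistics DataFrame with a seeded generator (the seed is an arbitrary assumption) and uses DataFrame.quantile to get the quartiles. The fences follow Matplotlib's default whisker rule of 1.5 times the interquartile range.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for repeatable output
data = rng.standard_normal((5, 5))

df = pd.DataFrame({
    "min": data.min(axis=0),
    "max": data.max(axis=0),
    "avg": data.mean(axis=0),
    "std": data.std(axis=0),
})

# Q1, median, and Q3 of each statistic: the box edges and center line
quartiles = df.quantile([0.25, 0.5, 0.75])

# Values outside these fences are drawn as outlier points
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
lower_fence = quartiles.loc[0.25] - 1.5 * iqr
upper_fence = quartiles.loc[0.75] + 1.5 * iqr

print(quartiles)
print("Lower fence:", lower_fence.to_dict())
print("Upper fence:", upper_fence.to_dict())
```

Each whisker ends at the most extreme value that still lies inside its fence, so with only five values per box the whiskers usually coincide with the smallest and largest value.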

Key Points

  • axis=0 (the 0 passed to min, max, mean, and std) computes each statistic down the rows, producing one value per column
  • Each column in the DataFrame represents a different statistic
  • Box plots show the spread and central tendency of each statistic
  • The plot helps identify which statistics have more variability across dataset columns
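
To make the axis behavior in the first point concrete, here is a minimal sketch on a small hand-written array (the values are made up for illustration):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

per_column = grid.min(axis=0)  # collapses the rows: one value per column
per_row = grid.min(axis=1)     # collapses the columns: one value per row

print("Min per column:", per_column)
print("Min per row:   ", per_row)
```

The same rule applies to max, mean, and std, so a 5x5 array reduced with axis=0 yields five values for each statistic.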

Conclusion

Box plots effectively visualize multiple statistical measures simultaneously. By creating a DataFrame with min, max, average, and standard deviation values, you can easily compare the distribution of these statistics using Pandas' built-in boxplot functionality.

Updated on: 2026-03-25T21:10:12+05:30
