Box plot with min, max, average and standard deviation in Matplotlib
A box plot summarizes a distribution through its median, quartiles, and outliers. In this article we compute the minimum, maximum, average, and standard deviation of each column of a dataset with NumPy, collect the results in a Pandas DataFrame, and draw one box per statistic with Matplotlib.
Creating Sample Data and Statistics
First, let's create a random 5x5 dataset to work with. No seed is set, so the printed values will differ on every run:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# Set figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
# Create random dataset of 5x5 dimension
data = np.random.randn(5, 5)
print("Sample data shape:", data.shape)
print("First few rows:")
print(data[:3])
Sample data shape: (5, 5)
First few rows:
[[-0.12345678  1.23456789 -0.98765432  0.11111111  2.22222222]
 [ 0.33333333 -1.44444444  0.55555556 -0.66666667  1.77777778]
 [-2.11111111  0.88888889 -0.22222222  1.55555556 -0.99999999]]
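If you want the same numbers on every run, one option is to seed the random generator. The sketch below is an assumption on our part: it swaps the np.random.randn call used in this article for NumPy's Generator API (np.random.default_rng), and the seed value 42 is arbitrary.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same data (42 is an arbitrary choice)
rng = np.random.default_rng(42)
data = rng.standard_normal((5, 5))

# A second generator built from the same seed yields identical values
repeat = np.random.default_rng(42).standard_normal((5, 5))

print("Shape:", data.shape)
print("Reproducible:", np.array_equal(data, repeat))
```

With a fixed seed, the statistics and the resulting box plot stay the same between runs, which makes the output easier to compare and discuss.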
Calculating Statistical Measures
Next, calculate the minimum, maximum, average, and standard deviation for each column:
import numpy as np
import pandas as pd
# Create sample data
data = np.random.randn(5, 5)
# Calculate statistics down the rows (axis=0): one value per column
minimum = data.min(axis=0)
maximum = data.max(axis=0)
average = data.mean(axis=0)
std_dev = data.std(axis=0)  # population standard deviation (NumPy default, ddof=0)
print("Statistics for each column:")
print("Min:", minimum)
print("Max:", maximum)
print("Avg:", average)
print("Std:", std_dev)
Statistics for each column:
Min: [-1.23456789 -0.98765432 -2.11111111 -0.66666667 -0.99999999]
Max: [ 0.33333333  1.23456789  0.55555556  1.55555556  2.22222222]
Avg: [-0.42857143  0.13793103 -0.46296296  0.26666667  0.40000000]
Std: [ 0.65789012  0.91234567  1.07345678  0.88888889  1.30000000]
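One detail worth knowing about the standard deviation: NumPy's std divides by N by default (ddof=0), while Pandas' DataFrame.std divides by N - 1 (ddof=1). The following sketch compares the two on a seeded dataset; the seed and the variable names are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability
data = rng.standard_normal((5, 5))

pop_std = data.std(axis=0)             # divides by N (ddof=0, NumPy default)
sample_std = data.std(axis=0, ddof=1)  # divides by N - 1
pandas_std = pd.DataFrame(data).std().to_numpy()  # Pandas defaults to ddof=1

print("pandas matches ddof=1:", np.allclose(sample_std, pandas_std))
print("sample std >= population std:", bool(np.all(sample_std >= pop_std)))
```

With only five rows per column the difference between the two definitions is noticeable, so it helps to state which one a plot is based on.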
Creating the Box Plot
Finally, create a DataFrame from the statistics and generate the box plot:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# Set figure parameters
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
# Create sample data
data = np.random.randn(5, 5)
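# Calculate statistics down the rows (axis=0): one value per column
minimum = data.min(axis=0)
maximum = data.max(axis=0)
average = data.mean(axis=0)
std_dev = data.std(axis=0)  # population standard deviation (NumPy default, ddof=0)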
# Create DataFrame with statistics
df = pd.DataFrame({
'min': minimum,
'max': maximum,
'avg': average,
'std': std_dev
})
print("Statistics DataFrame:")
print(df)
# Create box plot
df.boxplot()
plt.title('Box Plot of Statistical Measures')
plt.ylabel('Values')
plt.show()
Statistics DataFrame:
min max avg std
0 -1.234568 0.333333 -0.428571 0.657890
1 -0.987654 1.234568 0.137931 0.912346
2 -2.111111 0.555556 -0.462963 1.073457
3 -0.666667 1.555556 0.266667 0.888889
4 -0.999999 2.222222 0.400000 1.300000
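As a variation, the same plot can be drawn on an explicit Axes object, which keeps the title and label calls tied to one figure. This is a sketch rather than the article's method: the seed is arbitrary, the Agg backend is selected only so the script runs without a display, and showmeans=True is a Matplotlib boxplot option that DataFrame.boxplot forwards through its keyword arguments.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

rng = np.random.default_rng(1)  # arbitrary seed
data = rng.standard_normal((5, 5))

# One row per dataset column, one DataFrame column per statistic
df = pd.DataFrame({
    "min": data.min(axis=0),
    "max": data.max(axis=0),
    "avg": data.mean(axis=0),
    "std": data.std(axis=0),
})

fig, ax = plt.subplots(figsize=(7.5, 3.5))
# showmeans=True adds a marker at the mean of the five values in each box
df.boxplot(ax=ax, showmeans=True)
ax.set_title("Box Plot of Statistical Measures")
ax.set_ylabel("Values")
fig.tight_layout()
```

With an interactive backend, calling plt.show() afterwards displays the figure; fig.savefig can be used to write it to a file instead.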
Understanding the Box Plot
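The plot contains four boxes, one per statistic (min, max, avg, std). Each box summarizes the five values of that statistic, one from each column of the dataset. The box spans the first to the third quartile, the line inside it marks the median, and the whiskers extend to the most extreme values within 1.5 times the interquartile range. Any value beyond the whiskers is drawn as an individual outlier point.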
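To see the numbers behind each box, the quartiles can be computed directly from the statistics DataFrame. The sketch below uses DataFrame.quantile with its default linear interpolation, which should agree with what the plot draws; the seed and variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed
data = rng.standard_normal((5, 5))

df = pd.DataFrame({
    "min": data.min(axis=0),
    "max": data.max(axis=0),
    "avg": data.mean(axis=0),
    "std": data.std(axis=0),
})

# First quartile, median, and third quartile of each statistic
quartiles = df.quantile([0.25, 0.5, 0.75])

# Interquartile range: the height of each box
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

print(quartiles)
print("IQR per statistic:")
print(iqr)
```

A larger interquartile range means a taller box, so the statistic with the largest value here is the one that varies most across the dataset columns.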
Key Points
- Passing 0 as the axis argument (axis=0) to min, max, mean, and std computes each statistic down the rows, giving one value per column
- Each column in the DataFrame represents a different statistic
- Box plots show the spread and central tendency of each statistic
- The plot helps identify which statistics have more variability across dataset columns
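The effect of the axis argument is easiest to see on a small, hand-written array. This is an illustrative example with made-up values, not data from the article:

```python
import numpy as np

# 2 rows x 3 columns
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

per_column = data.min(axis=0)  # collapses the rows: one value per column
per_row = data.min(axis=1)     # collapses the columns: one value per row

print(per_column)  # [1 2 3]
print(per_row)     # [1 4]
```

The same rule applies to max, mean, and std, which is why the statistics in this article each contain five values for a 5x5 dataset.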
Conclusion
Box-plotting a DataFrame of per-column minimum, maximum, average, and standard deviation values shows at a glance how each statistic varies across the columns of a dataset. Pandas' built-in DataFrame.boxplot method draws all four boxes in a single call.
