Box and Whisker Plot


Introduction

Box and whisker plots can be used to display and analyze data. Some distributions or datasets call for more detail than measures of central tendency such as the mean, median, and mode can provide. In particular, describing the variability or dispersion of the data requires a more concrete foundation. A box and whisker plot can meet this demand.

A box and whisker plot, also known as a box plot, is a method of summarizing a set of data measured on an interval scale. It is a graphical method that displays the variation of data in a dataset and is mostly used to interpret data.

In this tutorial, we will discuss the box and whisker plot.

Definition

  • A box plot is a plot that gives us a much clearer idea of how the values in the data are distributed. Lines extend from the box, and these lines are commonly referred to as whiskers.

  • Whiskers are used to denote variation outside of the upper and lower quartiles. The box plot is non-parametric: it displays the variation in samples of a statistical population without making any assumptions about the underlying statistical distribution.

  • The spacings between the different parts of the box represent the degree of dispersion (spread) and skewness in the data, and help reveal outliers.

  • Box plots can be drawn in either horizontal or vertical directions. A box plot is a chart that is commonly used in exploratory data analysis.

How to draw

The box and whiskers plot can be made in five simple steps. To create a box and whisker diagram, we must first determine −

  • Step 1 − The minimum value, which is the smallest value in the data.

  • Step 2 − The first quartile, which is the value below which the lowest 25% of the data fall.

  • Step 3 − The median, which is the middle value of the data set.

  • Step 4 − The third quartile, which is the value below which the lowest 75% of the data fall.

  • Step 5 − The maximum value, which is the largest value in the data.
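The five values above can be computed with a short script. The following is a minimal sketch, assuming the median-of-halves method for the quartiles (other quartile conventions exist and can give slightly different results). The function name `five_number_summary` and the sample data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes the median-of-halves convention: Q1 and Q3 are the medians
    of the lower and upper halves of the sorted data. Needs at least
    two values.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2:]     # values above it (middle value skipped when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, used only to show the call.
print(five_number_summary([7, 1, 3, 9, 5, 11, 13, 15]))
# (1, 4.0, 8.0, 12.0, 15)
```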

Types

A box plot (or boxplot) is a simple way of plotting the distribution of data across quartiles. It is a graphical representation of statistical data based on the lowest value, first quartile, median, third quartile, and highest value.

Let us examine the components of a box plot.

Median Value

The median is the value in the middle of a set of values arranged in ascending or descending order. If the set has an odd number of values, the median is the value right in the middle. If the number of values is even, the median is calculated by averaging the two values closest to the center.
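The odd and even cases described above can be sketched in a few lines. This is an illustrative helper, not a replacement for a library routine. The name `median_of` and the sample lists are made up.

```python
def median_of(values):
    """Median of a non-empty list, handling odd and even counts separately."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2    # even count: average of the two central values

print(median_of([3, 1, 2]))       # 2
print(median_of([4, 1, 3, 2]))    # 2.5
```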

The Lower Quartile

The lower quartile (also known as the first quartile) separates the bottom 25% of the data from the rest. Quartiles are three data points that divide the data into four equal groups, each representing one quarter of the entire dataset. The lower quartile is the median of the lower half of the data.

Upper Quartile

The upper quartile is also known as the third quartile. It separates the bottom 75% of the data from the top 25%. It is also the median of the upper half of the data.

Interquartile Range

The interquartile range (IQR) is the difference between the upper and lower quartiles. The IQR is usually considered a better measure of spread than the range (highest value minus lowest value) because it is not affected by outliers.
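A small experiment can illustrate why the IQR is less sensitive to outliers than the range. The sketch below assumes the median-of-halves method for the quartiles. The helper name `iqr` and both data lists are invented for the demonstration.

```python
from statistics import median

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2:])
    return q3 - q1

base = [10, 12, 14, 16, 18, 20, 22, 24]
with_outlier = base[:-1] + [240]    # largest value replaced by an extreme one

for name, data in (("base", base), ("with outlier", with_outlier)):
    print(name, "range =", max(data) - min(data), "IQR =", iqr(data))
# base range = 14 IQR = 8.0
# with outlier range = 230 IQR = 8.0
```

The range jumps from 14 to 230 when one extreme value is introduced, while the IQR stays at 8.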

Highest Value

This boxplot point represents the highest non-outlier value in the data distribution for which the boxplot is produced. It does not necessarily match the maximum value of the dataset.

Lowest Value

This boxplot point represents the lowest non-outlier value in the data distribution for which the boxplot is produced. It does not necessarily match the minimum value of the dataset.
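The text does not say how a value is judged to be an outlier. One widely used convention, assumed here, treats any point more than 1.5 × IQR below the lower quartile or above the upper quartile as an outlier. The sketch below finds the highest and lowest non-outlier values under that assumption. The function name `whisker_ends`, the parameter `k`, and the sample data are hypothetical.

```python
from statistics import median

def whisker_ends(values, k=1.5):
    """Return (lowest non-outlier, highest non-outlier, outliers).

    Assumes the 1.5 * IQR fence convention and median-of-halves quartiles;
    neither is specified in the text above.
    """
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2:])
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return inside[0], inside[-1], outliers

sample = [10, 12, 14, 16, 18, 20, 22, 240]
print(whisker_ends(sample))
# (10, 22, [240])
```

Here the dataset maximum is 240, but the upper whisker would end at 22, because 240 lies beyond the upper fence.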

Solved Examples

1) Assume a computer firm has two locations. Each month, the company tracked the number of sales made by each store. The stores sold the following numbers of computers over the last 12 months.

First store − 350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380.

Second store − 520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140.

Answer − Create two boxplots, one for store 1 and one for store 2, to compare the sales performance of the two stores.

First, sort the data points for Store 1 in ascending order.

20, 120, 160, 200, 210, 250, 290, 350, 380, 460, 510, 580.

Now we must compute the median. This data set, however, contains an even number of values, so there is no single point in the middle. In our case, the sixth and seventh data points, 250 and 290, represent the middle.

In an even data set, the median is calculated as follows −

$$\mathrm{Median\:=\:\frac{250\:+\:290}{2}\:=\:270}$$

Next, find the lower and upper quartiles. Six values are lower than the median − 20, 120, 160, 200, 210, and 250.

The median of these six values is the lower quartile, so lower quartile $\mathrm{=\:\frac{(160\:+\:200)}{2}\:=\:180}$

There are also six values that are higher than the median − 290, 350, 380, 460, 510, and 580.

The median of these six values is the upper quartile, so upper quartile $\mathrm{=\:\frac{(380\:+\:460)}{2}\:=\:420}$

Finally, the sales for Store 1 are summarised by five numbers − 20, 180, 270, 420, and 580.

The five-number summary for Store 2 is produced using the same calculations − 70, 160, 320, 470, and 630.

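With both five-number summaries in hand, the comparative box and whisker plot can be drawn, with one box for each store.

Results: Store 2 has the higher median (320 versus 270) and the larger interquartile range (310 versus 240). These findings indicate that Store 2 typically sells more than Store 1, although its monthly sales are also more variable.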
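The figures in this example can be checked with a short script. The sketch below uses the same median-of-halves method as the worked calculation. The variable and function names are chosen for illustration only.

```python
from statistics import median

store_1 = [350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380]
store_2 = [520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140]

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) with quartiles as medians of the halves."""
    data = sorted(values)
    n = len(data)
    return (data[0], median(data[: n // 2]), median(data),
            median(data[(n + 1) // 2:]), data[-1])

for name, sales in (("Store 1", store_1), ("Store 2", store_2)):
    low, q1, med, q3, high = five_number_summary(sales)
    print(name, (low, q1, med, q3, high), "IQR =", q3 - q1)
# Store 1 (20, 180.0, 270.0, 420.0, 580) IQR = 240.0
# Store 2 (70, 160.0, 320.0, 470.0, 630) IQR = 310.0
```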

Conclusion

In this tutorial, we learned about box and whisker plots, their properties, and their importance. Box and whisker plots can be used to display and analyze data. They summarize a data set through a few key values: the minimum, the lower quartile, the median, the upper quartile, and the maximum. In addition, multiple sets of data can be represented in the same graph.

FAQs

1. What do you mean by box and whisker plot?

A boxplot is a type of chart that displays the five-number summary of a given data set: the minimum, lower quartile, median, upper quartile, and maximum.

2. What is the box plot's five-number summary?

The five-number summary of a box plot consists of the minimum, 1st quartile, median, 3rd quartile, and maximum.

3. When is the box plot said to be symmetric?

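A box plot is said to be symmetric if the median lies in the middle of the box, equidistant from the lower and upper quartiles, and is also equidistant from the minimum and maximum values, so that the two whiskers are about the same length.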

4. What are the drawbacks of using the Box and Whisker Plot?

Boxplots have the disadvantage of hiding multimodality and other distributional features. The mean is not shown and is difficult to locate from the plot, which can leave viewers at a loss.

5. What exactly is an outlier in a box plot?

Outliers are data points that differ markedly from the rest of the data in the dataset. In a box plot, they lie beyond the ends of the whiskers and are drawn as individual points.

Updated on: 26-Apr-2024
