Box Plot Explained: Interpretation, Examples, & Comparison (2024)

In descriptive statistics, a box plot or boxplot (also known as a box and whisker plot) is a type of chart often used in exploratory data analysis. Box plots visually show the distribution and skewness of numerical data by displaying the data quartiles (or percentiles) and the median.

Box plots show the five-number summary of a set of data: the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.
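
As a rough illustration, the five-number summary can be computed with Python's standard library. This is a minimal sketch: the function name `five_number_summary` and the sample scores are hypothetical, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles. It also uses the raw minimum and maximum, whereas a box plot's whisker ends exclude outliers.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(scores)
    # method="inclusive" interpolates linearly between neighbouring values,
    # treating the smallest and largest scores as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(five_number_summary(scores))  # (1, 3.0, 5.0, 7.0, 9)
```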

Definitions

Minimum Score

The lowest score, excluding outliers (shown at the end of the left whisker).

Lower Quartile

Twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).

Median

The median (sometimes known as the second quartile) marks the mid-point of the data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value, and half are less.

Upper Quartile

Seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile). Thus, 25% of data are above this value.

Maximum Score

The highest score, excluding outliers (shown at the end of the right whisker).

Whiskers

The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores).

The Interquartile Range (or IQR)

The box shows the middle 50% of scores (i.e., the range between the 25th and 75th percentiles). The length of the box is the interquartile range.

Why are box plots useful?

Box plots divide the data into four sections, each containing approximately 25% of the data in that set.

Box plots are useful as they provide a visual summary of the data, enabling researchers to quickly identify median values, the dispersion of the data set, and signs of skewness.

Note that the image above represents a perfectly symmetric distribution, and most box plots will not conform to this symmetry (where each quartile section is the same length).

Box plots are useful as they show the average score of a data set

The median is the middle value of an ordered set of data (one type of average) and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value, and half are less.

Box plots are useful as they show the skewness of a data set

The shape of the box plot will show whether a data set is roughly symmetric or skewed.

When the median is in the middle of the box, and the whiskers are about the same on both sides of the box, then the distribution is symmetric.

When the median is closer to the bottom of the box, and the whisker is shorter on the lower end of the box, then the distribution is positively skewed (skewed right).

When the median is closer to the top of the box, and the whisker is shorter on the upper end of the box, then the distribution is negatively skewed (skewed left).
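
The position of the median inside the box can also be expressed as a number. The sketch below uses Bowley's quartile skewness coefficient, which is a technique brought in here for illustration rather than something the rules above prescribe. It looks only at the box, not at the whiskers, and the `tolerance` threshold, function names, and sample data are all hypothetical.

```python
import statistics

def quartile_skewness(scores):
    """Bowley's coefficient: ((Q3 - median) - (median - Q1)) / (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; quartile skewness is undefined")
    return ((q3 - median) - (median - q1)) / iqr

def describe_skew(scores, tolerance=0.1):
    # tolerance is an arbitrary illustrative cut-off, not a standard value
    value = quartile_skewness(scores)
    if value > tolerance:
        return "positively skewed"   # median sits nearer the bottom of the box
    if value < -tolerance:
        return "negatively skewed"   # median sits nearer the top of the box
    return "roughly symmetric"

print(describe_skew([1, 2, 2, 3, 3, 4, 6, 9, 15]))         # positively skewed
print(describe_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))          # roughly symmetric
print(describe_skew([1, 7, 10, 12, 13, 14, 14, 15, 15]))   # negatively skewed
```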

Box plots are useful as they show the dispersion of a data set

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.

The smallest and largest values are found at the end of the ‘whiskers’ and are useful for providing a visual indicator regarding the spread of scores (e.g., the range).

The interquartile range (IQR) is the length of the box, which spans the middle 50% of scores. It is calculated by subtracting the lower quartile from the upper quartile (i.e., IQR = Q3 − Q1).
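
A short sketch of the Q3 − Q1 calculation, using hypothetical scores and a hypothetical helper named `interquartile_range`. The quartile method ("inclusive") is an assumption; other conventions can give slightly different values.

```python
import statistics

def interquartile_range(scores):
    """Return Q3 - Q1, the length of the box in a box plot."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1

scores = [2, 4, 4, 5, 6, 7, 8, 9, 12]  # hypothetical data
print(interquartile_range(scores))     # 4.0  (Q3 = 8.0, Q1 = 4.0)
print(max(scores) - min(scores))       # 10   (the full range, for comparison)
```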

Box plots are useful as they show outliers within a data set

An outlier is an observation that is numerically distant from the rest of the data.

When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

Figure source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

For example, a common convention flags as an outlier any data point that lies more than 1.5 times the interquartile range below the lower quartile or above the upper quartile (i.e., below Q1 − 1.5 * IQR or above Q3 + 1.5 * IQR).
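
The 1.5 * IQR rule can be sketched as follows. The function name `find_outliers`, the parameter `k`, and the sample scores are hypothetical, and the "inclusive" quartile method is an assumption. The sketch also reports where the whiskers would end, namely at the most extreme scores that are not outliers.

```python
import statistics

def find_outliers(scores, k=1.5):
    """Flag scores below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in scores if x < lower_fence or x > upper_fence]
    inside = [x for x in scores if lower_fence <= x <= upper_fence]
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
        # Whiskers stop at the most extreme non-outlying scores.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

result = find_outliers([2, 4, 4, 5, 6, 7, 8, 9, 30])  # hypothetical data
print(result["lower_fence"], result["upper_fence"])   # -2.0 14.0
print(result["outliers"])                             # [30]
print(result["whisker_low"], result["whisker_high"])  # 2 9
```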

How to compare box plots

Box plots are a useful way to visualize differences among samples or groups. They provide a lot of statistical information, including medians, ranges, and outliers.

Note that although box plots have been presented horizontally in this article, it is more common to view them vertically in research papers.

Step 1: Compare the medians of box plots

Compare the respective medians of each box plot. If the median line of a box plot lies outside of the box of a comparison box plot, then there is likely to be a difference between the two groups.
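
This rule of thumb can be checked numerically. The sketch below is only an illustration of the visual check described above, not a significance test; the function name `median_outside_box` and the groups are hypothetical, and the "inclusive" quartile method is an assumption.

```python
import statistics

def median_outside_box(group_a, group_b):
    """True if the median of group_a lies outside the box (Q1 to Q3) of group_b."""
    median_a = statistics.median(group_a)
    q1_b, _, q3_b = statistics.quantiles(group_b, n=4, method="inclusive")
    return median_a < q1_b or median_a > q3_b

low = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # box spans 3.0 to 7.0
high = [10, 11, 12, 13, 14, 15, 16, 17, 18]   # median 14
similar = [2, 3, 4, 5, 6, 7, 8, 9, 10]        # median 6

print(median_outside_box(high, low))     # True: the groups likely differ
print(median_outside_box(similar, low))  # False: no clear difference
```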

Figure source: https://blog.bioturing.com/2018/05/22/how-to-compare-box-plots/

Step 2: Compare the interquartile ranges and whiskers of box plots

Compare the interquartile ranges (that is, the box lengths) to examine how the data are dispersed within each sample. The longer the box, the more dispersed the data. The shorter the box, the less dispersed the data.

Next, look at the overall spread as shown by the extreme values at the ends of the two whiskers. This shows the range of scores (another measure of dispersion). Larger ranges indicate a wider distribution, that is, more scattered data.
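
Both measures of dispersion can be compared side by side. This is a minimal sketch with hypothetical groups and a hypothetical helper named `spread_summary`; it uses the raw minimum and maximum for the range, so it matches the whisker ends only when there are no outliers.

```python
import statistics

def spread_summary(scores):
    """Return the box length (IQR) and the overall range of a sample."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": max(scores) - min(scores)}

group_a = [4, 5, 5, 6, 6, 6, 7, 7, 8]   # hypothetical, tightly clustered
group_b = [1, 3, 4, 5, 6, 7, 8, 9, 11]  # hypothetical, more scattered

print(spread_summary(group_a))  # {'iqr': 2.0, 'range': 4}
print(spread_summary(group_b))  # {'iqr': 4.0, 'range': 10}
```

Here group_b has both the longer box and the wider range, so it is the more dispersed sample on either measure.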

Step 3: Look for potential outliers (see the above image)

When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

Step 4: Look for signs of skewness

If the data do not appear to be symmetric, does each sample show the same kind of asymmetry?

FAQs

How to interpret a box plot?

Interpreting a box and whisker plot

Construction of a box plot is based around a dataset's quartiles, or the values that divide the dataset into equal fourths. The first quartile (Q1) is greater than 25% of the data and less than the other 75%. The second quartile (Q2) sits in the middle, dividing the data in half.

How do you describe a box plot example?

A box and whisker plot—also called a box plot—displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum. In a box plot, we draw a box from the first quartile to the third quartile. A vertical line goes through the box at the median.

What to say when comparing box plots?

Guidelines for comparing boxplots

Compare the respective medians, to compare location. Compare the interquartile ranges (that is, the box lengths), to compare dispersion. Look at the overall spread as shown by the adjacent values. (This is another aspect of dispersion.)

How do you interpret the significance of a box plot?

  1. Step 1: Assess the key characteristics. Examine the center and spread of the distribution. ...
  2. Step 2: Look for indicators of nonnormal or unusual data. Skewed data indicate that data may be nonnormal. ...
  3. Step 3: Assess and compare groups. If your boxplot has groups, assess and compare the center and spread of groups.

How do you interpret a plot?

You interpret a scatterplot by looking for trends in the data as you go from left to right: If the data show an uphill pattern as you move from left to right, this indicates a positive relationship between X and Y. As the X-values increase (move right), the Y-values tend to increase (move up).

How to interpret the shape of a box plot?

The box length gives an indication of the sample variability and the line across the box shows where the sample is centred. The position of the box in its whiskers and the position of the line in the box also tells us whether the sample is symmetric or skewed, either to the right or left.

How to interpret box plot outliers?

If no values fall beyond the whisker limits, the whiskers extend to the minimum and maximum data values. If there are values that fall above or below the end of the whiskers, they are plotted as dots. These points are often called outliers. An outlier is more extreme than the expected variation.

How do you interpret a positively skewed box plot?

In data that are positively skewed, the long tail is in the positive direction. Typically, this means the mean is greater than the mode. In data that are negatively skewed, the long tail is in the negative direction, and the mode is typically greater than the mean.

How to describe the spread of a boxplot?

A boxplot represents spread in two ways: by conveying the interquartile range (IQR) and the range. The interquartile range (IQR) is the difference between the third and first quartiles. The box of a boxplot spans from the first quartile to the third quartile (with the line intersecting the box marking the median).

How do you interpret a box and whisker plot?

A plot with long whiskers represents a greater range for the overall sample than simply a longer box itself does. Data covering a greater range is naturally less reliable as an indicator of highly probable values, but given the option, longer whiskers are less of a concern than a long box.

How to interpret side by side box plots?

Side-by-side box plots can be used along with mean and median differences to assess whether a quantitative variable and a categorical variable are associated. More overlap in the box plots indicates less association while less overlap in the box plots indicates a stronger association.

What insights can you derive from comparing the two box plots?

Box plots can be compared by their medians: if one box plot has a higher median, it suggests that the values in that group tend to be higher. Box lengths can also be compared: a longer box indicates a larger spread of the middle 50% of the data, which gives insight into the variability of each group.

What is a comparative boxplot?

It is used to compare multiple sets of data describing the same, single variable. It uses separate box plots for each data set. It allows comparisons of the median (center), upper and lower extremes, quartiles, interquartile range (IQR), and range between and among multiple data sets.

What are the comparison circles in a box plot?

Comparison circles are a way to display whether or not the mean values for various categories (boxes in the box plot) are significantly different from each other. Each circle is drawn with its center at the mean value for the box to which it corresponds.

How to compare variability in box plots?

Short boxes mean their data points consistently hover around the center values. Taller boxes imply more variable data. That's something to look for when comparing box plots, especially when the medians are similar. Wider ranges (whisker length, box size) indicate more variable data.

How do you compare box plots in math?

When comparing two box plots, you should make a comment about: The average (the median) – i.e. which is higher/larger on average; The spread or consistency (the interquartile range or IQR) – a greater IQR means that data points are more spread out, and therefore less consistent.

Article information

Author: Gregorio Kreiger
