Introduction to Bootstrap Plots


Visualising data is an essential part of data analysis. Bootstrap plots are visualisation tools that show graphically how uncertain a statistical estimate is. This article introduces the idea of bootstrap plots and walks through making them in Python.

Unraveling the Concept of Bootstrap Plots

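Bootstrap plots are graphical displays of uncertainty estimates obtained by resampling with replacement. The bootstrap method repeatedly draws random samples, with replacement, from the observed data, computes a statistic such as the mean or median on each resample, and uses the spread of those values to estimate how uncertain the statistic is as an estimate for the population.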
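The resampling idea can be sketched in a few lines. The following is a minimal illustration rather than a definitive implementation: it uses only Python's standard library, the helper name bootstrap_statistics is hypothetical, and the observations are made-up numbers chosen for the example.

```python
import random
import statistics

def bootstrap_statistics(data, stat_func, n_resamples=1000, seed=0):
    """Return stat_func evaluated on n_resamples resamples of data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # Draw n observations from the data *with replacement*
        resample = rng.choices(data, k=n)
        results.append(stat_func(resample))
    return results

# Made-up observations, used only for illustration
observations = [12.5, 18.0, 9.7, 24.3, 15.1, 30.2, 11.8, 16.4, 21.9, 13.3]

boot_medians = bootstrap_statistics(observations, statistics.median)

# The spread of the bootstrapped medians is the uncertainty estimate
print("median of the data:", statistics.median(observations))
print("std. dev. of bootstrapped medians:", statistics.stdev(boot_medians))
```

A bootstrap plot is essentially a picture of a list like boot_medians, for example as a histogram or as a line of values against the resample number.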

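A typical bootstrap plot shows the value of the statistic for each resample, plotted against the resample number and summarised as a histogram. A related display puts the data or groups on the x-axis and the bootstrapped 95% confidence interval for their values on the y-axis. Either way, the plot helps us understand the degree of uncertainty or fluctuation in the data.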

Generating Bootstrap Plots using Python

Bootstrap plots can be made in Python with a few widely used libraries. pandas provides a ready-made bootstrap_plot function in its pandas.plotting module, and Matplotlib does the underlying drawing. Seaborn, a statistical visualisation library built on Matplotlib, supplies sample datasets and draws bootstrapped confidence intervals in several of its own plots.

Plunging into Practical Examples

Let's look at some examples of how to make bootstrap graphs in Python to help with comprehension.

Example 1: Creating a Simple Bootstrap Plot

We must import the necessary libraries and load the dataset first.

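import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import bootstrap_plot

# Load the 'tips' dataset from seaborn
tips = sns.load_dataset("tips")

Let's make a straightforward bootstrap plot for the dataset's "total_bill" column.

# Generate a bootstrap plot of the 'total_bill' column
# size: observations drawn per resample; samples: number of resamples
bootstrap_plot(tips["total_bill"], size=50, samples=500)

# Display the plot
plt.show()

In this example, we create a bootstrap plot of the 'total_bill' column using the bootstrap_plot function from pandas.plotting; Seaborn is used here only to load the dataset. The size argument specifies how many observations are drawn for each resample (it must not exceed the length of the series), and the samples argument specifies how many resamples are generated. The function does not take a custom statistic: it always plots the mean, median, and midrange of each resample, both against the resample number and as histograms.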

Example 2: Bootstrap Plot with Customized Confidence Interval

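Showing a confidence interval of your own choosing on a bootstrap plot is a common request. To accomplish this, compute the bootstrap statistics directly, define a function that returns the interval, and draw its bounds on a histogram of the bootstrapped values.

import numpy as np

# Define a function to calculate the 90% percentile confidence interval
def ci_func(x, ci=90):
    lower = np.percentile(x, (100 - ci) / 2)
    upper = np.percentile(x, (100 + ci) / 2)
    return lower, upper

# Bootstrap the median of 'total_bill': resample with replacement 1000 times
rng = np.random.default_rng(0)
bills = tips["total_bill"].to_numpy()
medians = [np.median(rng.choice(bills, size=bills.size, replace=True))
           for _ in range(1000)]

# Plot the bootstrap distribution and mark the customized confidence interval
lower, upper = ci_func(medians)
plt.hist(medians, bins=30)
plt.axvline(lower, color="red", linestyle="--")
plt.axvline(upper, color="red", linestyle="--")

# Display the plot
plt.show()

In this example, ci_func receives the bootstrapped medians and returns the bounds of the 90% confidence interval, which are drawn as dashed lines on the histogram. Passing a different ci value, such as ci_func(medians, ci=95), changes the width of the interval.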

Example 3: Multiple Bootstrap Plots for Comparison

It is sometimes useful to compare bootstrap plots for several subsets of the data.

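from pandas.plotting import bootstrap_plot

# Generate a bootstrap plot for each day of the week
for day in tips["day"].unique():
    bills = tips.loc[tips["day"] == day, "total_bill"]
    # size must stay below the row count of the smallest group
    fig = bootstrap_plot(bills, size=15, samples=500)
    fig.suptitle(day)
    plt.show()

With the help of this code, we can compare the "total_bill" for various days of the week by creating a separate bootstrap plot for each distinct day in the "day" column. The size is reduced to 15 because a resample cannot contain more observations than the group it is drawn from, and Friday has only 19 rows.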

Limitations and Considerations

Even though bootstrap plots are effective tools, there are a few things to keep in mind. For very small or heavily skewed datasets, bootstrapping does not always give an accurate estimate of the uncertainty, because the resamples can only reuse the values that were actually observed. In those cases the results should be cross-checked with other statistical methods.
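One reason small samples are troublesome can be shown directly. The sketch below is an illustration under stated assumptions: the five data values are made up, percentile_interval is a hypothetical helper that uses a simple index-based percentile, and only the standard library is used.

```python
import random
import statistics

def percentile_interval(values, ci=95):
    """Simple index-based percentile interval of a list of numbers."""
    ordered = sorted(values)
    tail = (100 - ci) / 200  # fraction left out in each tail
    lower_idx = int(tail * (len(ordered) - 1))
    upper_idx = int((1 - tail) * (len(ordered) - 1))
    return ordered[lower_idx], ordered[upper_idx]

rng = random.Random(1)

# A tiny, right-skewed sample (made-up values for illustration)
small_sample = [1.2, 1.5, 1.9, 2.4, 14.0]

boot_means = [
    statistics.mean(rng.choices(small_sample, k=len(small_sample)))
    for _ in range(2000)
]
lower, upper = percentile_interval(boot_means)

# Every resample is built from the same five numbers, so the interval
# can never extend beyond the smallest and largest observed values.
print("sample range:", min(small_sample), "to", max(small_sample))
print("95% bootstrap interval for the mean:", lower, "to", upper)
```

If the population has a long tail that the five observations happened to miss, no amount of resampling will reveal it, which is why the interval can be too narrow for small or skewed samples.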

Additionally, because bootstrapping requires frequent resampling, it can be computationally expensive for bigger datasets. The trade-off between processing resources and estimate precision must therefore be taken into account.
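The trade-off can be made concrete by repeating the same bootstrap with few and with many resamples and watching how much the result moves from run to run. This is a rough sketch, not a benchmark: the data are randomly generated for illustration, the helper names are hypothetical, and the resample counts of 40 and 1000 are arbitrary choices.

```python
import random
import statistics

def upper_bound(data, n_resamples, rng, ci=95):
    """Upper end of a percentile bootstrap interval for the mean."""
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    idx = int((100 + ci) / 200 * (len(means) - 1))
    return means[idx]

rng = random.Random(42)
# Made-up data set of 30 values, for illustration only
data = [rng.gauss(20, 5) for _ in range(30)]

def spread_of_upper_bound(n_resamples, repeats=20):
    """How much the upper bound moves between repeated bootstrap runs."""
    bounds = [upper_bound(data, n_resamples, rng) for _ in range(repeats)]
    return statistics.stdev(bounds)

spread_few = spread_of_upper_bound(n_resamples=40)
spread_many = spread_of_upper_bound(n_resamples=1000)

print("run-to-run spread with 40 resamples:  ", spread_few)
print("run-to-run spread with 1000 resamples:", spread_many)
```

More resamples give a steadier interval, but the work grows in proportion to the number of resamples multiplied by the size of the dataset.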

Conclusion

Bootstrap plots are an easy-to-understand way of displaying the uncertainty surrounding statistical estimates. They are a useful tool for exploratory data analysis and help scientists and data analysts quickly see how stable their estimates are.

In this post we introduced bootstrap plots and worked through several practical examples using Python's data visualisation libraries. Although we touched on a number of important topics, there is still a great deal to learn about bootstrap plots and statistical data visualisation. To master making and reading bootstrap plots, keep practising and exploring different datasets.

Updated on: 17-Jul-2023
