Python Pandas - Bootstrap Plot



A bootstrap plot is a useful visualization tool for estimating the uncertainty of a statistic, such as the mean, median, or mid-range, in a dataset. It is produced by repeatedly selecting random subsets of a specified size from the dataset, calculating the statistic for each subset, and displaying the results as line plots and histograms.

Pandas provides a convenient function for creating Bootstrap plots. In this tutorial, we will learn how to use the bootstrap_plot() function to generate them.
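Before turning to the function itself, the resampling procedure described above can be sketched by hand. This is only an illustration, not the Pandas implementation: the helper name bootstrap_statistics and its seed argument are hypothetical, and it assumes each subset is drawn without replacement, which is consistent with the requirement that the subset size not exceed the length of the series.

```python
import numpy as np
import pandas as pd

def bootstrap_statistics(series, size=50, samples=500, seed=0):
    """Hypothetical helper: resample `series` and collect three statistics."""
    rng = np.random.default_rng(seed)
    values = series.to_numpy()
    rows = []
    for _ in range(samples):
        # Draw a random subset of `size` points (assumed: without replacement)
        subset = rng.choice(values, size=size, replace=False)
        rows.append({
            "mean": subset.mean(),
            "median": np.median(subset),
            # Mid-range is the average of the smallest and largest value
            "midrange": (subset.min() + subset.max()) / 2,
        })
    return pd.DataFrame(rows)

# Uniform data in [0, 1), similar to the dataset used in the basic example
data = pd.Series(np.random.default_rng(42).uniform(size=100))
stats = bootstrap_statistics(data)

# The spread of each column indicates how uncertain that statistic is
print(stats.describe().loc[["mean", "std"]])
```

Each row of the resulting DataFrame holds the statistics of one random subset. A Bootstrap plot draws these values per iteration and as histograms.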

The bootstrap_plot() Function

The plotting.bootstrap_plot() function in the Pandas library generates Bootstrap plots for the mean, median, and mid-range statistics of a Series. It returns the Matplotlib figure that contains these plots.

Syntax

Following is the syntax of the bootstrap_plot() function −

pandas.plotting.bootstrap_plot(series, fig=None, size=50, samples=500, **kwds)

Where,

  • series: The Pandas Series containing the data.

  • fig: Optional, the Matplotlib Figure object. If not provided, a new figure is created.

  • size: The number of data points in each random subset (default is 50). It must be less than or equal to the length of the series.

  • samples: The number of bootstrap iterations (default is 500).

  • **kwds: Additional options passed to Matplotlib's plotting method.
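The following sketch shows how these parameters might be combined. The dataset, the figure size, and the chosen values are arbitrary assumptions for illustration. The Agg backend is selected only so that the code runs without a display; remove that line and call plt.show() to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import bootstrap_plot

# Arbitrary example data
data = pd.Series(np.random.default_rng(0).normal(size=200))

# Create the figure ourselves to control its size, then pass it via `fig`
fig = plt.figure(figsize=(10, 6))

# Keep `size` within the length of the series
subset_size = min(80, len(data))

# `color` is forwarded to Matplotlib through **kwds
result = bootstrap_plot(data, fig=fig, size=subset_size, samples=300, color="gray")

# The function returns the figure it drew on
print(result is fig)
```

Because the function returns the figure, it can be customized further or saved afterwards, for example with result.savefig().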

Example: Basic Bootstrap plot

Here is a basic example of creating a Bootstrap plot in Pandas using the plotting.bootstrap_plot() function.

import pandas as pd
import numpy as np
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Create a random dataset
data = pd.Series(np.random.uniform(size=100))

# Generate a basic bootstrap plot
bootstrap_plot(data)
plt.show()

On executing the above code, you will get the following plot −

Basic Bootstrap plot

Example: Custom Sample Size and Samples

Here is another example that uses the plotting.bootstrap_plot() function to create a Bootstrap plot with a custom subset size and number of samples.

import pandas as pd
import numpy as np
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Create a dataset
data = pd.Series(np.random.normal(loc=50, scale=10, size=500))

# Generate a bootstrap plot with custom parameters
bootstrap_plot(data, size=100, samples=1000)
plt.show()

Following is the output of the above code −

Bootstrap plot with Custom Parameters
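To get a feel for why the subset size matters, the sketch below compares the spread of the subset means for a small and a large subset size. It does not use bootstrap_plot(). The helper spread_of_means is hypothetical, and sampling without replacement is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same kind of data as in the example above
data = rng.normal(loc=50, scale=10, size=500)

def spread_of_means(values, size, samples=1000):
    """Hypothetical helper: standard deviation of the subset means."""
    means = [
        rng.choice(values, size=size, replace=False).mean()
        for _ in range(samples)
    ]
    return float(np.std(means))

spread_small = spread_of_means(data, size=20)
spread_large = spread_of_means(data, size=200)

# Larger subsets give more stable estimates of the mean
print(f"size=20:  {spread_small:.3f}")
print(f"size=200: {spread_large:.3f}")
```

A larger subset size should produce narrower histograms in the Bootstrap plot. Increasing the number of samples mainly makes the histograms smoother.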

Example: Bootstrap Plot Using the Iris Dataset

In this example, we will use the Iris dataset and generate a bootstrap plot for the "SepalWidth" column.

import pandas as pd
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Load the Iris dataset
url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/iris.csv'
data = pd.read_csv(url)['SepalWidth']

# Generate a bootstrap plot with custom parameters
bootstrap_plot(data, size=100, samples=1000)
plt.show()

Following is the output of the above code −

Bootstrap plot for Iris Dataset