Python Pandas - Bootstrap Plot
A bootstrap plot is a useful visualization tool for estimating the uncertainty of a statistic, such as the mean, median, or mid-range (the average of the minimum and maximum values), of a dataset. It works by repeatedly selecting random subsets of a specified size from the dataset, calculating the statistic for each subset, and displaying the results as plots and histograms.
Pandas provides a convenient function for creating bootstrap plots. In this tutorial, we will learn how to use the bootstrap_plot() function to generate bootstrap plots with Pandas.
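To make the procedure concrete, here is a minimal sketch that computes the three statistics by hand with NumPy. It assumes each subset is drawn without replacement, which is consistent with the rule that the subset size cannot exceed the length of the data. The function `bootstrap_statistics` is a hypothetical helper for illustration only and is not part of Pandas.

```python
import numpy as np
import pandas as pd


def bootstrap_statistics(series, size=50, samples=500, seed=None):
    """Hypothetical helper: mean, median and mid-range of repeated random subsets."""
    rng = np.random.default_rng(seed)
    values = series.to_numpy()
    means, medians, midranges = [], [], []
    for _ in range(samples):
        # Draw one random subset of `size` points (assumed: without replacement)
        subset = rng.choice(values, size=size, replace=False)
        means.append(subset.mean())
        medians.append(np.median(subset))
        # Mid-range is the average of the smallest and largest values
        midranges.append((subset.min() + subset.max()) / 2)
    return np.array(means), np.array(medians), np.array(midranges)


data = pd.Series(np.random.default_rng(0).uniform(size=100))
means, medians, midranges = bootstrap_statistics(data, seed=1)

# The spread of each array reflects the uncertainty of that statistic
print(f"mean:      {means.mean():.3f} +/- {means.std():.3f}")
print(f"median:    {medians.mean():.3f} +/- {medians.std():.3f}")
print(f"mid-range: {midranges.mean():.3f} +/- {midranges.std():.3f}")
```

Plotting each array against the subset index, and as a histogram, corresponds to the plots and histograms described above.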
The bootstrap_plot() Function
The pandas.plotting.bootstrap_plot() function generates bootstrap plots for the mean, median, and mid-range statistics of a Series. It returns a Matplotlib figure containing these plots.
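Because the return value is an ordinary Matplotlib Figure, it can be inspected or saved like any other figure. The sketch below assumes the non-interactive `Agg` backend and an example output file name, `bootstrap.png`. In the Pandas versions we are aware of, the figure holds six axes (line plots in the top row, histograms in the bottom row), but treat that layout as an implementation detail rather than a guarantee.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is opened

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import bootstrap_plot

data = pd.Series(np.random.uniform(size=100))

# bootstrap_plot() returns the Matplotlib Figure it drew on
fig = bootstrap_plot(data)
print(type(fig).__name__)
print(len(fig.axes), "axes")

# Save the figure to disk (the file name is only an example)
fig.savefig("bootstrap.png")
plt.close(fig)
```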
Syntax
Following is the syntax of the bootstrap_plot() function −
pandas.plotting.bootstrap_plot(series, fig=None, size=50, samples=500, **kwds)
Where,
series: The Pandas Series containing the data.
fig: Optional. A Matplotlib Figure object to draw on. If not provided, a new figure is created.
size: The number of data points in each random subset (default is 50). It must be less than or equal to the length of the series.
samples: The number of times the bootstrap procedure is performed, that is, the number of random subsets drawn (default is 500).
**kwds: Additional keyword arguments passed to the Matplotlib plotting methods.
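The following sketch passes an existing figure through `fig` and a `color` option through `**kwds`. In the Pandas implementations we know of, the same keyword arguments are forwarded both to the line plots and to the histograms, so an option that only one of them accepts (such as `marker`) may raise an error; `color` is assumed to be safe because both accept it. The explicit check on `size` mirrors the rule that it must not exceed the length of the series.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import bootstrap_plot

data = pd.Series(np.random.normal(loc=0, scale=1, size=80))
size = 30

# size must be less than or equal to the length of the series
if size > len(data):
    raise ValueError(f"size={size} exceeds the series length {len(data)}")

# Draw into a pre-sized figure and colour every panel gray
fig = plt.figure(figsize=(10, 6))
result = bootstrap_plot(data, fig=fig, size=size, samples=200, color="gray")
plt.close(fig)
```

Supplying `fig` is mainly useful for controlling the figure size or reusing a figure you created yourself.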
Example: Basic Bootstrap plot
Here is a basic example of creating a bootstrap plot in Pandas with the plotting.bootstrap_plot() function.
import pandas as pd
import numpy as np
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Create a random dataset
data = pd.Series(np.random.uniform(size=100))

# Generate a basic bootstrap plot
bootstrap_plot(data)
plt.show()
On executing the above code, you will get the following plot −
Example: Custom Sample Size and Samples
Here is another example that uses the plotting.bootstrap_plot() function with a custom subset size and a custom number of samples.
import pandas as pd
import numpy as np
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Create a dataset
data = pd.Series(np.random.normal(loc=50, scale=10, size=500))

# Generate a bootstrap plot with custom parameters
bootstrap_plot(data, size=100, samples=1000)
plt.show()
Following is the output of the above code −
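The plots show the spread of each statistic visually. If a number is also wanted, one option (not part of bootstrap_plot() itself) is to repeat the subsampling by hand and read off percentiles of the resulting means. This is a minimal sketch that assumes the same kind of normal dataset as the example above, draws subsets without replacement, and reports the range holding the middle 95% of the subset means; the seed and the percentiles are arbitrary choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
data = pd.Series(rng.normal(loc=50, scale=10, size=500))

size, samples = 100, 1000
values = data.to_numpy()

# Mean of each random subset of `size` points, repeated `samples` times
subset_means = np.array([
    rng.choice(values, size=size, replace=False).mean()
    for _ in range(samples)
])

# Range that holds the middle 95% of the subset means
low, high = np.percentile(subset_means, [2.5, 97.5])
print(f"mean of subset means: {subset_means.mean():.2f}")
print(f"middle 95% range:     {low:.2f} to {high:.2f}")
```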
Example: Bootstrap Plot Using the Iris Dataset
In this example, we will use the Iris dataset and generate a bootstrap plot for the "SepalWidth" column.
import pandas as pd
from pandas.plotting import bootstrap_plot
import matplotlib.pyplot as plt

# Load the Iris dataset
url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/iris.csv'
data = pd.read_csv(url)['SepalWidth']

# Generate a bootstrap plot with custom parameters
bootstrap_plot(data, size=100, samples=1000)
plt.show()
Following is the output of the above code −