Statistical Simulation in Python


Statistical simulation is the task of making use of computer based methods in order to generate random samples from a probability distribution so that we can model and analyse complex systems which exhibit random behaviour. In this article we are going to see how to make use of this powerful tool in Python to make predictions, generate insights as well as evaluate the performance of statistical algorithms.

There are different types of statistical simulations, which are as follows:

  • Monte Carlo simulations − Generation of random samples from a probability distribution in order to estimate the expected value of a function.

  • Bootstrap − Resampling technique often used for estimating the sampling distribution of an estimator.

  • Markov chain Monte Carlo (MCMC) − Class of algorithms that is used to estimate the parameters of complex probability distributions.

  • Stochastic processes simulations − Simulation of random behaviour over time. Stock prices or weather patterns predictions are some examples.

Statistical simulation is used in a wide range of fields, including finance, engineering, physics, biology, and social sciences, to model and analyse complex systems, make predictions, and generate insights. The use of computer-based methods allows for the efficient and accurate simulation of large numbers of scenarios, which can be used to quantify uncertainty, estimate risk, and make data-driven decisions.

Monte Carlo simulations

Monte Carlo simulations are a statistical simulation which involves generating random samples from a probability distribution in order to estimate the expected value of a function or to model and analyse complex systems which exhibit random behaviour. The name "Monte Carlo" is derived from the Monte Carlo Casino which is located in Monaco, where randomness is the key aspect in games of chance, to be precise in gambling.

The accuracy of a Monte Carlo simulation depends on the number of random samples generated and the quality of the model or system being analyzed. With enough samples and a good model, Monte Carlo simulations can provide valuable insights and help decision-makers make informed choices.

Example

import numpy as np

# Define the function to be evaluated
def function(x):
   return x**2

# Generate random samples from a uniform distribution between 0 and 1
samples = np.random.uniform(0, 1, size=10000)

# Evaluate the function at each sample
values = function(samples)

# Compute the average of function values
mean_value = np.mean(values)

print("Mean value of the function: ", mean_value)

Output

Mean value of the function:  0.3326046914715845

The code above demonstrates a very simple example of how we can make use of the Monte Carlo simulation to estimate the mean value of a function. By generating a large number of random samples and then evaluating the function at each sample, we can easily obtain an approximate value of the mean. The accuracy of the estimate will largely depend upon the number of samples being generated as well as the complexity of the function which we are going to be evaluating.

Bootstrap

Bootstrap is a statistical method for estimating the sampling distribution of an estimator by resampling the data with replacement. It is a powerful technique that can be used to estimate the variability of an estimator and to construct confidence intervals. The method was initially introduced by Mr Bradley Efron in 1979.

Bootstrap is particularly useful when the sample size is small, or the population distribution is unknown or complex, and traditional methods such as the central limit theorem are not applicable.

The basic steps of bootstrap are:

  • Collect a sample of data.

  • Drawing an enormous number of bootstrap samples (with replacement) from the original sample.

  • Calculate the estimator of interest (e.g. mean, standard deviation, etc.) for each bootstrap sample.

  • Use the distribution of the estimator calculated from the bootstrap samples to make inferences about the population.

Bootstrap can be applied to a wide range of estimators, including means, standard deviations, regression coefficients, and more. It can also be used to estimate the distribution of test statistics, such as the t-statistic, which can be used to test hypotheses about population parameters.

Here is an example of how to use the bootstrap method to estimate the standard deviation of a sample of data and construct a 95% confidence interval.

Example

import numpy as np

# original sample
data = [1, 2, 3, 4, 5]

# number of bootstrap samples
n_samples = 1000

# array to store the bootstrap samples standard deviation
std_devs = np.empty(n_samples)

# generate bootstrap samples
for i in range(n_samples):
   sample = np.random.choice(data, size=len(data), replace=True)
   std_devs[i] = np.std(sample)

# calculating the lower as well as upper bound of the confidence interval
alpha = 0.05
lower = np.percentile(std_devs, alpha/2*100)
upper = np.percentile(std_devs, (1-alpha/2)*100)

print(f'Confidence interval: [{lower}, {upper}]')

Output

Confidence interval: [0.4898979485566356, 1.7435595774162693]

In this example, we are drawing 1000 bootstrap samples from the original sample and calculating the standard deviation of each sample. Then, we use the distribution of standard deviations to calculate the lower and upper bounds of the 95% confidence interval using the np.percentile() function.

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is a class of algorithms used to estimate the parameters of complex probability distributions. These algorithms construct a Markov chain that has the desired probability distribution as its equilibrium distribution. By simulating the chain, it is possible to generate samples from this distribution, which can then be used for statistical inference.

The basic idea of MCMC is to construct a Markov chain that has the target distribution as its stationary distribution. The chain is then run for many steps, and samples are collected at regular intervals. These samples are then used to estimate the parameters of the target distribution.

Here is an example of how to use the Metropolis-Hastings algorithm (sub part of MCMC) to sample from a normal distribution.

Example

import numpy as np

# target distribution
mean = 0
std = 1
target = lambda x: 1/(std * np.sqrt(2 * np.pi)) * np.exp(-(x - mean)**2 / (2 * std**2))

# proposal distribution
proposal_mean = 0
proposal_std = 2
proposal = lambda x: 1/(proposal_std * np.sqrt(2 * np.pi)) * np.exp(-(x - proposal_mean)**2 / (2 * proposal_std**2))

# initial state
x = 0

# number of samples
n_samples = 10000

# array to store the samples
samples = [x]

# Metropolis-Hastings algorithm
for i in range(n_samples):
   x_new = np.random.normal(x, proposal_std)
   acceptance_prob = min(1, target(x_new) / target(x))
   if np.random.rand() < acceptance_prob:
      x = x_new
   samples.append(x)

# plot the samples
import matplotlib.pyplot as plt
plt.hist(samples, bins=50, density=True)
plt.show()

Output

This script uses the Metropolis-Hastings algorithm to generate a sequence of samples from a normal distribution with mean 0 and standard deviation 1, using a normal proposal distribution with mean 0 and standard deviation 2. The algorithm starts with an initial state, in this case 0 and then generates a new state by proposing a new value from the proposal distribution. The new value is accepted with a probability determined by the acceptance probability, which is calculated based on the ratio of the target and proposal distributions.

Stochastic Processes Simulations

Stochastic processes simulations are a class of statistical simulations that involve the simulation of random behaviour over time. They are used to model and analyse complex systems that exhibit randomness, such as stock prices, weather patterns, and biological populations.

A stochastic process is a mathematical model that describes a system whose behaviour is influenced by randomness.

Example

import numpy as np
import matplotlib.pyplot as plt

# Parameters
p = 0.5  # probability of coin being a head
T = 10  # defining the number of time steps

# Initial state
x = 1

# Random number generator
np.random.seed(0)

# Array to store the states
states = [x]

# Simple stochastic process simulation
for t in range(T):
    x = 1 if np.random.rand() < p else 0
    states.append(x)

print(states)

# Plot the states
plt.step(range(T+1), states, where='post')
plt.xlabel('Time')
plt.ylabel('State')
plt.ylim(-0.5,1.5)
plt.show()

Output

[1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1]

Other Basic Examples

Here is an example for simulating the rolling a dice.

Example

import numpy as np

# Generate random samples from a uniform distribution between 1 and 6
samples = np.random.randint(1, 7, size=10000)

# Compute the average and the standard deviation of the samples
sample_mean = np.mean(samples)
sample_std = np.std(samples)

print("Sample mean: ", sample_mean)
print("Sample standard deviation: ", sample_std)

Output

Sample mean:  3.4946
Sample standard deviation:  1.7094358250604205

Conclusion

Statistical simulations are a powerful tool which help understand and analyze complex systems and processes. Python provides us with a range of tools and libraries that make it easy to implement and perform statistical simulations.

The main benefit of statistical simulation is its ability to not only model complex systems and processes, but also to investigate the effects of different parameters and variables on the outcome.It's applications are helpful to almost all fields.

NumPy, SciPy, and Pandas are examples of some libraries which are useful for generating random samples, evaluating functions, and performing statistical analysis.

Overall, statistical simulation is an essential tool for anyone working in data science or related fields, and Python provides a flexible and powerful platform for implementing these simulations. By harnessing the power of statistical simulation, we can gain new insights into complex systems and processes and make more informed decisions based on our analysis.

Updated on: 04-Oct-2023

192 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements