Gaussian fit using Python
Data analysis and visualization are crucial in an era where data is often called the new oil. Data analysis typically involves fitting data to mathematical models and extracting useful information from them. The Gaussian fit is a widely used model that data scientists apply to data following a bell-shaped curve. In this article, we will explain what a Gaussian fit is and how to implement it in Python.
What is Gaussian Fit
A Gaussian (normal) distribution is characterized by a bell-shaped curve that is symmetric around the mean (μ). Its probability density function is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Here σ is the standard deviation of the distribution, μ is the mean, and π is the mathematical constant approximately equal to 3.14159.
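As a quick worked check of the formula, the sketch below evaluates f(x) by hand with Python's standard math module. The helper name gaussian_pdf and the parameters μ = 25 and σ = 20 are illustrative choices. At x = μ the exponent is zero, so the peak height is 1/(σ√(2π)). One standard deviation away, the exponent is -1/2, so the density drops to exp(-1/2), about 60.65% of the peak.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Illustrative helper: f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 25.0, 20.0
peak = gaussian_pdf(mu, mu, sigma)               # exponent is 0, so f(mu) = 1/(sigma*sqrt(2*pi))
one_sigma = gaussian_pdf(mu + sigma, mu, sigma)  # exponent is -1/2

print(f"f(mu)         = {peak:.6f}")
print(f"f(mu + sigma) = {one_sigma:.6f}")
print(f"ratio         = {one_sigma / peak:.4f}")  # exp(-0.5), about 0.6065
```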
To fit data with a Gaussian distribution, we must estimate the values of μ and σ that best describe it. Doing this by hand would be tedious and error-prone. Fortunately, Python's scientific libraries, NumPy and SciPy, provide functions that handle this efficiently.
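When you have raw observations rather than (x, y) points on a curve, μ and σ can also be estimated directly. The sketch below assumes a hypothetical sample drawn with NumPy. For a normal distribution, the maximum-likelihood estimates are the sample mean and the standard deviation computed with ddof=0. scipy.stats.norm.fit returns the same pair as (loc, scale). Note that this is a different technique from the curve fitting used in the rest of this article.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 10,000 draws from a normal distribution with mu=25, sigma=20
rng = np.random.default_rng(0)
samples = rng.normal(loc=25.0, scale=20.0, size=10_000)

# Closed-form maximum-likelihood estimates for a normal distribution
mu_hat = samples.mean()
sigma_hat = samples.std(ddof=0)  # ddof=0 gives the (biased) MLE of sigma

# scipy.stats.norm.fit returns the MLE as (loc, scale)
loc, scale = stats.norm.fit(samples)

print(f"closed form: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"norm.fit:    mu = {loc:.3f}, sigma = {scale:.3f}")
```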
Visualizing the Bell Curve
Before implementing Gaussian fitting, let's visualize a typical bell curve. This will provide better intuition when working with Gaussian distributions.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Generate data points
x_data = np.arange(-100, 100, 0.01)
# Calculate probability density function with mean=25, std=20
y_data = stats.norm.pdf(x_data, 25, 20)
plt.plot(x_data, y_data)
plt.title("Gaussian Distribution (Bell Curve)")
plt.xlabel("Value of x")
plt.ylabel("Probability Density")
plt.grid(True, alpha=0.3)
plt.show()
The output shows the characteristic bell shape:
A bell-shaped curve centered at x = 25, where values near the mean have a higher probability density than values in the tails.
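To put numbers on "more probable", the sketch below uses scipy.stats.norm.cdf with the same μ = 25 and σ = 20 to compute how much probability lies within k standard deviations of the mean. The results follow the familiar 68-95-99.7 pattern, which holds for any μ and σ.

```python
from scipy import stats

mu, sigma = 25, 20  # same parameters as the bell-curve plot

# P(mu - k*sigma < X < mu + k*sigma), computed from the cumulative distribution function
masses = {}
for k in (1, 2, 3):
    masses[k] = stats.norm.cdf(mu + k * sigma, mu, sigma) - stats.norm.cdf(mu - k * sigma, mu, sigma)
    print(f"within {k} sigma: {masses[k]:.4f}")  # about 0.6827, 0.9545, 0.9973
```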
Using curve_fit for Gaussian Fitting
The goal of Gaussian fitting is to find the values of μ and σ that best fit your data. Here's the step-by-step process:
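Under the hood, curve_fit performs nonlinear least squares. It searches for the parameters that minimize the sum of squared differences between the data and the model. The sketch below computes that objective directly for hypothetical noisy data. The helper sum_squared_residuals is illustrative and not part of SciPy. It shows that parameters close to the truth give a much smaller value than poor ones.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def sum_squared_residuals(params, x, y):
    """Illustrative objective: sum over i of (y_i - f(x_i; mu, sigma))^2."""
    mu, sigma = params
    residuals = y - gaussian(x, mu, sigma)
    return np.sum(residuals ** 2)

# Hypothetical noisy data generated from mu=0, sigma=1
rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 100)
y = gaussian(x, 0, 1) + rng.normal(0, 0.05, x.size)

good = sum_squared_residuals((0.0, 1.0), x, y)
bad = sum_squared_residuals((1.5, 0.5), x, y)
print(f"SSR at mu=0,   sigma=1:   {good:.4f}")
print(f"SSR at mu=1.5, sigma=0.5: {bad:.4f}")
```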
Step 1: Define the Gaussian Function
import numpy as np
def gaussian(x, mu, sigma):
    """Gaussian probability density function"""
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
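curve_fit expects the model to take the independent variable as its first argument and each fitted parameter as a separate argument, and to work element-wise on NumPy arrays. As a quick sanity check, assuming SciPy is installed, the function above should agree with scipy.stats.norm.pdf and enclose an area of about 1. The parameter values μ = 1 and σ = 2 are illustrative.

```python
import numpy as np
from scipy import stats

def gaussian(x, mu, sigma):
    """Gaussian probability density function (same as Step 1)."""
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

x = np.linspace(-10, 10, 2001)
ours = gaussian(x, 1.0, 2.0)  # evaluated element-wise on the whole array
reference = stats.norm.pdf(x, loc=1.0, scale=2.0)
print("matches scipy.stats.norm.pdf:", np.allclose(ours, reference))

# Riemann-sum estimate of the area under the curve; should be close to 1
dx = x[1] - x[0]
area = np.sum(ours) * dx
print(f"approximate area: {area:.4f}")
```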
Step 2: Complete Gaussian Fitting Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def gaussian(x, mu, sigma):
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
# Generate sample data with noise
np.random.seed(42) # For reproducible results
x_data = np.linspace(-5, 5, 100)
y_data = gaussian(x_data, 0, 1) + np.random.normal(0, 0.05, 100)
# Perform Gaussian fit
popt, pcov = curve_fit(gaussian, x_data, y_data)
mu_fit, sigma_fit = popt
# Generate fitted curve
y_fit = gaussian(x_data, mu_fit, sigma_fit)
# Plot results
plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_data, alpha=0.6, label='Noisy Data', color='blue')
plt.plot(x_data, y_fit, 'r-', linewidth=2, label=f'Fitted Curve (μ={mu_fit:.2f}, σ={sigma_fit:.2f})')
plt.plot(x_data, gaussian(x_data, 0, 1), 'g--', label='True Curve (μ=0, σ=1)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Gaussian Curve Fitting')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Fitted parameters: μ = {mu_fit:.3f}, σ = {sigma_fit:.3f}")
The output shows:
Fitted parameters: μ = 0.023, σ = 1.021
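curve_fit also returns pcov, the estimated covariance matrix of the fitted parameters. The SciPy documentation suggests taking the square root of its diagonal to get one-standard-deviation uncertainties. The sketch below reuses the data from the previous example. The exact uncertainty values depend on the noise.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, mu, sigma):
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

np.random.seed(42)
x_data = np.linspace(-5, 5, 100)
y_data = gaussian(x_data, 0, 1) + np.random.normal(0, 0.05, 100)

popt, pcov = curve_fit(gaussian, x_data, y_data)

# One-standard-deviation uncertainties: square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip(("mu", "sigma"), popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```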
Key Parameters Explained
| Parameter | Symbol | Description |
|---|---|---|
| Mean | μ (mu) | Center of the distribution |
| Standard Deviation | σ (sigma) | Spread/width of the distribution |
| Amplitude | A | Height of the peak (optional scaling factor) |
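The amplitude form A·exp(-(x - μ)²/(2σ²)) used in the next example drops the normalizing factor, so it can fit data of any height. When the data is a normalized density, the fitted A should come out close to 1/(σ√(2π)). The sketch below checks this on noise-free samples of a density with the illustrative values μ = 2 and σ = 0.5. Because σ only appears squared in this model, its sign is not determined, so the comparison uses |σ|.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_with_amplitude(x, amplitude, mu, sigma):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Noise-free samples of a normalized Gaussian density (illustrative mu=2, sigma=0.5)
mu_true, sigma_true = 2.0, 0.5
x = np.linspace(-1, 5, 200)
y = np.exp(-((x - mu_true) ** 2) / (2 * sigma_true ** 2)) / (sigma_true * np.sqrt(2 * np.pi))

popt, _ = curve_fit(gaussian_with_amplitude, x, y, p0=[1.0, 1.5, 1.0])
amp_fit, mu_fit, sigma_fit = popt

# sigma only appears squared, so use its absolute value
expected_amp = 1 / (abs(sigma_fit) * np.sqrt(2 * np.pi))
print(f"fitted amplitude:          {amp_fit:.4f}")
print(f"1 / (|sigma| * sqrt(2pi)): {expected_amp:.4f}")
```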
Real-World Application Example
Let's fit a Gaussian to simulated measurement data with added experimental noise. Because this data is not a normalized density, the model includes an amplitude parameter:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def gaussian_with_amplitude(x, amplitude, mu, sigma):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
# Simulate measurement data
np.random.seed(123)
x_measured = np.linspace(0, 10, 50)
y_true = 5 * np.exp(-((x_measured - 5) ** 2) / (2 * 1.5 ** 2))
y_measured = y_true + np.random.normal(0, 0.3, len(x_measured))
# Fit the data
initial_guess = [5, 5, 1] # [amplitude, mu, sigma]
popt, pcov = curve_fit(gaussian_with_amplitude, x_measured, y_measured, p0=initial_guess)
amp_fit, mu_fit, sigma_fit = popt
y_fitted = gaussian_with_amplitude(x_measured, amp_fit, mu_fit, sigma_fit)
# Plot results
plt.figure(figsize=(10, 6))
plt.scatter(x_measured, y_measured, alpha=0.7, label='Measured Data', color='red')
plt.plot(x_measured, y_fitted, 'b-', linewidth=2, label='Gaussian Fit')
plt.xlabel('Position')
plt.ylabel('Intensity')
plt.title('Gaussian Fitting to Experimental Data')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print("Fitted parameters:")
print(f"Amplitude: {amp_fit:.2f}")
print(f"Mean (μ): {mu_fit:.2f}")
print(f"Standard Deviation (σ): {sigma_fit:.2f}")
The output shows:
Fitted parameters:
Amplitude: 4.89
Mean (μ): 5.02
Standard Deviation (σ): 1.48
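The example above hard-codes initial_guess = [5, 5, 1]. For real measurements, one alternative is to derive a starting point from the data and constrain the search with curve_fit's bounds parameter. The helper initial_guess_from_data below is a hypothetical heuristic, not a SciPy function. It takes the peak height, the x position of the peak, and a weighted second moment for the width. When bounds are given, curve_fit uses the 'trf' method instead of the default 'lm'.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_with_amplitude(x, amplitude, mu, sigma):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def initial_guess_from_data(x, y):
    """Hypothetical heuristic: peak height, peak location, and weighted second moment."""
    amplitude = y.max()
    mu = x[np.argmax(y)]
    weights = np.clip(y, 0, None)  # ignore negative noise when weighting
    sigma = np.sqrt(np.sum(weights * (x - mu) ** 2) / np.sum(weights))
    return [amplitude, mu, sigma]

# Same simulated measurement data as the example above
np.random.seed(123)
x_measured = np.linspace(0, 10, 50)
y_measured = gaussian_with_amplitude(x_measured, 5, 5, 1.5) + np.random.normal(0, 0.3, x_measured.size)

p0 = initial_guess_from_data(x_measured, y_measured)

# Bounds keep amplitude and sigma positive and mu inside the measured range
lower = [0, x_measured.min(), 1e-6]
upper = [np.inf, x_measured.max(), np.inf]
popt, pcov = curve_fit(gaussian_with_amplitude, x_measured, y_measured, p0=p0, bounds=(lower, upper))

print("initial guess:", np.round(p0, 2))
print("fitted [amplitude, mu, sigma]:", np.round(popt, 2))
```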
Conclusion
Gaussian fitting is a fundamental data-analysis technique for modeling bell-shaped distributions. SciPy's scipy.optimize.curve_fit function makes it straightforward to estimate best-fit parameters (μ, σ, and optionally an amplitude) from noisy data by least squares. The technique is widely used in physics, engineering, and data science for signal processing, measurement analysis, and statistical modeling.
