Gradient Descent in Linear Regression
Gradient descent is a powerful optimization algorithm used to minimize the cost function in machine learning models, particularly in linear regression. It works by iteratively adjusting model parameters in the direction of steepest descent to find the optimal values that minimize prediction errors.
Linear regression models the relationship between variables by finding the best-fit line, while gradient descent provides the mechanism to efficiently discover the optimal parameters for this line. Together, they form a fundamental building block of machine learning and predictive modeling.
Understanding Linear Regression
Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The simple linear regression equation is:

y = β₀ + β₁x + ε

Where:
- y → the dependent variable
- x → the independent variable
- β₀ → y-intercept (value of y when x is 0)
- β₁ → slope (change in y for a one-unit increase in x)
- ε → random error term

The goal is to minimize the difference between predicted values (ŷ) and actual values (y). The most commonly used cost function is Mean Squared Error (MSE):

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
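As a quick sketch, the MSE can be computed directly from a set of actual and predicted values (the numbers here are toy data chosen purely for illustration):

```python
import numpy as np

# Toy actual values and model predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.5^2 + 0^2 + 1^2) / 3 = 0.4166...
```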
Understanding Gradient Descent
Gradient descent is an iterative optimization algorithm that minimizes the cost function by moving parameters in the direction of steepest descent. The mathematical representation is:

θ := θ − α ∇J(θ)

Where:
- θ → model parameters
- α → learning rate
- ∇J(θ) → gradient of the cost function with respect to the parameters
The algorithm repeatedly updates parameters until convergence, finding the optimal values that minimize the cost function.
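The update rule can be illustrated on a one-dimensional cost function. Here J(θ) = θ², whose gradient is 2θ, so repeated updates pull θ toward the minimum at 0 (a minimal sketch; the cost function, starting point, and learning rate are arbitrary choices for illustration):

```python
# Minimize J(theta) = theta^2, whose gradient is 2*theta
theta = 5.0   # arbitrary starting point
alpha = 0.1   # learning rate

for _ in range(100):
    gradient = 2 * theta               # dJ/dtheta at the current point
    theta = theta - alpha * gradient   # update rule: theta := theta - alpha * grad

print(f"{theta:.6f}")  # essentially 0, the minimizer of J
```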
Implementation Example
Here's a complete implementation of gradient descent for linear regression:
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add bias term (intercept)
X_b = np.c_[np.ones((100, 1)), X]

# Hyperparameters
learning_rate = 0.01
num_iterations = 1000

# Initialize parameters randomly
theta = np.random.randn(2, 1)

# Gradient descent algorithm
for iteration in range(num_iterations):
    # Calculate predictions
    predictions = X_b.dot(theta)

    # Calculate gradients of the MSE cost
    gradients = 2 / len(X_b) * X_b.T.dot(predictions - y)

    # Update parameters
    theta = theta - learning_rate * gradients

# Results
print(f"Intercept: {theta[0][0]:.4f}")
print(f"Slope: {theta[1][0]:.4f}")

# Visualize results
plt.figure(figsize=(8, 6))
plt.scatter(X, y, alpha=0.7, label='Data points')
plt.plot(X, X_b.dot(theta), color='red', linewidth=2, label='Regression line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression with Gradient Descent')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Output:

Intercept: 4.1581
Slope: 2.8204
How It Works
The gradient descent process involves several key steps:
- Initialize parameters randomly
- Calculate predictions using current parameters
- Compute cost (mean squared error)
- Calculate gradients of the cost function
- Update parameters in the direction opposite to gradients
- Repeat until convergence
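The steps above can be sketched with an explicit cost computation at each iteration and a simple convergence check (the data, tolerance, and learning rate below are illustrative assumptions, not part of the original example):

```python
import numpy as np

# Illustrative data: y = 2 + 5x plus a little noise
np.random.seed(0)
X = np.random.rand(50, 1)
y = 2 + 5 * X + 0.1 * np.random.randn(50, 1)
X_b = np.c_[np.ones((50, 1)), X]     # add bias column

theta = np.zeros((2, 1))             # step 1: initialize parameters
alpha, tol = 0.1, 1e-8               # learning rate and stopping tolerance
prev_cost = np.inf

for i in range(10_000):
    predictions = X_b @ theta                              # step 2: predictions
    cost = np.mean((predictions - y) ** 2)                 # step 3: MSE cost
    gradients = 2 / len(X_b) * X_b.T @ (predictions - y)   # step 4: gradients
    theta -= alpha * gradients                             # step 5: update
    if abs(prev_cost - cost) < tol:                        # step 6: converged?
        break
    prev_cost = cost

print(f"Stopped after {i + 1} iterations; theta = {theta.ravel()}")
```

Stopping when the cost barely changes between iterations is a common practical alternative to running a fixed number of iterations.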
Key Parameters
Two critical hyperparameters affect gradient descent performance:
- Learning Rate (α) → controls the step size. Too large a value causes overshooting; too small a value causes slow convergence
- Number of Iterations → determines how long the algorithm runs. More iterations allow closer convergence to the optimum
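The effect of the learning rate can be demonstrated by minimizing the same one-dimensional cost J(θ) = θ² with different step sizes (the specific values are chosen only to illustrate the three regimes):

```python
def minimize(alpha, steps=50, theta=5.0):
    """Run gradient descent on J(theta) = theta^2; the gradient is 2*theta."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

print(minimize(0.01))  # too small: still far from 0 after 50 steps
print(minimize(0.4))   # well chosen: essentially at the minimum
print(minimize(1.1))   # too large: overshoots and diverges
```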
Conclusion
Gradient descent provides an efficient method to optimize linear regression models by iteratively minimizing the cost function. The algorithm automatically finds the best-fit line parameters, making it essential for machine learning applications. Understanding this fundamental optimization technique opens the door to more advanced machine learning algorithms.
---