How to implement a gradient descent in Python to find a local minimum?
Gradient descent is a widely used optimization algorithm in machine learning for minimizing a model's loss function. In simple terms, it repeatedly adjusts the model's parameters until it finds the values that minimize the loss. The algorithm works by taking small steps in the direction of the negative gradient of the loss function, which is the direction of steepest descent.
The learning rate is a hyperparameter that determines the step size, controlling the algorithm's trade-off between speed and accuracy. Many machine learning algorithms, such as linear regression, logistic regression, and neural networks, use gradient descent to train models by minimizing the difference between predicted and actual values.
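At its core, the update rule is a single line: new_x = x - learning_rate * gradient. A minimal sketch of that rule, using a hypothetical loss (x - 3)² rather than the example developed below:

```python
def step(x, learning_rate, gradient):
    """Apply one gradient descent update."""
    return x - learning_rate * gradient

x = 0.0
learning_rate = 0.1
for _ in range(50):
    gradient = 2 * (x - 3)      # derivative of the hypothetical loss (x - 3)**2
    x = step(x, learning_rate, gradient)

print(round(x, 4))   # approaches 3, the minimizer of (x - 3)**2
```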
Implementation Steps
Here's how we'll implement gradient descent in Python:
1. Import the necessary libraries
2. Define the function and its derivative
3. Implement the gradient descent algorithm
4. Set the parameters and find the local minimum
5. Visualize the results with a plot
Basic Implementation
Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
Define Function and Derivative
We'll use a simple quadratic function f(x) = x² - 4x + 6 and its derivative:
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4
# Test the function
x_test = 3
print(f"f({x_test}) = {f(x_test)}")
print(f"df({x_test}) = {df(x_test)}")
f(3) = 3
df(3) = 2
The function f(x) represents what we want to minimize, and df(x) is its derivative. The derivative tells us the slope at any point, guiding the algorithm toward the minimum.
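A quick way to sanity-check an analytic derivative is to compare it against a central finite difference. A small sketch (the helper numerical_df is ours, not part of the original code):

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

# Central-difference approximation of the slope: (f(x+h) - f(x-h)) / (2h)
def numerical_df(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# The analytic and numerical slopes should agree at any test point
for x in [0.0, 2.0, 3.5]:
    assert abs(df(x) - numerical_df(x)) < 1e-4
```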
Gradient Descent Algorithm
def gradient_descent(initial_x, learning_rate, num_iterations):
    x = initial_x
    x_history = [x]

    for i in range(num_iterations):
        gradient = df(x)
        x = x - learning_rate * gradient
        x_history.append(x)

        # Print progress every 10 iterations
        if i % 10 == 0:
            print(f"Iteration {i}: x = {x:.4f}, f(x) = {f(x):.4f}")

    return x, x_history
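Instead of running for a fixed number of iterations, a common variant stops once the gradient magnitude falls below a tolerance. A sketch, assuming the same f and df defined above (the tolerance-based function is ours, not part of the original code):

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

def gradient_descent_tol(initial_x, learning_rate, tol=1e-8, max_iterations=10000):
    """Run gradient descent until |gradient| < tol or max_iterations is reached."""
    x = initial_x
    for i in range(max_iterations):
        gradient = df(x)
        if abs(gradient) < tol:
            break                       # close enough to a stationary point
        x = x - learning_rate * gradient
    return x, i

x_min, iters = gradient_descent_tol(0.0, 0.1)
print(f"x = {x_min:.6f} after {iters} iterations")
```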
Finding the Local Minimum
Let's apply gradient descent to find the minimum of our function:
# Set parameters
initial_x = 0
learning_rate = 0.1
num_iterations = 20
# Run gradient descent
x_min, x_history = gradient_descent(initial_x, learning_rate, num_iterations)
print(f"\nLocal minimum found at x = {x_min:.4f}")
print(f"Function value at minimum: f({x_min:.4f}) = {f(x_min):.4f}")
print(f"Theoretical minimum at x = 2, f(2) = {f(2)}")
Iteration 0: x = 0.4000, f(x) = 4.5600
Iteration 10: x = 1.9329, f(x) = 2.0045

Local minimum found at x = 1.9998
Function value at minimum: f(1.9998) = 2.0000
Theoretical minimum at x = 2, f(2) = 2
Visualization
Let's visualize how gradient descent converges to the minimum:
# Create x values for plotting the function
x_vals = np.linspace(-1, 5, 100)
y_vals = f(x_vals)
# Create the plot
plt.figure(figsize=(10, 6))
# Plot the function
plt.plot(x_vals, y_vals, 'b-', linewidth=2, label='f(x) = x² - 4x + 6')
# Plot the gradient descent path
x_history_array = np.array(x_history)
y_history = f(x_history_array)
plt.plot(x_history_array, y_history, 'ro-', markersize=4, label='Gradient Descent Path')
# Mark the minimum
plt.plot(x_min, f(x_min), 'gs', markersize=10, label=f'Minimum at ({x_min:.2f}, {f(x_min):.2f})')
# Formatting
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent Optimization')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Converged in {len(x_history)-1} iterations")
Converged in 20 iterations
Different Learning Rates Comparison
The learning rate significantly affects convergence speed and stability:
# Compare different learning rates
learning_rates = [0.01, 0.1, 0.3]
colors = ['red', 'blue', 'green']
plt.figure(figsize=(12, 4))
for i, lr in enumerate(learning_rates):
    x_min, x_hist = gradient_descent(0, lr, 20)

    plt.subplot(1, 3, i+1)
    plt.plot(range(len(x_hist)), x_hist, 'o-', color=colors[i])
    plt.axhline(y=2, color='black', linestyle='--', alpha=0.5, label='True minimum')
    plt.title(f'Learning Rate = {lr}')
    plt.xlabel('Iteration')
    plt.ylabel('x value')
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Iteration 0: x = 0.0200, f(x) = 5.9596
Iteration 10: x = 1.6474, f(x) = 2.1241
Iteration 0: x = 0.4000, f(x) = 4.5600
Iteration 10: x = 1.9329, f(x) = 2.0045
Iteration 0: x = 1.2000, f(x) = 2.6400
Iteration 10: x = 1.9999, f(x) = 2.0000
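Beyond tuning a fixed learning rate, a common extension is momentum, which accumulates an exponentially decayed running gradient to smooth the path and damp oscillation. This variant is not covered above; a sketch, assuming the same df as in our example:

```python
def df(x):
    return 2*x - 4

def gradient_descent_momentum(initial_x, learning_rate, beta=0.9, num_iterations=200):
    """Heavy-ball momentum: v carries a decayed sum of past gradients."""
    x, v = initial_x, 0.0
    for _ in range(num_iterations):
        v = beta * v + df(x)            # accumulate decayed gradient (velocity)
        x = x - learning_rate * v
    return x

x_min = gradient_descent_momentum(0.0, 0.1)
print(f"Minimum found at x = {x_min:.4f}")
```

With beta = 0 this reduces exactly to plain gradient descent; beta = 0.9 is a common default.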
Key Parameters
| Parameter | Effect | Recommendation |
|---|---|---|
| Learning Rate | Controls step size | 0.01 - 0.3 (start small) |
| Iterations | Number of steps | Until convergence |
| Initial Point | Starting position | Any reasonable value |
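One practical way to avoid hand-picking a learning rate is a crude backtracking rule: start with a deliberately large rate and halve any step that fails to decrease f(x). This is our own sketch on the same f and df, not part of the tutorial above:

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

x, learning_rate = 0.0, 1.0             # rate intentionally too large
for _ in range(100):
    step = learning_rate * df(x)
    # Halve the step until it actually decreases f(x) (or becomes negligible)
    while f(x - step) >= f(x) and abs(step) > 1e-12:
        step *= 0.5
    x = x - step
print(f"x = {x:.4f}")
```

Full line-search methods (e.g. Armijo backtracking) refine this idea with a sufficient-decrease condition.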
Conclusion
Gradient descent is an effective optimization algorithm that finds local minima by iteratively moving in the direction of steepest descent. The learning rate controls convergence: a rate that is too small converges slowly, while one that is too large may overshoot or diverge. This implementation demonstrates the core concepts behind more complex machine learning optimization problems.