How to implement a gradient descent in Python to find a local minimum?
Gradient descent is a widely used optimization algorithm in machine learning for minimizing a model's loss function. In simple terms, it repeatedly adjusts the model's parameters until it finds the values that minimize the loss. The algorithm works by taking small steps in the direction of the negative gradient of the loss function, which is the direction of steepest descent.
The learning rate is a hyperparameter that determines the step size, controlling the algorithm's trade-off between speed and accuracy. Many machine learning algorithms, such as linear regression, logistic regression, and neural networks, use gradient descent to train models by minimizing the difference between predicted and actual values.
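At its core, the update rule is a single line: new_x = x - learning_rate * gradient. A minimal sketch of that rule, using a hypothetical loss (x - 3)² rather than the example developed below:

```python
def step(x, learning_rate, gradient):
    """Apply one gradient descent update."""
    return x - learning_rate * gradient

x = 0.0
learning_rate = 0.1
for _ in range(50):
    gradient = 2 * (x - 3)      # derivative of the hypothetical loss (x - 3)**2
    x = step(x, learning_rate, gradient)

print(round(x, 4))   # approaches 3, the minimizer of (x - 3)**2
```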
Implementation Steps
Here's how we'll implement gradient descent in Python:
1. Import the necessary libraries
2. Define the function and its derivative
3. Implement the gradient descent algorithm
4. Set the parameters and find the local minimum
5. Visualize the results with a plot
Basic Implementation
Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
Define Function and Derivative
We'll use a simple quadratic function f(x) = x² - 4x + 6 and its derivative:
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4
# Test the function
x_test = 3
print(f"f({x_test}) = {f(x_test)}")
print(f"df({x_test}) = {df(x_test)}")
f(3) = 3
df(3) = 2
The function f(x) represents what we want to minimize, and df(x) is its derivative. The derivative tells us the slope at any point, guiding the algorithm toward the minimum.
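A quick way to sanity-check an analytic derivative is to compare it against a central finite difference. A small sketch (the helper numerical_df is ours, not part of the original code):

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

# Central-difference approximation of the slope: (f(x+h) - f(x-h)) / (2h)
def numerical_df(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# The analytic and numerical slopes should agree at any test point
for x in [0.0, 2.0, 3.5]:
    assert abs(df(x) - numerical_df(x)) < 1e-4
```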
Gradient Descent Algorithm
def gradient_descent(initial_x, learning_rate, num_iterations):
    x = initial_x
    x_history = [x]

    for i in range(num_iterations):
        gradient = df(x)
        x = x - learning_rate * gradient
        x_history.append(x)

        # Print progress every 10 iterations
        if i % 10 == 0:
            print(f"Iteration {i}: x = {x:.4f}, f(x) = {f(x):.4f}")

    return x, x_history
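Instead of running for a fixed number of iterations, a common variant stops once the gradient magnitude falls below a tolerance. A sketch, assuming the same f and df defined above (the tolerance-based function is ours, not part of the original code):

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

def gradient_descent_tol(initial_x, learning_rate, tol=1e-8, max_iterations=10000):
    """Run gradient descent until |gradient| < tol or max_iterations is reached."""
    x = initial_x
    for i in range(max_iterations):
        gradient = df(x)
        if abs(gradient) < tol:
            break                       # close enough to a stationary point
        x = x - learning_rate * gradient
    return x, i

x_min, iters = gradient_descent_tol(0.0, 0.1)
print(f"x = {x_min:.6f} after {iters} iterations")
```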
Finding the Local Minimum
Let's apply gradient descent to find the minimum of our function:
# Set parameters
initial_x = 0
learning_rate = 0.1
num_iterations = 20
# Run gradient descent
x_min, x_history = gradient_descent(initial_x, learning_rate, num_iterations)
print(f"\nLocal minimum found at x = {x_min:.4f}")
print(f"Function value at minimum: f({x_min:.4f}) = {f(x_min):.4f}")
print(f"Theoretical minimum at x = 2, f(2) = {f(2)}")
Iteration 0: x = 0.4000, f(x) = 4.5600
Iteration 10: x = 1.9329, f(x) = 2.0045

Local minimum found at x = 1.9998
Function value at minimum: f(1.9998) = 2.0000
Theoretical minimum at x = 2, f(2) = 2
Visualization
Let's visualize how gradient descent converges to the minimum:
# Create x values for plotting the function
x_vals = np.linspace(-1, 5, 100)
y_vals = f(x_vals)
# Create the plot
plt.figure(figsize=(10, 6))
# Plot the function
plt.plot(x_vals, y_vals, 'b-', linewidth=2, label='f(x) = x² - 4x + 6')
# Plot the gradient descent path
x_history_array = np.array(x_history)
y_history = f(x_history_array)
plt.plot(x_history_array, y_history, 'ro-', markersize=4, label='Gradient Descent Path')
# Mark the minimum
plt.plot(x_min, f(x_min), 'gs', markersize=10, label=f'Minimum at ({x_min:.2f}, {f(x_min):.2f})')
# Formatting
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent Optimization')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Converged in {len(x_history)-1} iterations")
Converged in 20 iterations
Different Learning Rates Comparison
The learning rate significantly affects convergence speed and stability:
# Compare different learning rates
learning_rates = [0.01, 0.1, 0.3]
colors = ['red', 'blue', 'green']
plt.figure(figsize=(12, 4))
for i, lr in enumerate(learning_rates):
    x_min, x_hist = gradient_descent(0, lr, 20)

    plt.subplot(1, 3, i+1)
    plt.plot(range(len(x_hist)), x_hist, 'o-', color=colors[i])
    plt.axhline(y=2, color='black', linestyle='--', alpha=0.5, label='True minimum')
    plt.title(f'Learning Rate = {lr}')
    plt.xlabel('Iteration')
    plt.ylabel('x value')
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Iteration 0: x = 0.0200, f(x) = 5.9596
Iteration 10: x = 1.6474, f(x) = 2.1241
Iteration 0: x = 0.4000, f(x) = 4.5600
Iteration 10: x = 1.9329, f(x) = 2.0045
Iteration 0: x = 1.2000, f(x) = 2.6400
Iteration 10: x = 1.9999, f(x) = 2.0000
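Beyond tuning a fixed learning rate, a common extension is momentum, which accumulates an exponentially decayed running gradient to smooth the path and damp oscillation. This variant is not covered above; a sketch, assuming the same df as in our example:

```python
def df(x):
    return 2*x - 4

def gradient_descent_momentum(initial_x, learning_rate, beta=0.9, num_iterations=200):
    """Heavy-ball momentum: v carries a decayed sum of past gradients."""
    x, v = initial_x, 0.0
    for _ in range(num_iterations):
        v = beta * v + df(x)            # accumulate decayed gradient (velocity)
        x = x - learning_rate * v
    return x

x_min = gradient_descent_momentum(0.0, 0.1)
print(f"Minimum found at x = {x_min:.4f}")
```

With beta = 0 this reduces exactly to plain gradient descent; beta = 0.9 is a common default.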
Key Parameters
| Parameter | Effect | Recommendation |
|---|---|---|
| Learning Rate | Controls step size | 0.01 - 0.3 (start small) |
| Iterations | Number of steps | Until convergence |
| Initial Point | Starting position | Any reasonable value |
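One practical way to avoid hand-picking a learning rate is a crude backtracking rule: start with a deliberately large rate and halve any step that fails to decrease f(x). This is our own sketch on the same f and df, not part of the tutorial above:

```python
def f(x):
    return x**2 - 4*x + 6

def df(x):
    return 2*x - 4

x, learning_rate = 0.0, 1.0             # rate intentionally too large
for _ in range(100):
    step = learning_rate * df(x)
    # Halve the step until it actually decreases f(x) (or becomes negligible)
    while f(x - step) >= f(x) and abs(step) > 1e-12:
        step *= 0.5
    x = x - step
print(f"x = {x:.4f}")
```

Full line-search methods (e.g. Armijo backtracking) refine this idea with a sufficient-decrease condition.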
Conclusion
Gradient descent is an effective optimization algorithm that finds local minima by iteratively moving in the direction of steepest descent. The learning rate controls convergence: a rate that is too small converges slowly, while one that is too large may overshoot or diverge. This implementation demonstrates the core concepts behind more complex machine learning optimization problems.