How to plot the loss values acquired from MLPClassifier's loss_curve_ attribute? (Matplotlib)

The MLPClassifier from scikit-learn provides a loss_curve_ attribute that tracks training loss at each iteration. Plotting these values helps visualize training convergence across different hyperparameters and datasets.

Understanding MLPClassifier Loss Curves

The loss_curve_ attribute stores the loss function value after each iteration during training. By plotting these values, we can compare how different solvers and learning rates affect convergence behavior.
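For instance, a minimal sketch of the basic pattern (using the iris dataset as a stand-in) is to fit the model and pass loss_curve_ directly to plt.plot:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# loss_curve_ is populated during fit(), one entry per iteration
clf = MLPClassifier(max_iter=300, random_state=0).fit(X, y)

plt.plot(clf.loss_curve_)
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.title('MLPClassifier Training Loss')
plt.show()

Note that loss_curve_ is only available for the iterative solvers ('sgd' and 'adam', the default), not for 'lbfgs'.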

Complete Example

Here's how to plot loss curves for different MLPClassifier configurations across multiple datasets:

import warnings
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning

plt.rcParams["figure.figsize"] = [12, 8]
plt.rcParams["figure.autolayout"] = True

# Define different hyperparameter configurations
params = [
    {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0, 'learning_rate_init': 0.2},
    {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9, 'nesterovs_momentum': False, 'learning_rate_init': 0.2},
    {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9, 'nesterovs_momentum': True, 'learning_rate_init': 0.2},
    {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0, 'learning_rate_init': 0.2},
    {'solver': 'adam', 'learning_rate_init': 0.01}
]

labels = [
    "constant learning-rate", 
    "constant with momentum", 
    "constant with Nesterov's momentum", 
    "inv-scaling learning-rate", 
    "adam"
]

plot_args = [
    {'c': 'red', 'linestyle': '-'},
    {'c': 'green', 'linestyle': '-'},
    {'c': 'blue', 'linestyle': '-'},
    {'c': 'orange', 'linestyle': '--'},
    {'c': 'black', 'linestyle': '-'}
]

def plot_on_dataset(X, y, ax, name):
    ax.set_title(f'Loss Curves - {name.title()} Dataset')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Loss')
    
    # Scale features for better convergence
    X = MinMaxScaler().fit_transform(X)
    
    # digits is a larger dataset but converges quickly, so cap its iterations
    max_iter = 15 if name == "digits" else 200
    
    mlps = []
    for label, param in zip(labels, params):
        mlp = MLPClassifier(random_state=0, max_iter=max_iter, **param)
        
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning, module="sklearn")
            mlp.fit(X, y)
        mlps.append(mlp)
    
    # Plot loss curves
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)
    
    ax.legend()
    ax.grid(True, alpha=0.3)

# Create subplots for different datasets
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Load datasets
iris = datasets.load_iris()
X_digits, y_digits = datasets.load_digits(return_X_y=True)

data_sets = [
    (iris.data, iris.target),
    (X_digits, y_digits),
    datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
    datasets.make_moons(noise=0.3, random_state=0)
]

dataset_names = ['iris', 'digits', 'circles', 'moons']

# Plot loss curves for each dataset
for ax, data, name in zip(axes.ravel(), data_sets, dataset_names):
    plot_on_dataset(*data, ax=ax, name=name)

plt.tight_layout()
plt.show()
Output: A 2x2 grid of plots showing loss curves for each MLPClassifier configuration on the iris, digits, circles, and moons datasets.

Key Components

Component          Purpose              Description
loss_curve_        Training monitoring  List of loss values, one per iteration
MinMaxScaler       Feature scaling      Rescales features for better convergence
max_iter           Training control     Maximum number of training iterations
Different solvers  Optimization         SGD vs. Adam optimization algorithms
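As a quick illustration of the scaling step, this minimal sketch (reusing the digits dataset from the example) compares feature ranges before and after MinMaxScaler:

from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler

X, _ = load_digits(return_X_y=True)
print(X.min(), X.max())                # raw pixel values: 0.0 16.0

# MinMaxScaler rescales each feature to the [0, 1] range,
# keeping gradient magnitudes comparable across features
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled.min(), X_scaled.max())  # 0.0 1.0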

Analyzing the Results

The loss curves reveal important training characteristics (the snippet after this list shows how to confirm them numerically):

  • Adam solver typically shows smooth, consistent convergence
  • SGD with momentum can converge faster but may be more unstable
  • Learning rate schedules like "invscaling" show gradual loss reduction
  • Dataset complexity affects convergence speed and final loss values
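To back these observations with numbers, you can inspect each fitted model's final loss (loss_) and iteration count (n_iter_). A minimal sketch, assuming plot_on_dataset is modified to return its mlps list (or that the models were fitted at the top level):

# Assumes `mlps` (fitted models) and `labels` are in scope,
# e.g. by adding `return mlps` to plot_on_dataset
for mlp, label in zip(mlps, labels):
    print(f"{label:<35} final loss: {mlp.loss_:.4f}  iterations: {mlp.n_iter_}")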

Conclusion

Plotting loss_curve_ from MLPClassifier helps visualize training progress and compare different hyperparameter configurations. Use this technique to select optimal solver and learning rate combinations for your specific dataset.
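As a starting point for that selection, a minimal sketch (again assuming the fitted mlps and labels lists are in scope) picks the configuration with the lowest final training loss:

# Assumes `mlps` and `labels` as in the example above; final training
# loss is only a rough proxy, so validate the winner on held-out data
best_label, best_mlp = min(zip(labels, mlps), key=lambda pair: pair[1].loss_)
print(f"Lowest final training loss: {best_label} ({best_mlp.loss_:.4f})")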
