Ridge and Lasso Regression Explained


Introduction

Ridge and lasso regression are two popular regularization techniques for linear regression models. They help address overfitting, which occurs when a model is too complex and fits the training data too closely, leading to poor performance on new data. Ridge regression shrinks the coefficients and curbs overfitting by adding a penalty term to the linear regression cost function that is proportional to the sum of the squared coefficients. Lasso regression, in contrast, adds a penalty term proportional to the sum of the absolute values of the coefficients. This encourages some coefficients to become exactly zero, effectively removing the corresponding features from the model. In this post we examine both approaches in more detail, discuss how they differ, and show how to apply them in Python with scikit-learn.

Ridge Regression

Ridge regression is a regularization technique that combats overfitting in linear regression models. It adds a penalty term to the linear regression cost function that shrinks the coefficients and prevents overfitting. The penalty term is proportional to the sum of the squared coefficients and controls their magnitude: as the regularization strength increases, the coefficients shrink toward zero and the model's variance decreases.

Ridge regression attempts to minimize the following cost function −

$$\mathrm{J(w) = \left(\frac{1}{2}\right)\:*\:\sum(y\:-\:\hat{y})^2\:+\:\lambda\:\sum w^2}$$

where y is the actual value, ŷ is the predicted value, w denotes a feature coefficient, and λ is the regularization strength (the alpha parameter in scikit-learn).

Ridge regression works best when all features are relevant and the model has many small to medium-sized coefficients. It is also computationally more efficient than some other regularization methods. Its main drawback is that it never eliminates any features, which is not always desirable. Whether Ridge or another regularization approach is the better choice depends on the problem at hand and the characteristics of the data.

Program

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate some random data
n_samples, n_features = 100, 10
X = np.random.randn(n_samples, n_features)
w_true = np.random.randn(n_features)
y = X.dot(w_true) + 0.5*np.random.randn(n_samples)

# Split the data into training and testing sets (80/20)
train_size = int(n_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Set the regularization strength and fit the Ridge model
alpha = 0.1
ridge = Ridge(alpha=alpha)
ridge.fit(X_train, y_train)

# Make predictions on the testing set and evaluate them
y_pred = ridge.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse:.2f}")

Output

Mean squared error: 0.36

In this example the data is split into training and testing sets with a simple 80/20 index slice rather than scikit-learn's train_test_split function. Because regularized models are sensitive to feature scale, it is also good practice to standardize the data, for example with StandardScaler, so that every feature has a comparable range and distribution; one way to do both is sketched below.
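As a sketch of that variant, assuming X and y are the arrays generated in the program above, scikit-learn's train_test_split can perform the split and a Pipeline with StandardScaler ensures the scaling is learned only from the training data:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge

# Split the data randomly instead of slicing by index
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Chain standardization and Ridge so the scaler is fit only on the training set
model = make_pipeline(StandardScaler(), Ridge(alpha=0.1))
model.fit(X_train, y_train)

# R^2 score on the held-out data
print(model.score(X_test, y_test))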

We then create a Ridge regression model using scikit-learn's Ridge class and control the regularization strength with the alpha parameter. Increasing alpha results in stronger regularization.

We use the fit method to fit the model to the training data and the predict method to make predictions on the testing data. Finally, we evaluate the model with the mean squared error, which is the average squared difference between the predicted and actual values.

Note that Ridge regression does not always improve the performance of a linear regression model; other regularization methods such as Lasso or Elastic Net may be better suited to some problems. Moreover, the regularization strength alpha should be tuned with cross-validation to find the value that best balances model complexity and generalization performance.
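As a minimal sketch of that tuning step, assuming X_train, X_test, y_train, and y_test from the program above, scikit-learn's RidgeCV class fits the model for a grid of candidate alpha values (the grid below is only illustrative) and keeps the one that performs best under cross-validation:

import numpy as np
from sklearn.linear_model import RidgeCV

# Illustrative grid of candidate regularization strengths
alphas = np.logspace(-3, 3, 13)

# RidgeCV evaluates each alpha with 5-fold cross-validation on the training data
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_train, y_train)

print("Best alpha:", ridge_cv.alpha_)
print("Test R^2:", ridge_cv.score(X_test, y_test))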

Lasso Regression

Lasso regression, commonly referred to as L1 regularization, is a method for preventing overfitting in linear regression models by adding a penalty term to the cost function. In contrast to Ridge regression, the penalty is proportional to the sum of the absolute values of the coefficients rather than the sum of the squared coefficients.

Lasso regression attempts to minimize the following cost function −

$$\mathrm{J(w) = \left(\frac{1}{2}\right)\:*\:\sum(y\:-\:\hat{y})^2\:+\:\lambda\:\sum|w|}$$

where y is the actual value, ŷ is the predicted value, w denotes a feature coefficient, and λ is the regularization strength (the alpha parameter in scikit-learn).

Lasso regression can shrink some coefficients all the way to zero, effectively performing feature selection. This is especially helpful with high-dimensional datasets in which many features may be irrelevant or redundant. The resulting model is simpler and easier to interpret, and by reducing overfitting it often achieves better predictive performance.

Program

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Generate some random data
n_samples, n_features = 100, 10
X = np.random.randn(n_samples, n_features)
w_true = np.random.randn(n_features)
y = X.dot(w_true) + 0.5*np.random.randn(n_samples)

# Split the data into training and testing sets
train_size = int(n_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Set the regularization strength
alpha = 0.1

# Create the Lasso regression object and fit the model
lasso = Lasso(alpha=alpha)
lasso.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = lasso.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)

# Print the mean squared error
print(f"Mean squared error: {mse:.2f}")

Output

Mean squared error: 0.43

In this code, we first generate some random data (100 samples and 10 features). We then split the data into 80/20 training and testing sets. Next, we set the regularization strength to 0.1 and create a Lasso regression object. We fit the model to the training data with the fit() method, make predictions on the testing data with the predict() method, and compute the mean squared error between the predicted and actual values using scikit-learn's mean_squared_error() function. Finally, the mean squared error is printed.

It is worth noting that the Lasso regression model performs feature selection by setting some of the coefficients to zero. This makes it effective when there are many features and we want to identify the ones most important for predicting the target variable. If, however, all of the features are believed to be relevant, it may not be the best choice, and Ridge regression may be the better option.
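One way to see this selection effect, assuming the lasso object and the data from the program above, is to inspect the fitted coef_ attribute and count how many coefficients were driven exactly to zero:

import numpy as np

# Coefficients learned by the Lasso model; some entries are exactly zero
print("Coefficients:", lasso.coef_)

# Indices of the features Lasso kept (non-zero coefficients)
selected = np.flatnonzero(lasso.coef_)
print("Selected features:", selected)
print("Number of zeroed coefficients:", int(np.sum(lasso.coef_ == 0)))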

Difference between Ridge and Lasso Regression

| Ridge Regression | Lasso Regression |
| --- | --- |
| Shrinks the coefficients toward zero | Encourages some coefficients to be exactly zero |
| Adds a penalty term proportional to the sum of squared coefficients | Adds a penalty term proportional to the sum of absolute values of coefficients |
| Does not eliminate any features | Can eliminate some features |
| Suitable when all features are important | Suitable when some features are irrelevant or redundant |
| More computationally efficient | Less computationally efficient |
| Requires setting a hyperparameter | Requires setting a hyperparameter |
| Performs better when there are many small to medium-sized coefficients | Performs better when there are a few large coefficients |

Conclusion

Ridge and Lasso regression are powerful techniques for regularizing linear regression models and preventing overfitting. Both add a penalty term to the cost function, but with different effects: Ridge regression shrinks the coefficients toward zero, while Lasso regression drives some of them to exactly zero. Both techniques are easy to implement in Python with scikit-learn, making them accessible to a wide audience. By understanding and applying Ridge and Lasso regression, you can improve the performance of your linear regression models and make more accurate predictions on new data.

