Ridge and Lasso Regression Explained
Introduction
Ridge and lasso regression are two popular regularization techniques for linear regression models. They help address overfitting, which occurs when a model is overly complex and fits the training data so closely that it performs poorly on new data. Ridge regression shrinks the coefficients and prevents overfitting by adding a penalty term to the linear regression cost function; this penalty is proportional to the sum of the squared coefficients. Lasso regression, by contrast, adds a penalty term proportional to the sum of the absolute values of the coefficients, which encourages some coefficients to become exactly zero and effectively removes the corresponding features from the model. In this post we examine both approaches in more detail, discuss how they differ, and look at how scikit-learn can be used to apply them in Python.
Ridge Regression
Ridge regression is a regularization technique that combats overfitting in linear regression models. It adds a penalty term to the linear regression cost function, which shrinks the coefficients and prevents overfitting. The penalty term is proportional to the sum of the squared coefficients and controls their magnitude: as the strength of the penalty is raised, the coefficients shrink toward zero, reducing the model's variance.
Ridge regression attempts to minimize the following cost function −
$$\mathrm{J(w) = \frac{1}{2}\:\sum(y\:-\:\hat{y})^2\:+\:\lambda\:\sum w_j^2}$$
where y is the actual value, ŷ denotes the predicted value, w_j denotes the coefficient of the j-th feature, and λ controls the strength of the regularization (the alpha parameter in scikit-learn).
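To make the formula concrete, here is a minimal NumPy sketch of this cost function; the helper name ridge_cost and the lam argument are illustrative only and not part of any library API:

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """Squared-error term plus an L2 penalty on the coefficients, as in the formula above."""
    residuals = y - X.dot(w)   # y - y_hat
    return 0.5 * np.sum(residuals ** 2) + lam * np.sum(w ** 2)
```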
Ridge regression works best when all features are relevant and the coefficients are mostly small to medium-sized. It is also computationally more efficient than other regularization methods. Its main drawback is that it never eliminates any features, which is not always desirable. Whether Ridge or another regularization approach is the better choice depends on the specific problem and the characteristics of the data.
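To illustrate the shrinkage behaviour described above, the short sketch below fits Ridge models with increasing alpha on some synthetic data and prints the overall size of the learned coefficients; the data and the alpha values are arbitrary and chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(50)

# As alpha grows, the L2 penalty dominates and the coefficient norm shrinks toward zero
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: ||w|| = {np.linalg.norm(coef):.3f}")
```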
Program
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate some random data
n_samples, n_features = 100, 10
X = np.random.randn(n_samples, n_features)
w_true = np.random.randn(n_features)
y = X.dot(w_true) + 0.5*np.random.randn(n_samples)

# Split the data into training and testing sets (80/20)
train_size = int(n_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Set the regularization strength
alpha = 0.1

# Create the Ridge regression object and fit the model
ridge = Ridge(alpha=alpha)
ridge.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = ridge.predict(X_test)

# Calculate and print the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse:.2f}")
```
Output
Mean squared error: 0.36
In this example we generate some random data and split it into training and testing sets with a simple 80/20 index split. (scikit-learn's train_test_split function is the more common way to do this, and StandardScaler can be added so that every feature has a comparable range and distribution; a sketch of that variant follows below.)
We then create a Ridge regression model using scikit-learn's Ridge class, with the alpha parameter controlling the regularization strength; a larger alpha means stronger regularization.
We use the fit method to fit the model to the training data and the predict method to make predictions on the testing data. Finally, we evaluate the model with the mean squared error, which measures the average squared difference between the predicted and actual values.
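As a sketch of the train_test_split/StandardScaler variant mentioned above (not required to reproduce the output shown, just a common way to structure the workflow), the model can be wrapped in a scikit-learn Pipeline; X and y are assumed to be the arrays generated in the program above:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Random 80/20 split instead of slicing by index
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale the features, then fit Ridge; the pipeline reapplies the same scaling at predict time
model = make_pipeline(StandardScaler(), Ridge(alpha=0.1))
model.fit(X_train, y_train)
print(f"Mean squared error: {mean_squared_error(y_test, model.predict(X_test)):.2f}")
```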
It is worth noting that Ridge regression does not always improve the performance of a linear regression model; alternative regularization methods such as Lasso or Elastic Net may be better suited in some situations. Moreover, the regularization strength alpha should be tuned with cross-validation to find the value that best balances model complexity against generalization performance.
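As a minimal sketch of that tuning step, scikit-learn's RidgeCV selects alpha from a candidate grid by cross-validation; the grid below is an arbitrary illustrative choice, and X_train and y_train are the arrays from the program above:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Try a range of regularization strengths and keep the one with the best cross-validation score
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_train, y_train)
print("Best alpha:", ridge_cv.alpha_)
```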
Lasso Regression
Lasso regression, also known as L1 regularization, is a method for preventing overfitting in linear regression models by adding a penalty term to the cost function. In contrast to Ridge regression, the penalty is based on the sum of the absolute values of the coefficients rather than the sum of the squared coefficients.
Lasso regression attempts to minimize the following cost function −
$$\mathrm{J(w) = \frac{1}{2}\:\sum(y\:-\:\hat{y})^2\:+\:\lambda\:\sum|w_j|}$$
where y is the actual value, ŷ denotes the predicted value, w_j denotes the coefficient of the j-th feature, and λ is the regularization strength.
Lasso regression can shrink some coefficients all the way to zero, effectively performing feature selection. This is especially helpful with high-dimensional datasets in which many features may be irrelevant or redundant. The resulting model is simpler and easier to interpret, and by reducing overfitting it often achieves better predictive performance.
Program
```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Generate some random data
n_samples, n_features = 100, 10
X = np.random.randn(n_samples, n_features)
w_true = np.random.randn(n_features)
y = X.dot(w_true) + 0.5*np.random.randn(n_samples)

# Split the data into training and testing sets
train_size = int(n_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Set the regularization strength
alpha = 0.1

# Create the Lasso regression object and fit the model
lasso = Lasso(alpha=alpha)
lasso.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = lasso.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)

# Print the mean squared error
print(f"Mean squared error: {mse:.2f}")
```
Output
Mean squared error: 0.43
In this code, we first generate some random data (100 samples and 10 features). We then split the data into 80/20 training and testing sets, set the regularization strength to 0.1, and create a Lasso regression object. We fit the model to the training data with the fit() method, make predictions on the testing data with the predict() method, and compute the mean squared error between the predicted and actual values with scikit-learn's mean_squared_error() function. Finally, the mean squared error is printed.
It is worth noting that the Lasso regression model performs feature selection by setting some of the coefficients to zero. This means it can be effective when there are many features and we want to identify the ones most important for predicting the target variable. However, if all of the features are believed to be relevant for prediction, it may not be the best option, and Ridge regression may be the better choice.
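To see this selection effect on the model fitted above, one can inspect the learned coefficients directly; with the alpha used here, some entries will typically be exactly zero (how many depends on the randomly generated data):

```python
import numpy as np

# Coefficients that Lasso drove exactly to zero correspond to features it effectively dropped
print("Coefficients:", np.round(lasso.coef_, 3))
print("Features eliminated:", int(np.sum(lasso.coef_ == 0)))
```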
Difference between Ridge and Lasso Regression
| Ridge Regression | Lasso Regression |
|---|---|
| Shrinks the coefficients toward zero | Encourages some coefficients to be exactly zero |
| Adds a penalty term proportional to the sum of squared coefficients | Adds a penalty term proportional to the sum of absolute values of coefficients |
| Does not eliminate any features | Can eliminate some features |
| Suitable when all features are important | Suitable when some features are irrelevant or redundant |
| More computationally efficient | Less computationally efficient |
| Requires setting a hyperparameter | Requires setting a hyperparameter |
| Performs better when there are many small to medium-sized coefficients | Performs better when there are a few large coefficients |
Conclusion
Ridge and Lasso regression are powerful techniques for regularizing linear regression models and preventing overfitting. Both add a penalty term to the cost function, but in different ways: Ridge regression shrinks the coefficients toward zero, while Lasso regression encourages some of them to be exactly zero. Both techniques are easy to implement in Python with scikit-learn, making them accessible to a wide audience. By understanding and applying Ridge and Lasso regression, you can improve the performance of your linear regression models and make more accurate predictions on new data.