Types of Regression Techniques in Machine Learning


Introduction

Regression is a predictive modeling technique used to analyze the relationship between independent variables and a dependent variable. The relation between the target (dependent variable) and the independent variables may be either linear or non-linear. The target is always a continuous value, and regression is widely used in forecasting, understanding cause-and-effect relationships, and predictive analysis.

In this article, let us explore the various regression techniques available.

Regression Techniques

  • Linear Regression − It is the simplest of all regression techniques. In Linear Regression, the independent variable and the target variable are linearly related. If there is more than one independent variable involved, the method is known as Multiple Linear Regression.

    The general equation of Linear Regression is given as:

    y = mx + c + E

    where y is the output, x is the input, m is the slope, c is the y-intercept, and E is the error between the predicted and actual values. The model tries to find the best-fit line with slope m and y-intercept c so that the error is minimized. However, the Linear Regression model is sensitive to outliers.

    A more generalized equation covering both Simple and Multiple Linear Regression can be written as

    y = β0 + β1 x1 + β2 x2 + ... + βn xn
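The fit described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn and NumPy are available; the slope 3 and intercept 2 are made-up values for the synthetic data.

```python
# Minimal sketch of Linear Regression: recover slope m and intercept c
# from noisy data generated as y = mx + c + E (scikit-learn assumed).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                    # one independent variable
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)   # y = mx + c + E

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # should be close to slope 3 and intercept 2
```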

  • Ridge Regression − When there is a high degree of correlation between independent variables (multicollinearity), ordinary least squares estimates become unstable, and Ridge Regression is used instead. It is a regularization technique that adds a penalty term to the cost function to reduce model complexity and reduce errors. It is also known as L2 regularization. The hat matrix for Ridge Regression is

    H(Ridge) = X(X′X + λI)⁻¹X′
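A small sketch of the stabilizing effect, assuming scikit-learn; the near-collinear features and `alpha` value are illustrative choices, with `alpha` playing the role of λ above.

```python
# Sketch: Ridge keeps coefficients stable when two features are nearly collinear
# (scikit-learn assumed; alpha corresponds to the penalty λ).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)    # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
# OLS may split the weight between the collinear columns erratically;
# ridge shrinks both coefficients toward a stable value near 1.
print(ols.coef_, ridge.coef_)
```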

  • Lasso Regression − Lasso Regression is also known as L1 regularization. Lasso regression performs both regularization and feature selection. It adds a penalty term to the cost function that can shrink the coefficients of less useful features to exactly zero, helping the model select only the required features and reducing overfitting. Among a group of collinear features, typically only one is retained while the others are shrunk to zero.

    The Lasso cost function is

$$\mathrm{\min_{\beta}\;\frac{1}{n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}|\beta_j|}$$
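The feature-selection behavior can be seen directly. A minimal sketch, assuming scikit-learn; the choice of `alpha` (the λ penalty) and the synthetic data are illustrative.

```python
# Sketch: Lasso zeroes out features that do not help predict y
# (scikit-learn assumed; alpha corresponds to the penalty λ).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)   # only feature 0 matters

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)   # coefficients of features 1 and 2 are driven to exactly 0
```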

  • Logistic Regression − Logistic Regression is used only when the target variable is binary, such as 0 or 1, True or False. Thus it is a kind of classification technique. The relationship between the target and the independent variables is modeled through the logit function.

    The equation for logistic regression is

$$\mathrm{\log{\frac{p}{1-p}}=\beta_0+\beta_1 X_1+\beta_2X_2+\dotso+\beta_nX_n}$$

    For Logistic Regression to work properly, the dataset should be large enough and free of multicollinearity among the independent variables.
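The logit model above can be sketched as follows, assuming scikit-learn; the log-odds weights 2 and −1 used to generate the labels are made-up values.

```python
# Sketch: fit a logistic model to a binary target whose true log-odds
# are 2*x1 - x2 (scikit-learn assumed; weights are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
p = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))   # sigmoid of the log-odds
y = (rng.uniform(size=300) < p).astype(int)            # binary labels 0/1

clf = LogisticRegression().fit(X, y)
# clf.coef_ estimates the log-odds weights; predict_proba gives class probabilities.
print(clf.coef_, clf.score(X, y))
```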

  • Bayesian Linear Regression − This regression technique is based on Bayes' Theorem. Instead of using least squares, as in the case of Linear Regression, to estimate the coefficients, it uses a posterior distribution to determine the weights and parameters of the model. This naturally quantifies the uncertainty in the estimates and is robust when data is scarce.
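A minimal sketch using scikit-learn's `BayesianRidge` (one of several possible implementations; the slope 1.5 in the synthetic data is an illustrative value). Note that the prediction comes back with a standard deviation from the posterior, not just a point estimate.

```python
# Sketch: Bayesian linear regression returns a predictive mean AND a
# predictive standard deviation (scikit-learn's BayesianRidge assumed).
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(100, 1))
y = 1.5 * X[:, 0] + rng.normal(scale=0.3, size=100)

model = BayesianRidge().fit(X, y)
mean, std = model.predict([[2.0]], return_std=True)   # posterior predictive at x = 2
print(mean, std)   # mean should be near 1.5 * 2 = 3, with a positive uncertainty
```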

  • Support Vector Regression − This regression technique is based on Support Vector Machines. It can model both linear and non-linear relationships. It uses the kernel trick, with either linear or non-linear kernels, to transform the data so that patterns in it can be revealed.
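A small sketch of the non-linear case, assuming scikit-learn; the sine curve and the `C`/`epsilon` settings are illustrative choices, with the RBF kernel doing the non-linear transformation mentioned above.

```python
# Sketch: SVR with an RBF kernel fits a non-linear (sine) relationship
# (scikit-learn assumed; C and epsilon are illustrative hyperparameters).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=200)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
print(svr.predict([[np.pi / 2]]))   # should be near sin(pi/2) = 1
```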

  • Decision Tree Regressor − Decision Tree Regression predicts target values through a series of decisions made at each internal node, with each leaf node holding a predicted value. It is a non-parametric method.

  • Random Forest Regressor − An ensemble of decision trees that uses bagging (bootstrap aggregation), which reduces the variance of the aggregated model.
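Both tree-based regressors can be sketched together. This assumes scikit-learn; the step-function data and the depth/estimator settings are illustrative.

```python
# Sketch: a single decision tree and a bagged random forest fitting a
# step function (scikit-learn assumed; hyperparameters are illustrative).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(300, 1))
# Target jumps from 1 to 4 at x = 5, plus a little noise.
y = np.where(X[:, 0] < 5, 1.0, 4.0) + rng.normal(scale=0.2, size=300)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(tree.predict([[2.0]]), forest.predict([[8.0]]))   # near 1 and 4 respectively
```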

Conclusion

Regression is one of the simplest and most widely used Machine Learning techniques. There are numerous types of regression available, and the choice among them depends on the distribution and type of the data and the complexity of the problem at hand. Regression algorithms are generally used to predict a real-valued target from a set of features, where the relation between the target and the independent variables can be either linear or non-linear.

Updated on: 09-Aug-2023
