Which Evaluation Metric Is Best for Linear Regression?

Introduction

In machine learning, linear regression is one of the most widely used algorithms for data with a linear relationship between the features and the target, and on such data it can give accurate predictions. After training a model with any algorithm, however, it is necessary to check its performance to understand how the model is behaving and what needs to be improved. In this article, we will discuss the various evaluation metrics and how to choose the best metric for evaluating a linear regression model.

Why Find the Best Evaluation Metric?

Many evaluation metrics are available for regression algorithms to check the behavior and performance of a model on the data it is given. It is important to choose an appropriate metric so that the errors the model is making are easy to understand.

To find the most suitable evaluation metric for linear regression, we first need to understand the core intuition and working mechanism of the algorithm, so that the basis of the discussion is clear.

How Does Linear Regression Work?

The working mechanism of the linear regression algorithm is easy to understand and interpret. The basic idea is to plot the data points on a graph with one axis per feature plus one for the target. If the relationship in the data is linear, it is easy to find the best-fit line, also called the regression line, and use it to make predictions for other points.

Here, for a single feature, the simple line equation y = mx + c is used as the regression line, where m is the slope and c is the intercept. Predictions and errors are both computed from this line.

First, the most appropriate values of m and c are calculated. Once that is done, putting a value of x into the equation returns the predicted value of the target variable y.

Once the best-fit line is obtained, the errors are calculated against it, although different evaluation metrics do this in different ways. Let us try to understand them.
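As a rough illustration of how m and c might be calculated, the sketch below assumes the usual ordinary least squares criterion for a single feature. The article does not name the fitting method, so this is an assumption. The function names `fit_line` and `predict` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c.

    Assumes ordinary least squares with one feature (an assumption,
    not something the article specifies).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


def predict(x, m, c):
    """Put x into the line equation to get the predicted target value."""
    return m * x + c


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c, predict(5.0, m, c))
```

Because the sample points lie exactly on a line, the fitted slope and intercept recover that line. On real data the line will only approximate the points, which is why the error metrics discussed next are needed.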

Mean Absolute Error

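In this evaluation metric, the value that the regression line predicts is subtracted from the actual value of y for each data point, the absolute value of each difference is taken, and the average of these absolute differences is reported as the error of the model.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

MAE = Mean absolute error

$n$ = Number of data points

$y_i$ = Data point's actual target value

$\hat{y}_i$ = Data point's predicted target value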
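A minimal sketch of the mean absolute error in plain Python, averaging the absolute differences over all data points. The function name `mean_absolute_error` and the sample values are chosen here for illustration only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all data points."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted target values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 7.0]
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # absolute errors 0.5, 0, 1, 2 -> 3.5 / 4 = 0.875
```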

Mean Squared Error

Mean squared error is also one of the most used evaluation metrics for regression problems. Here the predicted y value from the regression line is subtracted from the actual y value for each data point, each difference is squared, and the average of the squared differences is the mean squared error of the model.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

MSE = Mean squared error

$n$ = Number of data points

$y_i$ = Data point's actual target value

$\hat{y}_i$ = Data point's predicted target value
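The mean squared error can be sketched the same way, squaring each difference before averaging. The function name `mean_squared_error` and the sample values are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all data points."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted target values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 7.0]
mse = mean_squared_error(y_true, y_pred)
print(mse)  # squared errors 0.25, 0, 1, 4 -> 5.25 / 4 = 1.3125
```

Note that the result is in squared units of the target, which is one reason it can look large next to the target values themselves.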

Root Mean Squared Error

Root mean squared error is simply the square root of the mean squared error. Because MSE squares the errors, it is expressed in squared units and can look very large. Taking the root brings the error back to the same scale and units as the target variable, which makes it easier to interpret.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

RMSE = Root mean squared error

$n$ = Number of data points

$y_i$ = Data point's actual target value

$\hat{y}_i$ = Data point's predicted target value
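A short sketch of the root mean squared error, taking the square root of the averaged squared differences. The function name `root_mean_squared_error` and the sample values are hypothetical.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical actual and predicted target values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 7.0]
rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)  # square root of 1.3125, in the same units as the target
```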

R2 Score

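The R2 score is also a popular evaluation metric for regression problems. It measures how much better the regression line fits the data than a baseline that always predicts the mean of the target. The best possible value is 1 (often reported as 100%), which means the model makes no errors. A value of 0 means the model is no better than always predicting the mean, and the score can become negative when the model is worse than that baseline.

$$R^2 = 1 - \frac{SSR}{SSM}$$

SSR = Sum of squared errors of the regression line, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

SSM = Sum of squared errors of the mean, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, where $\bar{y}$ is the mean of the actual target values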
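The ratio of the two sums of squares can be sketched as follows. The function name `r2_score` and the sample values are illustrative, and the sketch assumes the actual target values are not all identical (otherwise SSM would be zero).

```python
def r2_score(y_true, y_pred):
    """1 - SSR/SSM, where SSM uses the mean of the actual values as baseline."""
    mean_y = sum(y_true) / len(y_true)
    # SSR: squared errors of the regression line's predictions.
    ssr = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    # SSM: squared errors of always predicting the mean.
    ssm = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ssr / ssm


# Hypothetical actual and predicted target values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 7.0]
r2 = r2_score(y_true, y_pred)                # SSR = 5.25, SSM = 20
r2_baseline = r2_score(y_true, [6.0] * 4)    # always predicting the mean
r2_bad = r2_score(y_true, [9.0, 7.0, 5.0, 3.0])  # reversed predictions
print(r2, r2_baseline, r2_bad)
```

In this example the first score is about 0.7375, the mean-only baseline scores 0, and the reversed predictions score below 0, which illustrates that the score is bounded above by 1 but not bounded below by 0.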

Which is Best?

Now that we have discussed the linear regression algorithm and the various evaluation metrics, we can discuss which metric is best. No single evaluation metric is always best for every type of data in linear regression. The choice depends on the data and on what we are evaluating the model for.

For example, if the data has very extreme outliers, those observations will have much higher or lower values than the normal observations in the dataset, and the model's error on them will usually be large as well. With mean absolute error, an outlier contributes to the score in proportion to its error. With mean squared error, the error is squared, so the same outlier contributes far more and can dominate the score.

So if you want large errors, such as those on outliers, to be penalized heavily, you can use MSE, which grows very quickly for outliers, and tune the model accordingly. If you want a score that is less sensitive to a few extreme points, MAE is the better choice.

The combination of R2 score and root mean squared error can also provide very valuable information for any linear regression model. Used together, they show both how much of the variation the model explains and how large its errors are in the units of the target. Note that the R2 score can sometimes be high even for a poor model, so always check the RMSE of the model as well.
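To make the effect of an outlier concrete, the sketch below compares MAE and MSE on a small made-up dataset, once without and once with a single extreme actual value. All names and numbers are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Same predictions in both cases; only the last actual value changes.
y_pred = [11.0, 13.0, 15.0, 17.0]
y_clean = [10.0, 12.0, 14.0, 16.0]    # every error is 1
y_outlier = [10.0, 12.0, 14.0, 27.0]  # last error is 10

mae_clean = mean_absolute_error(y_clean, y_pred)      # 1.0
mae_outlier = mean_absolute_error(y_outlier, y_pred)  # 13 / 4 = 3.25
mse_clean = mean_squared_error(y_clean, y_pred)       # 1.0
mse_outlier = mean_squared_error(y_outlier, y_pred)   # 103 / 4 = 25.75

print(mae_outlier / mae_clean)  # MAE grows 3.25x
print(mse_outlier / mse_clean)  # MSE grows 25.75x
```

A single outlier makes MAE a few times larger but makes MSE more than twenty times larger here, which is the extra weight that squaring gives to large errors.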

Key Takeaways

Using an appropriate evaluation metric can help tune and improve the model very effectively.
Mean absolute error weights all errors linearly, so it suits cases where a few outliers should not dominate the score.
Mean squared error can be used if you want to give more weight to large errors, such as those on outliers, and tune the model accordingly.
The combination of the R2 score and RMSE is usually a strong default for evaluating a linear regression model.

Conclusion

In this article, we discussed linear regression and the various evaluation metrics that can be used to evaluate it. We also discussed which metrics work best in which situations and the reasons behind those choices. This should help you understand the metrics better and use them according to the type of data and the situation.

Parth Shukla

Updated on: 24-Feb-2023
