Regression Analysis and the Best Fitting Line using C++
Introduction
Regression analysis is one of the most basic forms of predictive analysis.
In statistics, linear regression is an approach for modeling the relationship between a scalar response and one or more explanatory variables.
In machine learning, linear regression is a supervised learning algorithm that predicts a target value from one or more independent variables.
More About Linear Regression and Regression Analysis
In linear regression, the target is a real or continuous value such as salary or BMI. It is generally used to model the relationship between a dependent variable and one or more independent variables. These models fit a linear equation, although other types of regression exist as well, including higher-order polynomial regression.
Before fitting a linear model to the data, it is necessary to check whether the variables actually have a linear relationship; this is usually evident from a scatterplot. The goal of the model is then to find the best-fitting line.
In this article, we are going to explore Linear Regression Analysis and its implementation using C++.
The linear regression equation has the form y = c + mx, where y is the target variable, x is the independent or explanatory variable, m is the slope of the regression line, and c is the intercept. Since this is a two-dimensional regression task, the model tries to find the line of best fit during training. Not all points will lie exactly on that line: some data points may fall on it while others are scattered around it. The vertical distance between the line and a data point is called the residual; it is negative or positive depending on whether the point lies below or above the line. Residuals measure how well the line fits the data, and the algorithm minimizes the total residual error (typically the sum of squared residuals).
The residual for each observation is the difference between the observed (actual) value of y, the dependent variable, and the value of y predicted by the line:
$$\mathrm{Residual\: =\: actual\: y\: value\:−\:predicted\: y\: value}$$
$$\mathrm{r_i\:=\:y_i\:-\:y'_i}$$
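For example, suppose the fitted line were y' = 2 + 0.5x (numbers chosen here purely for illustration). For an observed data point (4, 5) the model predicts y' = 2 + 0.5 × 4 = 4, so the residual is

$$\mathrm{r\:=\:5\:-\:4\:=\:1}$$

and the point lies one unit above the line.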
The most common metric for evaluating linear regression model performance is called root mean squared error, or RMSE. The basic idea is to measure how bad/erroneous the model's predictions are when compared to actual observed values.
So a high RMSE is “bad” and a low RMSE is “good”.
The RMSE is computed as
$$\mathrm{RMSE\:=\:\sqrt{\frac{\sum_{i=1}^{n}\:(y_i\:-\:y'_i)^2}{n}}}$$
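As a minimal, self-contained sketch (the helper name computeRMSE and the sample numbers are illustrative and not part of the program shown later), RMSE can be computed from arrays of actual and predicted values like this:

#include<iostream>
#include<cmath>
using namespace std;

/* Computes RMSE between observed and predicted values */
float computeRMSE(const float actual[], const float predicted[], int n){
   float sum_sq = 0;
   for(int i = 0; i < n; i++){
      float r = actual[i] - predicted[i];   /* residual for observation i */
      sum_sq += r * r;
   }
   return sqrt(sum_sq / n);
}

int main(){
   float actual[]    = {5.0f, 7.0f, 6.0f};
   float predicted[] = {5.5f, 6.8f, 6.2f};
   cout << "RMSE = " << computeRMSE(actual, predicted, 3) << endl;
   return 0;
}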
Implementation using C++
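The program below finds the intercept a and slope b of the best-fitting line y = a + bx (here a plays the role of c and b the role of m from the equation above) using the standard least-squares formulas, obtained by minimizing the sum of squared residuals:

$$\mathrm{b\:=\:\frac{n\sum x_iy_i\:-\:\sum x_i\sum y_i}{n\sum x_i^{2}\:-\:(\sum x_i)^{2}}}$$

$$\mathrm{a\:=\:\frac{\sum y_i\:-\:b\sum x_i}{n}}$$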
#include<iostream>
#define N 50
using namespace std;

int main(){
   int n, i;
   float x[N], y[N], sum_x=0, sum_x2=0, sum_y=0, sum_xy=0, a, b;

   /* Input */
   cout<<"Please enter the number of data points..";
   cin>>n;
   cout<<"Enter data:"<<endl;
   for(i=1;i<=n;i++){
      cout<<"x["<< i <<"] = ";
      cin>>x[i];
      cout<<"y["<< i <<"] = ";
      cin>>y[i];
   }

   /* Calculating the required sums */
   for(i=1;i<=n;i++){
      sum_x  = sum_x  + x[i];
      sum_x2 = sum_x2 + x[i]*x[i];
      sum_y  = sum_y  + y[i];
      sum_xy = sum_xy + x[i]*y[i];
   }

   /* Calculating the slope b and intercept a of the least-squares line */
   b = (n*sum_xy - sum_x*sum_y)/(n*sum_x2 - sum_x*sum_x);
   a = (sum_y - b*sum_x)/n;

   /* Displaying the values of a and b */
   cout<<"Calculated value of a is "<< a <<" and b is "<< b << endl;
   cout<<"Equation of best fit line is: y = "<< a <<" + "<< b <<"x";
   return 0;
}
Output
Please enter the number of data points..5
Enter data:
x[1] = 2
y[1] = 5
x[2] = 5
y[2] = 7
x[3] = 2
y[3] = 6
x[4] = 8
y[4] = 9
x[5] = 2
y[5] = 7
Calculated value of a is 4.97917 and b is 0.479167
Equation of best fit line is: y = 4.97917 + 0.479167x
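As a quick sanity check, substituting x = 5 into the fitted line gives y ≈ 4.97917 + 0.479167 × 5 ≈ 7.38, which is close to the observed value of 7 for that data point.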
Conclusion
Regression analysis is a simple yet powerful technique for predictive analysis, in both machine learning and statistics. Its power lies in its simplicity and in the assumption of an underlying linear relationship between the independent variables and the target variable.
