Multiple Regression
Introduction
Before we look at multiple linear regression, let us review what simple linear regression is.
Simple linear regression helps in determining the relationship between two variables in a data set: one independent variable and one dependent variable. Its main limitation is that it cannot handle more than one independent variable.
Multiple linear regression removes this limitation by determining the relationship among more than two variables.
It remains a linear model and so inherits the other weaknesses of linear regression, but it lets us build a regression model with multiple independent variables and a single dependent variable.
Multiple linear regression is widely used in econometrics and financial inference.
Definition
Simple linear regression is a tool that enables us to predict one variable in a data set from the known value of one other variable.
Multiple linear regression is a statistical tool that uses multiple independent variables to predict the value of a dependent variable.
Formula
Only one independent variable and one dependent variable are involved in simple linear regression, whereas multiple linear regression uses several independent variables to better explain the dependent variable.
$$\mathrm{y=b_0+b_1 p_1+b_2 p_2+\dots+b_n p_n+ϵ}$$
where, for n = number of explanatory variables −
y denotes the dependent variable
p1, ..., pn denote the explanatory (independent) variables
b0 = y-intercept, which is constant
b1, ..., bn = slope coefficients for p1, ..., pn
ϵ = error term of the model.
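As a minimal sketch of how the formula is evaluated for a single observation, the Python function below computes the systematic part b0 + b1 p1 + ... + bn pn and leaves out the error term. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(intercept, slopes, values):
    """Return b0 + b1*p1 + ... + bn*pn for one observation (error term omitted)."""
    if len(slopes) != len(values):
        raise ValueError("need exactly one slope per explanatory variable")
    return intercept + sum(b * p for b, p in zip(slopes, values))


# Hypothetical coefficients and inputs, not taken from any data set.
y_hat = predict(2.0, [0.5, -1.0], [4.0, 3.0])
print(y_hat)  # 2.0 + 0.5*4.0 - 1.0*3.0 = 1.0
```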
Stepwise Multiple Regression
Stepwise regression is a process in which a regression model is built by adding or deleting predictor variables one at a time.
Stepwise multiple regression can also be described as a method that builds a regression equation by starting with one independent variable and adding further independent variables one after the other.
This variant is known as the forward selection method: we start with no independent variables and add one independent variable to the regression at each iteration.
The opposite approach is the backward elimination method, which starts with all the candidate independent variables and eliminates one of them at each iteration.
Residual − Residual variation is the variation in the value of the dependent variable that is not explained by the regression model. It is also known as random error, which can arise from sampling variation.
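The forward selection idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a standard implementation: the stopping rule (keep a variable only if it lowers the residual sum of squares by at least a fraction `min_improvement`) is an assumption made here for simplicity, whereas statistical packages usually rely on F-tests or p-values. The helper names `solve`, `rss`, and `forward_selection` are hypothetical, and the data are the values from the solved example in this article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))


def forward_selection(predictors, y, min_improvement=0.05):
    """Repeatedly add the predictor that lowers the RSS most; stop when the gain is small."""
    chosen, remaining = [], list(predictors)
    current = rss([], y)  # intercept-only model
    while remaining:
        best = min(remaining,
                   key=lambda name: rss([predictors[n] for n in chosen + [name]], y))
        new = rss([predictors[n] for n in chosen + [best]], y)
        if current - new < min_improvement * current:
            break  # assumed stopping rule: relative RSS drop too small
        chosen.append(best)
        remaining.remove(best)
        current = new
    return chosen


predictors = {
    "p1": [60, 62, 67, 70, 71, 72, 75, 78],
    "p2": [22, 25, 24, 20, 15, 14, 14, 11],
}
q = [140, 155, 159, 179, 192, 200, 212, 215]
print(forward_selection(predictors, q))
```

With the default threshold, both predictors should be selected, p1 first. Backward elimination would run the same loop in reverse: start with every predictor and drop the one whose removal increases the residual sum of squares least.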
Advantages of Stepwise Multiple Regression
Only independent variables with non-zero regression coefficients are included in the regression equation.
At each step, the coefficient of determination of the regression equation and the change in the multiple standard error of estimate are determined.
A regression equation with a considerable number of regression coefficients can be derived efficiently using stepwise multiple regression.
Multivariate Multiple Regression
Statistical inference is usually carried out at the bivariate level. Tests have also been developed for multivariate multiple regression, which determine the relationships among multiple variables.
Multiple regression analysis is an extension of correlation analysis that is used for multivariate inference.
Multicollinearity
If the inter-correlation among the predictor variables is high, the term used to describe the condition is multicollinearity.
Signs of Multicollinearity
There is a high correlation between a pair of predictor variables.
The magnitudes or signs of the regression coefficients make no physical sense.
The regression coefficients are non-significant for predictor variables that are expected to be important.
Adding or deleting a predictor variable significantly changes the magnitude or sign of the regression coefficients.
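The first sign can be checked numerically. The sketch below computes the Pearson correlation between the two predictors of the solved example later in this article, together with the variance inflation factor (VIF). The VIF is a common diagnostic that this article does not otherwise discuss; it is included here as an assumption about what a reader might find useful, and the formula 1/(1 - r²) applies only to the two-predictor case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]

r = pearson(p1, p2)
vif = 1 / (1 - r ** 2)  # variance inflation factor, two-predictor case only
print(round(r, 3), round(vif, 2))  # -0.884 4.56
```

A correlation of about -0.88 between p1 and p2 suggests that the two predictors carry overlapping information, so their individual coefficients should be interpreted with some caution.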
Solved Examples
1. For a data set with two predictor variables p1 and p2 and one response variable q, apply multiple linear regression to create a regression model.
Obs. | q | p1 | p2 |
---|---|---|---|
1 | 140 | 60 | 22 |
2 | 155 | 62 | 25 |
3 | 159 | 67 | 24 |
4 | 179 | 70 | 20 |
5 | 192 | 71 | 15 |
6 | 200 | 72 | 14 |
7 | 212 | 75 | 14 |
8 | 215 | 78 | 11 |
Mean | 181.5 | 69.375 | 18.125 |
Sum | 1452 | 555 | 145 |
First we calculate the regression sums. In each line, the sum on the left is the mean-corrected sum, and the sums on the right are the raw sums over the n = 8 observations −
$$\mathrm{ \sum p_1^2=\sum p_1^2-(\sum p_1)^2/n=38767 - (555)^2 / 8 = 263.875}$$
$$\mathrm{ \sum p_2^2=\sum p_2^2-(\sum p_2)^2/n= 2823 - (145)^2 / 8 = 194.875 }$$
$$\mathrm{ \sum p_1 q=\sum p_1 q-(\sum p_1)(\sum q)/n= 101895 - (555×1452) / 8 = 1162.5 }$$
$$\mathrm{ \sum p_2 q=\sum p_2 q-(\sum p_2)(\sum q)/n= 25364 - (145×1452) / 8 = -953.5}$$
$$\mathrm{\sum p_1 p_2=\sum p_1 p_2-(\sum p_1)(\sum p_2)/n= 9859 - (555×145) / 8 = -200.375}$$
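The five regression sums can be reproduced with a short Python check. The helper name `corrected_sum` and the variable names `s11`, `s22`, `s1q`, `s2q`, and `s12` are hypothetical labels introduced for this sketch.

```python
q = [140, 155, 159, 179, 192, 200, 212, 215]
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]


def corrected_sum(x, y):
    """Return sum(x*y) - sum(x)*sum(y)/n, the mean-corrected sum of products."""
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / len(x)


s11 = corrected_sum(p1, p1)  # corrected sum of squares of p1
s22 = corrected_sum(p2, p2)  # corrected sum of squares of p2
s1q = corrected_sum(p1, q)   # corrected sum of products of p1 and q
s2q = corrected_sum(p2, q)   # corrected sum of products of p2 and q
s12 = corrected_sum(p1, p2)  # corrected sum of products of p1 and p2
print(s11, s22, s1q, s2q, s12)  # 263.875 194.875 1162.5 -953.5 -200.375
```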
The raw sums used above are the column totals of the following table of squares and products −
p1² | p2² | p1 q | p2 q | p1 p2 |
---|---|---|---|---|
3600 | 484 | 8400 | 3080 | 1320 |
3844 | 625 | 9610 | 3875 | 1550 |
4489 | 576 | 10653 | 3816 | 1608 |
4900 | 400 | 12530 | 3580 | 1400 |
5041 | 225 | 13632 | 2880 | 1065 |
5184 | 196 | 14400 | 2800 | 1008 |
5625 | 196 | 15900 | 2968 | 1050 |
6084 | 121 | 16770 | 2365 | 858 |
To calculate b1 we use the following formula − $\mathrm{b_1=[(\sum p_2^2)(\sum p_1 q)-(\sum p_1 p_2)(\sum p_2 q)]/[(\sum p_1^2)(\sum p_2^2)-(\sum p_1 p_2)^2]}$
Thus, $\mathrm{ b_1= [(194.875)(1162.5) - (-200.375)(-953.5)] / [(263.875) (194.875) - (-200.375)^2] }$
$$\mathrm{ b_1=3.148}$$
The formula to calculate b2 is − $\mathrm{b_2=[(\sum p_1^2)(\sum p_2 q)-(\sum p_1 p_2)(\sum p_1 q)]/[(\sum p_1^2)(\sum p_2^2)-(\sum p_1 p_2)^2]}$
Thus, $\mathrm{b_2 = [(263.875)(-953.5) - (-200.375)(1162.5)] / [(263.875) (194.875) - (-200.375)^2] }$
$$\mathrm{ b_2=-1.656}$$
The formula to calculate b0 is − $\mathrm{b_0=\bar{q}-b_1 \bar{p}_1-b_2 \bar{p}_2}$, where the bars denote the means from the data table.
Thus, $\mathrm{b_0= 181.5 - 3.148(69.375) - (-1.656)(18.125) = -6.867}$ (this value uses the unrounded b1 and b2; the rounded values shown give approximately -6.88).
Substituting the values of b0, b1, and b2 into the regression equation, the fitted model is given by −
$$\mathrm{\hat{q} = b_0 + b_1×p_1 + b_2×p_2}$$
$$\mathrm{\hat{q}=-6.867 + 3.148\: p_1 - 1.656\: p_2}$$
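The coefficient arithmetic can be verified with a few lines of Python. The sketch starts from the regression sums and means computed in this example; the variable names are hypothetical.

```python
# Mean-corrected sums and means taken from the worked example.
s11, s22 = 263.875, 194.875
s1q, s2q, s12 = 1162.5, -953.5, -200.375
mean_q, mean_p1, mean_p2 = 181.5, 69.375, 18.125

det = s11 * s22 - s12 ** 2            # shared denominator of b1 and b2
b1 = (s22 * s1q - s12 * s2q) / det
b2 = (s11 * s2q - s12 * s1q) / det
b0 = mean_q - b1 * mean_p1 - b2 * mean_p2  # uses the unrounded b1 and b2
print(round(b1, 3), round(b2, 3), round(b0, 3))  # 3.148 -1.656 -6.867
```

Keeping full precision until the final step matters here: substituting the rounded slopes 3.148 and -1.656 into the intercept formula shifts b0 in the second decimal place.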
Now let us interpret the fitted equation −
b0 = -6.867. The predicted mean value of q is -6.867 when both p1 and p2 are equal to zero.
b1 = 3.148. Holding p2 constant, a unit increase in p1 corresponds to an average increase of 3.148 units in q.
b2 = -1.656. Holding p1 constant, a unit increase in p2 corresponds to an average decrease of 1.656 units in q.
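To see how well the fitted equation describes the data, the sketch below computes the fitted values, the residuals, and the coefficient of determination R² = 1 - SS_res / SS_tot. It uses the rounded coefficients from this example, so the figures are approximate; the function name `predict` is a hypothetical label.

```python
q = [140, 155, 159, 179, 192, 200, 212, 215]
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]


def predict(a, b):
    """Fitted equation from the worked example, with rounded coefficients."""
    return -6.867 + 3.148 * a - 1.656 * b


fitted = [predict(a, b) for a, b in zip(p1, p2)]
residuals = [obs - fit for obs, fit in zip(q, fitted)]  # unexplained part of q

mean_q = sum(q) / len(q)
ss_res = sum(e ** 2 for e in residuals)          # residual sum of squares
ss_tot = sum((obs - mean_q) ** 2 for obs in q)   # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.963
```

An R² of about 0.96 means that p1 and p2 together account for roughly 96% of the variation in q within this sample of eight observations.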
Conclusion
Multiple regression extends linear regression models to allow predictions for systems with multiple independent variables.
Simple linear regression is a useful tool that models the linear relationship between a single independent variable and a single dependent variable.
Multiple linear regression is widely used in econometrics and financial inference.
Stepwise regression is a process in which a regression model is built by adding or deleting predictor variables one at a time.
Multiple regression analysis is an extension of correlation analysis that is used for multivariate inference.
FAQs
1. What are the uses of linear regression?
With the help of linear regression, one can predict quantities such as the price of oil or interest rates over a given period, which in turn can affect stock prices.
2. What are the limits of linear regression?
Simple linear regression is limited to two variables: one dependent variable and one independent variable.
3. What are the uses of multiple linear regression?
Multiple linear regression is widely used in econometrics and financial inference.
4. What is a stepwise multiple regression process?
Stepwise regression is a process in which a regression model is built by adding or deleting predictor variables one at a time.
5. What is the backwards elimination method?
The backward elimination method starts with multiple independent variables and eliminates one of them at each iteration.