Multiple Regression


Introduction

  • Before we learn about multiple linear regression, let us understand what linear regression is.

  • Simple linear regression helps in determining the relationship between two variables in a data set: one dependent variable and one independent variable. It is therefore limited to two variables.

  • Multiple linear regression extends this idea to determine the relationship between a dependent variable and two or more independent variables.

  • Multiple linear regression keeps the basic assumption of linear regression (a linear relationship between the variables), but it builds a regression model with multiple independent variables and a single dependent variable.

  • Multiple linear regression is widely used, most notably in econometrics and financial inference.

Definition

  • Simple linear regression is a tool that predicts one variable in a data set from the value of one other variable.

  • Multiple linear regression is a statistical tool that uses multiple independent variables to predict the result of a dependent variable.

Formula

Simple linear regression involves only one independent variable and one dependent variable, whereas multiple linear regression uses several independent variables to better explain the dependent variable. The multiple linear regression model is −

$$\mathrm{y = b_0 + b_1 p_1 + b_2 p_2 + \dots + b_n p_n + \epsilon}$$

where −

y denotes the dependent variable

p1, p2, ..., pn denote the n explanatory (independent) variables

b0 = y-intercept, which is a constant

b1, b2, ..., bn = slope coefficients for p1, p2, ..., pn

ϵ = error term of the model.
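The formula translates directly into a prediction rule. The short Python sketch below evaluates the fitted part b0 + b1p1 + ... + bnpn for one observation; the error term ϵ is left out because it is unknown for a new observation. The function name `predict` and the numbers in the usage line are illustrative assumptions, not part of the original article.

```python
from typing import Sequence


def predict(coefficients: Sequence[float], predictors: Sequence[float]) -> float:
    """Evaluate y_hat = b0 + b1*p1 + ... + bn*pn (the error term is omitted).

    coefficients: [b0, b1, ..., bn]; predictors: [p1, ..., pn].
    """
    if len(coefficients) != len(predictors) + 1:
        raise ValueError("need one coefficient per predictor plus the intercept b0")
    b0, *slopes = coefficients
    return b0 + sum(b * p for b, p in zip(slopes, predictors))


# Hypothetical coefficients b0 = 1, b1 = 2, b2 = 3 with predictor values p1 = 4, p2 = 5.
print(predict([1.0, 2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```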

Stepwise Multiple Regression

  • Stepwise regression builds a regression model by adding or removing independent variables one at a time, depending on how much each one improves the prediction of the dependent variable.

  • Stepwise multiple regression can also be described as a method that builds a regression equation by starting with one independent variable and adding further independent variables one after the other.

  • In its forward selection form, stepwise multiple regression starts with no independent variables and adds one independent variable to the regression at each iteration.

  • The opposite approach, known as the backward elimination method, starts with all candidate independent variables and removes one independent variable at each iteration.
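The forward selection loop can be sketched in plain Python. This is a minimal sketch under stated assumptions: textbook stepwise procedures usually decide whether to add a variable with an F-test or p-value, whereas here a simpler, hypothetical rule is used, namely that a variable is added only if it lowers the residual sum of squares (SSE) by at least `min_improvement` (5% by default). The function names `solve`, `sse` and `forward_selection` are illustrative. Backward elimination would run the mirror-image loop, starting from all variables and dropping the one whose removal increases the SSE the least.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of a least-squares fit of y on an intercept
    plus the given predictor columns (solved via the normal equations)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def forward_selection(candidates: Dict[str, Sequence[float]], y: Sequence[float],
                      min_improvement: float = 0.05) -> List[str]:
    """Greedy forward selection: at each step add the candidate that lowers the SSE
    the most; stop when the relative drop in SSE is below min_improvement
    (an assumed stopping rule, not an F-test)."""
    selected: List[str] = []
    remaining = list(candidates)
    current = sse([], y)  # intercept-only model
    while remaining:
        scores = {name: sse([candidates[s] for s in selected] + [candidates[name]], y)
                  for name in remaining}
        best = min(remaining, key=lambda name: scores[name])
        if current == 0 or (current - scores[best]) / current < min_improvement:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Data from the solved example in this article.
q = [140, 155, 159, 179, 192, 200, 212, 215]
data = {"p1": [60, 62, 67, 70, 71, 72, 75, 78], "p2": [22, 25, 24, 20, 15, 14, 14, 11]}
print(forward_selection(data, q))  # ['p1', 'p2']
```

With the solved example's data, p1 enters first (on its own it cuts the SSE from 5442 to about 320.6), and p2 is then added because it lowers the SSE further, to about 203.4.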

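Residual − Residual variation is the variation in the dependent variable that is not explained by the regression model. For each observation, the residual is the difference between the observed and the predicted value of the dependent variable. It is also known as random error, which arises from random variation such as sampling variability.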
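As a minimal illustration, the Python sketch below computes the residuals for the data of the solved example later in this article, using its fitted coefficients rounded to three decimals. The helper name `residuals` is hypothetical. The sum of the squared residuals is the residual (unexplained) variation.

```python
from typing import List, Sequence


def residuals(y: Sequence[float], y_hat: Sequence[float]) -> List[float]:
    """Residual e_i = observed y_i minus predicted y_hat_i."""
    return [obs - pred for obs, pred in zip(y, y_hat)]


# Data and rounded coefficients from the solved example in this article.
q = [140, 155, 159, 179, 192, 200, 212, 215]
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]
b0, b1, b2 = -6.867, 3.148, -1.656

q_hat = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(p1, p2)]
e = residuals(q, q_hat)
unexplained = sum(r * r for r in e)  # residual (unexplained) sum of squares
print([round(r, 2) for r in e])
print(round(unexplained, 1))
```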

Advantages of Stepwise Multiple Regression

  • Only independent variables whose regression coefficients differ significantly from zero are kept in the regression equation.

  • At each step, the coefficient of determination of the regression equation and the change in the standard error of the estimate are determined, which shows how much each variable contributes.

  • A regression equation can be derived efficiently even when there is a considerable number of candidate independent variables.

Multivariate Multiple Regression

  • Statistical inference is usually carried out at the bivariate level, that is, between two variables at a time. For multivariate multiple regression, tests have also been developed that determine the relationship among multiple variables at once.

  • Multiple regression analysis is an extension of correlation analysis and is used for multivariate inference.

Multicollinearity

Multicollinearity is the condition in which the predictor variables are highly inter-correlated.

Signs of Multicollinearity

  • A pair of predictor variables is highly correlated.

  • The magnitudes or signs of the regression coefficients make no physical sense.

  • Regression coefficients of important predictor variables are not statistically significant.

  • The magnitudes or signs of the regression coefficients change considerably when a predictor variable is added or deleted.
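The first sign can be checked by computing the correlation between each pair of predictors. The Python sketch below does this for the two predictors of the solved example later in this article. It also reports the variance inflation factor (VIF), which, when there are exactly two predictors, equals 1 / (1 - r²) for each of them; VIF values above about 5 or 10 are commonly treated as a warning sign. The helper name `pearson_r` is illustrative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Predictors from the solved example in this article.
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]

r = pearson_r(p1, p2)
# With exactly two predictors, each one's variance inflation factor is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)
print(f"r = {r:.3f}, VIF = {vif:.2f}")  # r = -0.884, VIF = 4.56
```

For these data the two predictors are fairly strongly (negatively) correlated, so some inter-correlation is present even in the small solved example.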

Solved Examples

1. For a dataset with two predictor variables p1 and p2 and one response variable q, apply multiple linear regression to create a regression model.

q      p1     p2
140    60     22
155    62     25
159    67     24
179    70     20
192    71     15
200    72     14
212    75     14
215    78     11
Mean: q = 181.5, p1 = 69.375, p2 = 18.125
Sum: q = 1452, p1 = 555, p2 = 145

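First we calculate the regression sums, that is, the sums of squares and cross-products corrected for the means. In the formulas below, the sum on the left-hand side (for example Σp1²) denotes the corrected sum, while the right-hand side uses the raw totals of the squares and products of the data, with n = 8 observations −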

$$\mathrm{\sum p_1^2 = \sum p_1^2 - \frac{(\sum p_1)^2}{n} = 38767 - \frac{(555)^2}{8} = 263.875}$$

$$\mathrm{\sum p_2^2 = \sum p_2^2 - \frac{(\sum p_2)^2}{n} = 2823 - \frac{(145)^2}{8} = 194.875}$$

$$\mathrm{\sum p_1 q = \sum p_1 q - \frac{\sum p_1 \sum q}{n} = 101895 - \frac{555 \times 1452}{8} = 1162.5}$$

$$\mathrm{\sum p_2 q = \sum p_2 q - \frac{\sum p_2 \sum q}{n} = 25364 - \frac{145 \times 1452}{8} = -953.5}$$

$$\mathrm{\sum p_1 p_2 = \sum p_1 p_2 - \frac{\sum p_1 \sum p_2}{n} = 9859 - \frac{555 \times 145}{8} = -200.375}$$

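The squares and cross-products of the data, which supply the raw totals in the sums above −

p1²     p2²    p1q      p2q     p1p2
3600    484    8400     3080    1320
3844    625    9610     3875    1550
4489    576    10653    3816    1608
4900    400    12530    3580    1400
5041    225    13632    2880    1065
5184    196    14400    2800    1008
5625    196    15900    2968    1050
6084    121    16770    2365    858
Sum: p1² = 38767, p2² = 2823, p1q = 101895, p2q = 25364, p1p2 = 9859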
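The corrected sums can be checked with a few lines of Python. This is a sketch using only the data from the table above; the helper name `corrected_sum` is illustrative. Each value equals the raw sum of products minus the product of the two totals divided by n = 8.

```python
from typing import Sequence

q = [140, 155, 159, 179, 192, 200, 212, 215]
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]


def corrected_sum(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of x*y minus (sum x)(sum y)/n, i.e. the sum of products of deviations from the means."""
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / len(x)


s11 = corrected_sum(p1, p1)  # corrected sum of p1^2
s22 = corrected_sum(p2, p2)  # corrected sum of p2^2
s1q = corrected_sum(p1, q)   # corrected sum of p1*q
s2q = corrected_sum(p2, q)   # corrected sum of p2*q
s12 = corrected_sum(p1, p2)  # corrected sum of p1*p2
print(s11, s22, s1q, s2q, s12)  # 263.875 194.875 1162.5 -953.5 -200.375
```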

To calculate b1 we use the following formula −

$$\mathrm{b_1 = \frac{(\sum p_2^2)(\sum p_1 q) - (\sum p_1 p_2)(\sum p_2 q)}{(\sum p_1^2)(\sum p_2^2) - (\sum p_1 p_2)^2}}$$

Thus,

$$\mathrm{b_1 = \frac{(194.875)(1162.5) - (-200.375)(-953.5)}{(263.875)(194.875) - (-200.375)^2} = \frac{35484.625}{11272.5}}$$

$$\mathrm{b_1 \approx 3.148}$$

The formula to calculate b2 is −

$$\mathrm{b_2 = \frac{(\sum p_1^2)(\sum p_2 q) - (\sum p_1 p_2)(\sum p_1 q)}{(\sum p_1^2)(\sum p_2^2) - (\sum p_1 p_2)^2}}$$

Thus,

$$\mathrm{b_2 = \frac{(263.875)(-953.5) - (-200.375)(1162.5)}{(263.875)(194.875) - (-200.375)^2} = \frac{-18668.875}{11272.5}}$$

$$\mathrm{b_2 \approx -1.656}$$
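A quick numerical check of the two formulas, plugging in the corrected sums calculated above (plain Python, no libraries assumed):

```python
# Corrected sums from the worked example.
s11, s22, s1q, s2q, s12 = 263.875, 194.875, 1162.5, -953.5, -200.375

# Shared denominator of both formulas.
denominator = s11 * s22 - s12 ** 2
b1 = (s22 * s1q - s12 * s2q) / denominator
b2 = (s11 * s2q - s12 * s1q) / denominator
print(round(denominator, 3), round(b1, 3), round(b2, 3))  # 11272.5 3.148 -1.656
```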

The formula to calculate b0 is −

$$\mathrm{b_0 = \bar{q} - b_1 \bar{p}_1 - b_2 \bar{p}_2}$$

where the bars denote the means of q, p1 and p2 from the data table.

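Thus, with b1 and b2 carried to five decimal places,

$$\mathrm{b_0 = 181.5 - 3.14789(69.375) - (-1.65614)(18.125) \approx -6.867}$$

Using the three-decimal values 3.148 and -1.656 instead gives about -6.88; the small difference comes only from rounding.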

Substituting the values of b0, b1 and b2, the estimated regression equation is −

$$\mathrm{\hat{q} = b_0 + b_1 p_1 + b_2 p_2}$$

$$\mathrm{\hat{q} = -6.867 + 3.148\,p_1 - 1.656\,p_2}$$
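The whole procedure, from raw data to b0, b1 and b2, can be packaged as one Python function. This is only a sketch for the two-predictor case, using the same corrected-sum formulas as the worked example; the name `fit_two_predictors` is hypothetical, and with more predictors a general least-squares solver would normally be used instead.

```python
from typing import Sequence, Tuple


def fit_two_predictors(y: Sequence[float], x1: Sequence[float],
                       x2: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using the corrected-sum formulas."""
    n = len(y)

    def corrected(a: Sequence[float], b: Sequence[float]) -> float:
        # Sum of products minus (sum a)(sum b)/n.
        return sum(u * v for u, v in zip(a, b)) - sum(a) * sum(b) / n

    s11, s22, s12 = corrected(x1, x1), corrected(x2, x2), corrected(x1, x2)
    s1y, s2y = corrected(x1, y), corrected(x2, y)
    denominator = s11 * s22 - s12 ** 2
    if denominator == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / denominator
    b2 = (s11 * s2y - s12 * s1y) / denominator
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n  # b0 = mean(y) - b1*mean(x1) - b2*mean(x2)
    return b0, b1, b2


q = [140, 155, 159, 179, 192, 200, 212, 215]
p1 = [60, 62, 67, 70, 71, 72, 75, 78]
p2 = [22, 25, 24, 20, 15, 14, 14, 11]
b0, b1, b2 = fit_two_predictors(q, p1, p2)
print(f"q_hat = {b0:.3f} + {b1:.3f}*p1 + ({b2:.3f})*p2")  # q_hat = -6.867 + 3.148*p1 + (-1.656)*p2
```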

Now let's interpret the coefficients of this equation −

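b0 = -6.867. The predicted value of q is -6.867 when both p1 and p2 are equal to zero. Since the observed values of p1 (60 to 78) and p2 (11 to 25) are far from zero, this is an extrapolation outside the data, and the intercept has no direct practical meaning here.

b1 = 3.148. Holding p2 constant, a one-unit increase in p1 corresponds to an average increase of 3.148 units in q.

b2 = -1.656. Holding p1 constant, a one-unit increase in p2 corresponds to an average decrease of 1.656 units in q.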
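The "holding the other predictor constant" reading can be checked numerically: with the fitted equation, raising one predictor by one unit while keeping the other fixed changes the prediction by exactly that predictor's coefficient. The starting point (p1 = 70, p2 = 20) in this Python sketch is an arbitrary choice for illustration, and the function name `q_hat` is illustrative.

```python
def q_hat(p1: float, p2: float) -> float:
    """Fitted equation from the solved example (coefficients rounded to three decimals)."""
    return -6.867 + 3.148 * p1 - 1.656 * p2


# Raise p1 by one unit while holding p2 fixed, then do the same for p2.
base = q_hat(70, 20)
print(round(q_hat(71, 20) - base, 3))  # 3.148, the coefficient b1
print(round(q_hat(70, 21) - base, 3))  # -1.656, the coefficient b2
```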

Conclusion

  • Multiple regression extends simple linear regression so that a dependent variable can be predicted from multiple independent variables.

  • Simple linear regression models the linear relationship between a single dependent variable and a single independent variable.

  • Multiple linear regression is widely used, most notably in econometrics and financial inference.

  • Stepwise regression builds a regression model by adding or removing independent variables one at a time.

  • Multiple regression analysis is an extension of correlation analysis and is used for multivariate inference.

FAQs

1. What are the uses of linear regression?

With the help of linear regression, one can model how a factor such as the price of oil or interest rates, observed over a certain period, affects another quantity such as the price of a stock.

2. What are the limits of linear regression?

Simple linear regression is limited to two variables: one dependent variable and one independent variable.

3. What are the uses of multiple linear regression?

Multiple linear regression is widely used, most notably in econometrics and financial inference.

4. What is a stepwise multiple regression process?

Stepwise regression builds a regression model by adding or removing independent variables one at a time, depending on how much each one improves the prediction of the dependent variable.

5. What is the backwards elimination method?

The backward elimination method starts with all candidate independent variables in the model and removes one independent variable at each iteration.

Updated on: 04-Mar-2024
