Multiple Linear Regression in Machine Learning

Introduction

Multiple linear regression, one of the key regression techniques, models the linear relationship between one continuous dependent variable and two or more independent variables.

There are two categories of linear regression −

Simple − uses a single independent variable to predict the dependent variable.

Multiple − uses two or more independent variables to predict the dependent variable.

Let's examine multiple linear regression in detail in this article.

Multiple Linear Regression

Multiple linear regression is a frequently used form of predictive analysis. It lets you understand the relationship between a continuous dependent variable and two or more independent variables.

The independent variables may be continuous (such as age and height) or categorical (such as gender and occupation). Before performing the analysis, remember that any categorical independent variable must be dummy coded, that is, converted into numeric 0/1 indicator variables.
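Dummy coding can be sketched with pandas. This is a minimal illustration that assumes pandas is installed. The column names and values are made up. `drop_first=True` is one common choice: it keeps one category as the baseline so the indicator columns are not perfectly collinear with the intercept.

```python
import pandas as pd

# Hypothetical toy data: 'age' is continuous, 'occupation' is categorical
data = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "occupation": ["engineer", "teacher", "engineer", "nurse"],
})

# Replace the categorical column with 0/1 indicator (dummy) columns.
# drop_first=True drops the first category ('engineer'), which becomes
# the baseline that the other indicator columns are compared against.
encoded = pd.get_dummies(data, columns=["occupation"], drop_first=True, dtype=int)
print(encoded)
```

Each remaining column, such as `occupation_nurse`, can then be used as an ordinary numeric independent variable.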

Formula and Calculation

Multiple regression analysis lets you account for several factors that affect the dependent variable at the same time. It can also be used to examine the relationship between each independent variable and the dependent variable.

Let k be the number of independent variables, denoted $x_1, x_2, x_3, \ldots, x_k$.

This approach assumes that we have k independent variables whose values we can set, and that these variables then determine the outcome Y probabilistically.

We further assume that Y depends linearly on these variables:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where
$Y$ (written $y_i$ for the i-th observation) is the dependent, or predicted, variable.
$\beta_0$ is the y-intercept, that is, the value of $Y$ when $x_1, x_2, \ldots, x_k$ are all zero.
$\beta_1$ and $\beta_2$ are the regression coefficients giving the change in $Y$ caused by a one-unit change in $x_1$ and $x_2$, respectively, while the other variables are held fixed.
In general, $\beta_j$ is the slope coefficient of the independent variable $x_j$, for $j = 1, \ldots, k$.
$\varepsilon$ is the random error (residual) term of the model.
With k = 1, this model reduces to simple linear regression; multiple regression simply allows k to be greater than 1.
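The meaning of the coefficients can be checked numerically. The sketch below uses NumPy and hypothetical coefficient values for k = 3. It evaluates the systematic part $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ and leaves out the random error $\varepsilon$. It shows that raising $x_1$ by one unit, with the other inputs fixed, changes the prediction by exactly $\beta_1$.

```python
import numpy as np

# Hypothetical coefficients for a model with k = 3 independent variables
beta0 = 2.0                          # intercept
beta = np.array([0.5, -1.0, 3.0])    # beta_1, beta_2, beta_3

def predict(x):
    """Return beta0 + beta1*x1 + ... + betak*xk for one observation x (no error term)."""
    return beta0 + np.dot(beta, x)

x = np.array([4.0, 1.0, 2.0])
print(predict(x))  # 2 + 2 - 1 + 6 = 9.0

# Raise x1 by one unit while x2 and x3 stay fixed
x_plus = x.copy()
x_plus[0] += 1.0
print(predict(x_plus) - predict(x))  # equals beta_1 = 0.5
```

When every $x_j$ is zero, `predict` returns `beta0`, which matches the description of $\beta_0$ as the intercept.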

We have n observations, where n is greater than k and often substantially larger.

For the i-th observation, we set the independent variables to the values $x_{i1}, x_{i2}, \ldots, x_{ik}$ and measure a value $y_i$ of the random variable $Y_i$.

The model can therefore be described by the equations

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

where the errors $\varepsilon_i$ are independent random variables, each with mean 0 and the same unknown variance $\sigma^2$.
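All n equations can be written at once in matrix form. Add a column of ones for the intercept, and every row i of the design matrix multiplied by the coefficient vector gives $\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$. The sketch below simulates data from the model with NumPy. The coefficient values, $\sigma$, and the choice of normally distributed errors are assumptions for illustration only. The model itself only requires independent errors with mean 0 and common variance $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 3                             # n observations, k independent variables (n > k)
beta = np.array([1.0, 2.0, -0.5, 0.75])   # hypothetical [beta0, beta1, beta2, beta3]
sigma = 0.3                               # hypothetical error standard deviation

X = rng.normal(size=(n, k))               # row i holds x_i1, ..., x_ik
eps = rng.normal(0.0, sigma, size=n)      # independent errors: mean 0, variance sigma**2 (normality assumed)

# Prepend a column of ones so beta0 is multiplied by 1 in every row
X_design = np.column_stack([np.ones(n), X])

# Y_i = beta0 + beta1*x_i1 + ... + betak*x_ik + eps_i, for i = 1..n, in one step
Y = X_design @ beta + eps
print(X_design.shape, Y.shape)  # (100, 4) (100,)
```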

Difference Between Simple and Multiple Linear Regression

Multiple linear regression is preferable to simple linear regression when predicting the outcome of a complex process that depends on several factors.

Simple linear regression can accurately capture the relationship between two variables when that relationship is straightforward. Multiple linear regression, however, can capture more complex relationships that involve several variables at once.

A multiple regression model uses several independent variables. Because the model is linear in its coefficients rather than in the raw inputs, it can also fit curved relationships when transformed variables (for example, both x and x²) are included as separate predictors. A simple regression on x alone cannot do this. The main uses of multiple linear regression are as follows.

Control and planning
Forecasting or prediction
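A multiple regression model stays linear in its coefficients, so it can follow a curve when transformed versions of one variable are supplied as separate predictors. The sketch below assumes scikit-learn is installed. It uses made-up, noise-free data generated from $y = 1 + 2x - 0.5x^2$ and treats $x$ and $x^2$ as two independent variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical curved data: y = 1 + 2x - 0.5x^2, with no noise
x = np.linspace(-3, 3, 50)
y = 1 + 2 * x - 0.5 * x**2

# Use x and x^2 as two separate independent variables
X = np.column_stack([x, x**2])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # approximately 1.0 and [2.0, -0.5]
print(model.score(X, y))              # R^2 of essentially 1.0
```

The fitted function is a parabola in x, but the estimation is still ordinary multiple linear regression on the two columns.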

Estimating relationships between variables can be both interesting and useful. Like other regression models, the multiple regression model evaluates these relationships in terms of how well the independent variables predict the value of the dependent variable.

Example

import numpy as nm
import pandas as ps
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the dataset; the last column is the dependent variable, the rest are independent variables
dataset = ps.read_csv('https://raw.githubusercontent.com/mkgurucharan/Regression/master/Startups_Data.csv')
X1 = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# One-hot encode the categorical column at index 3 and pass the other columns through unchanged
ctlo = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])],
                         remainder='passthrough')
X1 = nm.array(ctlo.fit_transform(X1))
print(X1)

# Hold out 20% of the rows as a test set (the split is random on each run)
X1_train, X1_test, y_train, y_test = train_test_split(X1, y, test_size=0.2)

# Fit the multiple linear regression model on the training set
regressor_one = LinearRegression()
regressor_one.fit(X1_train, y_train)

# Predict on the test set and put real and predicted values side by side
y_pred = regressor_one.predict(X1_test)
df = ps.DataFrame({'Real Values': y_test, 'Predicted Values': y_pred})
print(df)

Output

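The resulting table lists each real value next to its prediction; for example, one of the predicted values is 74963.60. Because train_test_split shuffles the data randomly, the exact numbers differ from run to run.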

(Figure: graph of a multiple linear regression model)

Usage of MLR

We use multiple regression when we want to predict a dependent variable from more than one independent variable. It is usually fitted with ordinary least squares (OLS), the same method used for simple linear regression. OLS estimates the effect of each explanatory variable on a continuous dependent variable by choosing the coefficients that minimize the sum of squared differences between the observed and predicted values.
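The OLS coefficients can be computed directly from the normal equations $(A^\top A)\,b = A^\top y$. Here $A$ is the matrix of independent variables with a leading column of ones. The sketch below does this with NumPy on simulated data. The true coefficients 4.0, 1.5 and -2.0 and the noise level are arbitrary assumptions. It then checks that scikit-learn's `LinearRegression` gives the same estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Hypothetical data: 200 observations of 2 independent variables plus noise
X = rng.normal(size=(200, 2))
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.5, size=200)

# OLS chooses the coefficients that minimise the sum of squared residuals.
# Solving the normal equations (A^T A) b = A^T y gives those coefficients.
A = np.column_stack([np.ones(len(X)), X])
b = np.linalg.solve(A.T @ A, A.T @ y)
print(b)  # [intercept, coefficient of x1, coefficient of x2]

# scikit-learn's LinearRegression fits the same least-squares model
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```

Solving the normal equations is fine for small, well-conditioned problems like this one. For nearly collinear predictors, a least-squares solver such as `np.linalg.lstsq` is numerically safer.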

MLR lets you use several explanatory variables at the same time. As a result, you can more accurately estimate how the dependent variable would change if specific explanatory variables were changed.

Make sure the data satisfies the following five assumptions to ensure that it is suitable for linear regression analysis −

Linearity (that is, there is a straight-line relationship between the dependent variable and each independent variable).
No multicollinearity (that is, the independent variables are not highly correlated with one another).
Homoscedasticity (that is, the variance of the residuals is the same for all values of the independent variables).
Independence of observations (that is, each observation should have been collected independently).
Multivariate normality (that is, the residuals should be normally distributed).
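One common way to check whether independent variables are strongly correlated with each other is the variance inflation factor (VIF). For column j, VIF is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that column on all the other columns. VIF is one diagnostic among several and is introduced here as an illustration. The sketch uses only NumPy and simulated data, and the helper name `vif` is hypothetical. A widely used rule of thumb treats values above roughly 5 to 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """Hypothetical helper: variance inflation factor for each column of X.

    X must not contain an intercept column. VIF_j = 1 / (1 - R_j^2), where
    R_j^2 comes from an OLS regression of column j on the other columns
    plus an intercept.
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)     # c is almost a copy of a

print(vif(np.column_stack([a, b])))     # both close to 1: little correlation
print(vif(np.column_stack([a, b, c])))  # large values for a and c
```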

Conclusion

Multiple linear regression is a statistical approach for modeling more complex relationships between two or more independent variables and one dependent variable. It is used whenever there are two or more x variables.

Sohail Tabrez

Updated on: 14-Jul-2023
