Scikit Learn - Linear Regression

Linear regression is a statistical model that describes the relationship between a dependent variable (Y) and a given set of independent variables (X). The relationship is established by fitting the line (or, with several features, the hyperplane) that best matches the data.

sklearn.linear_model.LinearRegression is the class used to implement linear regression.
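To make "fitting a best line" concrete, the following is a minimal sketch of an ordinary least squares fit written directly with NumPy. It assumes the same toy data used later in this chapter and appends a column of ones so that the last weight plays the role of the intercept. The variable names (X_aug, w) are illustrative only and are not part of scikit-learn.

```python
import numpy as np

# Toy data generated from y = 1*x1 + 2*x2 + 3 (illustrative values)
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0

# Append a column of ones so the last weight acts as the intercept
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Find the weights w that minimise the squared error ||X_aug @ w - y||^2
w, residuals, rank, singular_values = np.linalg.lstsq(X_aug, y, rcond=None)

coef, intercept = w[:2], w[2]
print(coef, intercept)
```

The recovered weights should match the coefficients and intercept used to generate y, up to floating-point rounding.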

Parameters

The following table lists the parameters accepted by the LinearRegression class −

Sr.No	Parameter & Description
1	fit_intercept − Boolean, optional, default True. Whether to calculate the intercept for the model. If set to False, no intercept is used in the calculation, so the data is expected to be centered already.
2	normalize − Boolean, optional, default False. If set to True, the regressors X are normalized before regression by subtracting the mean and dividing by the L2 norm. This parameter is ignored when fit_intercept = False. Note that normalize was deprecated in scikit-learn 1.0 and removed in 1.2, so newer versions do not accept it.
3	copy_X − Boolean, optional, default True. If True, X is copied; if False, X may be overwritten.
4	n_jobs − int or None, optional, default None. The number of jobs to use for the computation. None normally means 1, and -1 means using all processors.
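The effect of fit_intercept can be seen with a small sketch. It assumes the same toy data used in the implementation example of this chapter, where the true intercept is 3, and fits the model once with and once without an intercept. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 1*x1 + 2*x2 + 3
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)

# With fit_intercept=False the intercept is fixed at 0.0,
# so the fitted plane is forced through the origin
print(with_intercept.intercept_)
print(no_intercept.intercept_)

# The origin-constrained model cannot reproduce this data exactly
r2_with = with_intercept.score(X, y)
r2_without = no_intercept.score(X, y)
print(r2_with, r2_without)
```

Because the data has a non-zero offset, the model without an intercept fits it less well than the model with one.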

Attributes

The following table lists the attributes exposed by a fitted LinearRegression object −

Sr.No	Attributes & Description
1	coef_ − array, shape (n_features,) or (n_targets, n_features). The estimated coefficients for the linear regression problem. It is a 2D array of shape (n_targets, n_features) if multiple targets are passed during fit (that is, y is 2D), and a 1D array of length n_features if only one target is passed.
2	intercept_ − float or array of shape (n_targets,). The independent term in the linear model. It is set to 0.0 if fit_intercept = False.
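The shapes described above can be checked with a short sketch. It assumes the toy data from this chapter and builds a hypothetical three-target y by stacking scaled copies of the single target; the names y_single and y_multi are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])   # 4 samples, 2 features
y_single = np.dot(X, np.array([1, 2])) + 3       # shape (4,): one target

# Three targets stacked as columns, shape (4, 3)
y_multi = np.column_stack([y_single, 2 * y_single, -y_single])

single = LinearRegression().fit(X, y_single)
multi = LinearRegression().fit(X, y_multi)

print(single.coef_.shape)      # 1D: (n_features,)
print(multi.coef_.shape)       # 2D: (n_targets, n_features)
print(multi.intercept_.shape)  # one intercept per target
```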

Implementation Example

First, import the required packages −

import numpy as np
from sklearn.linear_model import LinearRegression

Now, provide the values for independent variable X −

X = np.array([[1,1],[1,2],[2,2],[2,3]])

Next, the value of dependent variable y can be calculated as follows −

y = np.dot(X, np.array([1,2])) + 3

Now, create a linear regression object and fit it to the data as follows −

# normalize is omitted here: it was removed in scikit-learn 1.2
regr = LinearRegression(
   fit_intercept = True, copy_X = True, n_jobs = 2
).fit(X,y)

Use the predict() method to make a prediction with this linear model as follows −

regr.predict(np.array([[3,5]]))

Output

array([16.])

Example

To get the coefficient of determination (R²) of the prediction, we can use the score() method as follows −

regr.score(X,y)

Output

1.0
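To show what score() reports, the sketch below recomputes the coefficient of determination by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. A small fixed perturbation is added to y (an assumption made only for this illustration) so that the result is not trivially 1.0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# Illustrative perturbation so the fit is no longer perfect
y_noisy = np.dot(X, np.array([1, 2])) + 3 + np.array([0.1, -0.2, 0.15, -0.05])

model = LinearRegression().fit(X, y_noisy)
y_pred = model.predict(X)

ss_res = np.sum((y_noisy - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_noisy - y_noisy.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model.score(X, y_noisy))
```

The two printed values should agree up to floating-point rounding.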

Example

We can retrieve the estimated coefficients through the attribute named coef_ as follows −

regr.coef_

Output

array([1., 2.])

Example

We can retrieve the intercept, i.e. the expected mean value of Y when all X = 0, through the attribute named intercept_ as follows −

regr.intercept_

Output

3.0000000000000018
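The coefficients and the intercept together fully determine a prediction. As a minimal sketch, assuming the same data as above, the output of predict() can be reproduced as a dot product with coef_ plus intercept_. The names x_new and manual are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
regr = LinearRegression().fit(X, y)

x_new = np.array([[3, 5]])

# Linear model: prediction = x . coef_ + intercept_
manual = x_new @ regr.coef_ + regr.intercept_

print(manual, regr.predict(x_new))
```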

Complete code of implementation example

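import numpy as np
from sklearn.linear_model import LinearRegression

# Training data generated from y = 1*x1 + 2*x2 + 3
X = np.array([[1,1],[1,2],[2,2],[2,3]])
y = np.dot(X, np.array([1,2])) + 3

# normalize is omitted here: it was removed in scikit-learn 1.2
regr = LinearRegression(
   fit_intercept = True, copy_X = True, n_jobs = 2
).fit(X,y)

print(regr.predict(np.array([[3,5]])))   # prediction for a new sample
print(regr.score(X,y))                   # coefficient of determination
print(regr.coef_)                        # estimated coefficients
print(regr.intercept_)                   # estimated intercept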
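In scikit-learn 1.2 and later, LinearRegression no longer accepts the normalize parameter. One common alternative, sketched below, is to scale the features explicitly with StandardScaler inside a pipeline. This is not an exact replacement: StandardScaler divides by the standard deviation rather than the L2 norm. For plain least squares, however, the predictions are unaffected by such rescaling. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Scale the features first, then fit the regression on the scaled values
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

pred = model.predict(np.array([[3, 5]]))
print(pred)
```

The coefficients of the regression step inside the pipeline refer to the scaled features, so they differ from coef_ of a model fitted on the raw X even though the predictions agree.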