
Scikit Learn - Linear Regression
Linear regression is a statistical model that studies the relationship between a dependent variable (Y) and a given set of independent variables (X). The relationship is established by fitting the line (or, with several independent variables, the hyperplane) that minimizes the sum of squared differences between the observed and predicted values of Y.
sklearn.linear_model.LinearRegression is the class used to implement linear regression.
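As a minimal sketch of what "fitting the best line" means, the snippet below compares LinearRegression with a direct least-squares solve from NumPy. It reuses the small dataset from the implementation example in this chapter; the names X_aug and weights are introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small dataset from this chapter: y = 1*x1 + 2*x2 + 3
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0

# Fit with scikit-learn.
model = LinearRegression().fit(X, y)

# Solve the same least-squares problem directly. A column of ones is
# appended so that the last solved weight plays the role of the intercept.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print(model.coef_, model.intercept_)
print(weights[:-1], weights[-1])
```

Both approaches should agree up to floating-point rounding, since they minimize the same sum of squared residuals.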
Parameters
The following table lists the parameters of the LinearRegression class −
Sr.No | Parameter & Description |
---|---|
1 | fit_intercept − Boolean, optional, default True. Whether to calculate the intercept for the model. If set to False, no intercept is used in the calculation. |
2 | normalize − Boolean, optional, default False. If set to True, the regressors X are normalized before regression by subtracting the mean and dividing by the L2 norm. If fit_intercept = False, this parameter is ignored. Note that this parameter was deprecated in scikit-learn 1.0 and removed in 1.2; on newer versions, scale the data before fitting instead (for example with StandardScaler). |
3 | copy_X − Boolean, optional, default True. If True, X is copied. If set to False, X may be overwritten. |
4 | n_jobs − int or None, optional, default None. The number of jobs to use for the computation. None means 1 (unless a joblib parallel backend is active) and -1 means using all processors. |
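The effect of fit_intercept can be seen with a small sketch on the dataset used later in this chapter. The variable names with_intercept and without_intercept are chosen here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data generated as y = 1*x1 + 2*x2 + 3, so the true intercept is 3.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Default behaviour: the intercept is estimated from the data.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# No intercept: the fitted plane is forced through the origin.
without_intercept = LinearRegression(fit_intercept=False).fit(X, y)

print(with_intercept.coef_, with_intercept.intercept_)
print(without_intercept.coef_, without_intercept.intercept_)
print(with_intercept.score(X, y), without_intercept.score(X, y))
```

Because this data has a nonzero offset, the model without an intercept cannot reproduce it exactly, and its coefficients shift to compensate.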
Attributes
The following table lists the attributes of a fitted LinearRegression object −
Sr.No | Attributes & Description |
---|---|
1 | coef_ − array, shape (n_features,) or (n_targets, n_features). The estimated coefficients for the linear regression problem. It is a 2D array of shape (n_targets, n_features) if multiple targets are passed during fit (i.e. y is 2D), and a 1D array of length n_features if only one target is passed. |
2 | intercept_ − float or array of shape (n_targets,). The independent term in the linear model. It is set to 0.0 if fit_intercept = False. |
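The shapes described above can be checked with a short sketch. The second target column below is hypothetical and is added only to obtain a 2D y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])

# Single target: y is 1D with shape (n_samples,).
y_single = np.dot(X, np.array([1, 2])) + 3
single = LinearRegression().fit(X, y_single)

# Two targets: y is 2D with shape (n_samples, n_targets).
# The second column is a made-up target, used for illustration only.
y_multi = np.column_stack([y_single, np.dot(X, np.array([3, -1])) + 1])
multi = LinearRegression().fit(X, y_multi)

print(single.coef_.shape, np.shape(single.intercept_))
print(multi.coef_.shape, multi.intercept_.shape)
```

With one target, coef_ is 1D and intercept_ is a scalar; with two targets, coef_ has one row per target and intercept_ has one entry per target.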
Implementation Example
First, import the required packages −
import numpy as np
from sklearn.linear_model import LinearRegression
Now, provide the values for independent variable X −
X = np.array([[1,1],[1,2],[2,2],[2,3]])
Next, compute the dependent variable y from X using the known relationship y = 1 * x1 + 2 * x2 + 3 −
y = np.dot(X, np.array([1,2])) + 3
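Now, create a linear regression object and fit it to the data. The normalize argument is omitted here because it was removed in scikit-learn 1.2 and passing it raises an error on newer versions −
regr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=2).fit(X, y)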
Use the predict() method to make predictions with the fitted model as follows −
regr.predict(np.array([[3,5]]))
Output
array([16.])
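To make clear where this value comes from, the following sketch recomputes the prediction by hand from the fitted coefficients and intercept. The names X_new and manual are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
regr = LinearRegression().fit(X, y)

X_new = np.array([[3, 5]])

# A linear model predicts X_new . coef_ + intercept_,
# here 1*3 + 2*5 + 3 (up to floating-point rounding).
manual = X_new @ regr.coef_ + regr.intercept_

print(manual, regr.predict(X_new))
```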
Example
To get the coefficient of determination (R²) of the prediction, we can use the score() method as follows −
regr.score(X,y)
Output
1.0
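A score of 1.0 arises here because y was generated exactly from a linear formula. As a rough sketch of what score() computes, the snippet below evaluates R² = 1 - SS_res / SS_tot by hand on hypothetical noisy data (randomly generated, not from the original example) and compares it with score() and sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical noisy data: the same linear relationship plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = X @ np.array([1.0, 2.0]) + 3.0 + rng.normal(scale=1.0, size=50)

regr = LinearRegression().fit(X, y)
y_pred = regr.predict(X)

# Residual sum of squares and total sum of squares.
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, regr.score(X, y), r2_score(y, y_pred))
```

The three values should match; with noise in the data, R² falls below 1.0.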
Example
We can retrieve the estimated coefficients through the coef_ attribute as follows −
regr.coef_
Output
array([1., 2.])
Example
We can retrieve the intercept, i.e. the expected mean value of Y when all X = 0, through the intercept_ attribute as follows −
regr.intercept_
Output
3.0000000000000018
Complete code of implementation example
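import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3  # y = 1*x1 + 2*x2 + 3
# normalize is omitted: it was removed in scikit-learn 1.2
regr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=2).fit(X, y)
print(regr.predict(np.array([[3, 5]])))  # prediction for a new sample
print(regr.score(X, y))  # coefficient of determination (R²)
print(regr.coef_)  # estimated coefficients
print(regr.intercept_)  # estimated intercept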