Scikit Learn - Extended Linear Modeling



This chapter focuses on the polynomial features and pipelining tools in scikit-learn.

Introduction to Polynomial Features

Linear models trained on non-linear functions of the data generally keep the fast performance of linear methods while being able to fit a much wider range of data. That is why such models are widely used in machine learning.

For example, a simple linear regression can be extended by constructing polynomial features from the input features.

Mathematically, a standard linear regression model for 2-D data looks like this:

$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}$$

Now we can combine the features into second-order polynomials, so that the model becomes:

$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}+W_{3}X_{1}X_{2}+W_{4}X_1^2+W_{5}X_2^2$$

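The above is still a linear model, because it is linear in the weights W. The resulting polynomial regression therefore belongs to the same class of linear models and can be solved with the same techniques.

To do so, scikit-learn provides a transformer class named PolynomialFeatures in the sklearn.preprocessing module. It transforms an input data matrix into a new data matrix containing the polynomial combinations of the input features up to a given degree.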
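As a minimal sketch of what this transformation does, the snippet below expands a single 2-D sample by hand and compares the result with PolynomialFeatures. The sample values are arbitrary and chosen only for illustration. Note that scikit-learn orders the degree-2 columns as 1, X1, X2, X1^2, X1*X2, X2^2, which differs slightly from the term order in the equation above.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One 2-D sample (x1, x2); the values are arbitrary, for illustration only.
x1, x2 = 2.0, 3.0
sample = np.array([[x1, x2]])

# Expand by hand, in the column order scikit-learn uses for degree 2:
# 1, x1, x2, x1^2, x1*x2, x2^2
manual = np.array([[1.0, x1, x2, x1 ** 2, x1 * x2, x2 ** 2]])

# Expand with the transformer.
expanded = PolynomialFeatures(degree=2).fit_transform(sample)

print(expanded)
print(np.allclose(manual, expanded))
```

A linear model fitted on the expanded columns learns one weight per column, which is exactly the second-order model written above.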

Parameters

The following table lists the parameters used by the PolynomialFeatures class.

Sr.No Parameter & Description

1
degree: integer, default = 2
The degree of the polynomial features.

2
interaction_only: Boolean, default = False
If set to True, only interaction features are produced, i.e. features that are products of at most degree distinct input features. Powers of a single feature, such as the square of X1, are then excluded.

3
include_bias: Boolean, default = True
If True, a bias column is included, i.e. the feature in which all polynomial powers are zero. This is a column of ones that acts as the intercept term in a linear model.

4
order: str in {'C', 'F'}, default = 'C'
The order of the output array in the dense case. 'F' order is faster to compute, but it may slow down subsequent estimators.
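The effect of interaction_only and include_bias can be checked on a small matrix. The following is a sketch only; the input matrix X is an illustrative choice, not part of the library.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative input: rows [0, 1], [2, 3], [4, 5], [6, 7]
X = np.arange(8).reshape(4, 2)

# interaction_only=True keeps 1, x1, x2 and x1*x2, and drops x1^2 and x2^2.
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)

# include_bias=False drops the leading column of ones.
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(inter.shape)    # (4, 4)
print(inter[1])       # row [2, 3] becomes 1, 2, 3, 6
print(no_bias.shape)  # (4, 5)
print(no_bias[1])     # row [2, 3] becomes 2, 3, 4, 6, 9
```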

Attributes

The following table lists the attributes of the PolynomialFeatures class.

Sr.No Attributes & Description

1
powers_: array, shape (n_output_features, n_input_features)
powers_[i, j] is the exponent of the jth input in the ith output.

2
n_input_features_: int
The total number of input features. Recent scikit-learn releases expose this value as n_features_in_ instead.

3
n_output_features_: int
The total number of polynomial output features.
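To make the meaning of powers_ concrete, the sketch below fits the transformer on a small illustrative matrix and rebuilds every output column from the exponents alone. The matrix X and the variable rebuilt are names chosen for this example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative input with 2 features.
X = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree=2).fit(X)

print(poly.n_output_features_)  # number of generated columns
print(poly.powers_)             # one row of exponents per output column

# Rebuild every output column by hand: column i is the product over j of
# X[:, j] ** powers_[i, j].
rebuilt = np.prod(X[:, np.newaxis, :] ** poly.powers_[np.newaxis, :, :], axis=2)
print(np.allclose(rebuilt, poly.transform(X)))
```

For two input features and degree 2 there are six output columns, and the rows of powers_ read [0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2].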

Implementation Example

The following Python script reshapes an array of 8 values into shape (4, 2) and uses the PolynomialFeatures transformer to expand it into degree-2 polynomial features:

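from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Build a (4, 2) input matrix: rows [0, 1], [2, 3], [4, 5], [6, 7]
X = np.arange(8).reshape(4, 2)

# Expand each row (a, b) into [1, a, b, a^2, a*b, b^2]
poly = PolynomialFeatures(degree=2)
poly.fit_transform(X)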

Output

array(
   [
      [ 1., 0., 1., 0., 0., 1.],
      [ 1., 2., 3., 4., 6., 9.],
      [ 1., 4., 5., 16., 20., 25.],
      [ 1., 6., 7., 36., 42., 49.]
   ]
)

Streamlining using Pipeline tools

This kind of preprocessing, i.e. transforming an input data matrix into a new data matrix of a given degree, can be streamlined with the Pipeline tools, which chain multiple estimators into one.

Example

The Python script below uses scikit-learn's Pipeline tools to streamline the preprocessing and fits an order-3 polynomial to the data.

#First, import the necessary packages.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

#Next, create a Pipeline object that chains the two steps.
#fit_intercept=False because PolynomialFeatures already adds the bias column.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))])

#Create the sample inputs and the order-3 polynomial targets, then fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model = Stream_model.fit(x[:, np.newaxis], y)

#Inspect the fitted polynomial coefficients.
Stream_model.named_steps['linear'].coef_

Output

array([ 3., -2., 1., -1.])

The output shows that the linear model trained on polynomial features recovers the exact coefficients of the input polynomial, listed in order of increasing power: 3, -2, 1 and -1 for the constant, x, x^2 and x^3 terms.
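Because the pipeline behaves like a single estimator, it can also predict directly from raw inputs without expanding them by hand. The sketch below refits the same kind of pipeline and checks its predictions on a few unseen points. The helper true_poly, the name model and the test inputs in x_new are hypothetical and introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def true_poly(x):
    """The order-3 polynomial used to generate the training targets."""
    return 3 - 2 * x + x ** 2 - x ** 3


# Fit the pipeline on the same five training points as above.
x_train = np.arange(5)
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression(fit_intercept=False)),
])
model.fit(x_train[:, np.newaxis], true_poly(x_train))

# Hypothetical unseen inputs; the pipeline expands them internally.
x_new = np.array([5.0, 6.5, -1.0])
y_pred = model.predict(x_new[:, np.newaxis])

# The predictions should match the true polynomial up to floating-point error.
print(np.allclose(y_pred, true_poly(x_new)))
```

The exact match is expected here only because the targets are noise-free and the chosen degree equals the degree of the generating polynomial.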
