How can an input data array be transformed into a new data array using scikit-learn's pipelining tools?


Scikit-learn, commonly known as sklearn, is a Python library used for implementing machine learning algorithms. It is an open-source library, so it can be used free of cost.

It is powerful and robust, providing a wide variety of tools for statistical modelling, including classification, regression, clustering, and dimensionality reduction, through a stable Python interface.

This library is built on the NumPy, SciPy, and Matplotlib libraries.

It can be installed using the ‘pip’ command as shown below:

pip install scikit-learn
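
To confirm that the installation worked, one quick check (a minimal sketch) is to import the package and print its version. Note that the package is installed as scikit-learn but imported as sklearn.

```python
import sklearn

# The package installs as "scikit-learn" but is imported as "sklearn"
version = sklearn.__version__
print(version)
```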

This library focuses on data modelling.

The streamlining operation can be implemented using the ‘Pipeline’ class, which chains one or more transformation steps with a final estimator. A transformer step such as ‘PolynomialFeatures’ converts an array of one shape into an array of a different shape.
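
The shape change on its own can be sketched as follows. This is a minimal illustration, assuming a single-feature input and a pipeline that contains only the ‘PolynomialFeatures’ step; the names `expander` and `expanded` are chosen here for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Six samples with one feature each: shape (6, 1)
x = np.arange(6).reshape(-1, 1)

# A pipeline whose last step is a transformer can itself be used as a transformer
expander = Pipeline([("poly", PolynomialFeatures(degree=3))])

# Each sample x becomes the row [1, x, x**2, x**3]
expanded = expander.fit_transform(x)

print(x.shape)
print(expanded.shape)
print(expanded[2])
```

With degree 3 and one input feature, each row gains a constant column plus the first, second, and third powers, so the column count grows from 1 to 4.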

Following is an example that fits a polynomial regression through a pipeline:

Example

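from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

print("Creating object of the tool pipeline")
# Step 'poly' expands x into [1, x, x^2, x^3]; step 'linear' fits a linear model on those features.
# fit_intercept=False because PolynomialFeatures already adds the constant column.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))])
x = np.arange(6)
print("The size of the original ndarray is")
print(x.shape)
# The 3.5 exponent means a degree-3 polynomial can only approximate y
y = 4 - 2 * x + x ** 2 - x ** 3.5
# scikit-learn expects a 2-D input, so x is reshaped from (6,) to (6, 1)
Stream_model = Stream_model.fit(x[:, np.newaxis], y)
print("Input polynomial coefficients are")
print(Stream_model.named_steps['linear'].coef_)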

Output

Creating object of the tool pipeline
The size of the original ndarray is
(6,)
Input polynomial coefficients are
[ 4.31339202 -7.82933051 7.96372751 -3.39570215]

Explanation

  • The required packages are imported, and NumPy is given the alias ‘np’ for ease of use.

  • The ‘Pipeline’ class is used to create a pipeline with two steps: ‘PolynomialFeatures’ followed by ‘LinearRegression’.

  • The values for data points ‘x’ and ‘y’ are generated using the NumPy library.

  • The shape of the generated data is displayed on the console.

  • The pipeline is fit to the data, with ‘x’ reshaped into a single column so that it has the two-dimensional shape scikit-learn expects.

  • The coefficients fitted by the ‘linear’ step are displayed on the console. Because ‘y’ contains a power of 3.5, the degree-3 fit is only an approximation, so these values differ from the coefficients used to generate ‘y’.
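
As a rough check on the approach, the sketch below fits the same two-step pipeline to an exact cubic. The exact cubic is an assumption made here for illustration; the example above uses an exponent of 3.5. In this case the fitted coefficients should match the ones used to generate ‘y’, and ‘predict’ can be called on the pipeline directly with new inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression(fit_intercept=False)),
])

x = np.arange(6)
# Exact cubic (illustrative assumption): coefficients 4, -2, 1, -1
y = 4 - 2 * x + x ** 2 - x ** 3

model.fit(x[:, np.newaxis], y)

# Coefficients are ordered by increasing power: constant, x, x^2, x^3
coef = model.named_steps["linear"].coef_
print(np.round(coef, 6))

# The pipeline applies the polynomial expansion before predicting
x_new = np.array([[6.0], [7.0]])
pred = model.predict(x_new)
print(np.round(pred, 6))
```

Calling ‘predict’ on the pipeline avoids repeating the feature expansion by hand, since every step before the final estimator is applied to the new input automatically.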

Updated on: 18-Jan-2021
