Machine Learning - Polynomial Regression

Polynomial Linear Regression is a type of regression analysis in which the relationship between the independent variable and the dependent variable is modeled as an n-th degree polynomial function. Polynomial regression allows for a more complex relationship between the variables to be captured, beyond the linear relationship in Simple and Multiple Linear Regression.

Python Implementation

Here's an example implementation of Polynomial Linear Regression using the Boston Housing dataset from Scikit-Learn −

Example

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import matplotlib.pyplot as plt

# Load the Boston Housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data,
boston.target, test_size=0.2, random_state=0)

# Create a polynomial features object with degree 2
poly = PolynomialFeatures(degree=2)

# Transform the input data to include polynomial features
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Create a linear regression object
lr_model = LinearRegression()

# Fit the model on the training data
lr_model.fit(X_train_poly, y_train)

# Make predictions on the test data
y_pred = lr_model.predict(X_test_poly)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)

# Calculate the coefficient of determination
r2 = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse)
print('Coefficient of Determination:', r2)

# Sort the test data by the target variable
sort_idx = X_test[:, 12].argsort()

X_test_sorted = X_test[sort_idx]
y_test_sorted = y_test[sort_idx]

# Plot the predicted values against the actual values
plt.figure(figsize=(7.5, 3.5))
plt.scatter(y_test_sorted, y_pred[sort_idx])
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')

# Add a regression line to the plot
x = np.linspace(0, 50, 100)
y = x
plt.plot(x, y, color='red')

# Show the plot
plt.show()

Output

When you execute the program, it will produce the following plot as the output and it will print the Mean Squared Error and the Coefficient of Determination on the terminal −

Mean Squared Error: 25.215797617051855
Coefficient of Determination: 0.6903318065831567