How can a polynomial regression model be fit to understand non-linear trends in data in Python?

When dealing with real-world data, relationships between variables are often non-linear. While linear regression works well for straight-line relationships, we need polynomial regression to capture curved patterns in data. This technique fits polynomial equations to data points, allowing us to model complex relationships.

Polynomial regression extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear trends. We'll use Anscombe's dataset to demonstrate this concept.

What is Polynomial Regression?

Polynomial regression fits a polynomial equation of degree n to the data:

y = β₀ + β₁x + β₂x² + β₃x³ + ... + βₙxⁿ

The order parameter in seaborn's lmplot() specifies the degree of the polynomial.
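Before turning to seaborn, the underlying idea can be sketched with NumPy's polyfit, which returns the fitted polynomial coefficients directly (the data points here are made up for illustration and follow an exact quadratic):

```python
import numpy as np

# Sample points that follow the quadratic y = 2 + 3x + 0.5x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x + 0.5 * x**2

# Fit a degree-2 polynomial; coefficients come back highest power first
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)  # approximately [0.5, 3.0, 2.0]
```

Because the sample data is noise-free, polyfit recovers the original coefficients almost exactly; with real, noisy data the estimates only approximate the true values.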

Example

Let's fit a polynomial regression model to visualize non-linear trends:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load Anscombe's dataset
my_df = sb.load_dataset('anscombe')

# Create polynomial regression plot with order 3
sb.lmplot(x="x", y="y", data=my_df.query("dataset == 'II'"), order=3)
plt.title('Polynomial Regression (Order 3)')
plt.show()

Output

[Plot: "Polynomial Regression (Order 3)" — y vs. x for Anscombe's dataset II, with a cubic regression curve fitted through the points]

Comparing Different Orders

Let's compare linear and polynomial regression to see the difference:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load Anscombe's dataset
my_df = sb.load_dataset('anscombe')
dataset_ii = my_df.query("dataset == 'II'")

# Create subplots for comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Linear regression (order 1)
sb.scatterplot(x="x", y="y", data=dataset_ii, ax=axes[0])
sb.regplot(x="x", y="y", data=dataset_ii, order=1, ax=axes[0], scatter=False, color='red')
axes[0].set_title('Linear Regression (Order 1)')

# Polynomial regression (order 3)
sb.scatterplot(x="x", y="y", data=dataset_ii, ax=axes[1])
sb.regplot(x="x", y="y", data=dataset_ii, order=3, ax=axes[1], scatter=False, color='red')
axes[1].set_title('Polynomial Regression (Order 3)')

plt.tight_layout()
plt.show()

Using Scikit-learn for Polynomial Regression

For more control over polynomial regression, use scikit-learn's PolynomialFeatures:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

# Generate sample non-linear data
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.5 * x.ravel()**2 + np.random.normal(0, 2, 50)

# Create polynomial regression pipeline
poly_reg = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])

# Fit the model
poly_reg.fit(x, y)

# Generate predictions
x_plot = np.linspace(0, 10, 100).reshape(-1, 1)
y_plot = poly_reg.predict(x_plot)

# Plot results
plt.scatter(x, y, alpha=0.6, label='Data points')
plt.plot(x_plot, y_plot, color='red', label='Polynomial fit (degree=2)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Polynomial Regression with Scikit-learn')
plt.legend()
plt.show()

print(f"R² score: {poly_reg.score(x, y):.3f}")

R² score: 0.943

(The exact score varies between runs because the noise is generated without a fixed random seed.)
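Once the pipeline is fitted, the learned coefficients can be inspected through the step names given to the Pipeline. A minimal sketch, using noise-free quadratic data so the recovered coefficients are exact:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Noise-free quadratic data: y = 0.5 * x^2
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.5 * x.ravel()**2

poly_reg = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
poly_reg.fit(x, y)

# Access the fitted LinearRegression step by its pipeline name
lin = poly_reg.named_steps['linear']
print(lin.intercept_)  # ~0
print(lin.coef_)       # coefficients for the features [1, x, x^2]: ~[0, 0, 0.5]
```

Note that PolynomialFeatures includes a bias column by default, so the first coefficient corresponds to the constant feature and stays near zero (the intercept is fitted separately by LinearRegression).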

Key Points

  • Order/Degree: Higher orders capture more complex curves but risk overfitting
  • Overfitting: Very high polynomial orders may fit noise rather than true patterns
  • Validation: Always validate polynomial models on unseen data
  • Feature Scaling: Consider scaling features when using high-degree polynomials
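The validation point above can be sketched with scikit-learn's cross_val_score, comparing the mean cross-validated R² across candidate degrees (the data here is synthetic, generated for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold, cross_val_score

# Synthetic quadratic data with noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80).reshape(-1, 1)
y = 0.5 * x.ravel()**2 + rng.normal(0, 2, 80)

# Shuffled folds so each fold spans the full x range
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for degree in (1, 2, 5, 10):
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('linear', LinearRegression())
    ])
    # Mean R² across the 5 validation folds
    results[degree] = cross_val_score(model, x, y, cv=cv, scoring='r2').mean()

for degree, score in results.items():
    print(f"degree={degree}: mean CV R2 = {score:.3f}")
```

Since the underlying signal is quadratic, degree 2 should validate at least as well as degree 1, while much higher degrees add complexity without improving held-out performance.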

Conclusion

Polynomial regression extends linear regression to capture non-linear relationships by adding polynomial terms. Use seaborn's order parameter or scikit-learn's PolynomialFeatures for implementation. Choose the polynomial degree carefully to balance model complexity and generalization.

Updated on: 2026-03-25T13:26:20+05:30
