How can non-linear data be fit to a model in Python?

When building regression models, we often meet data that does not follow a straight-line relationship. Python's Seaborn library can fit and visualize such data with its regression plots.

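We'll use Anscombe's quartet to demonstrate fitting non-linear data. This famous dataset contains four groups with nearly identical summary statistics (means, variances, correlation, and least-squares line) but very different shapes when plotted. Dataset I is roughly linear, Dataset II is curved, Dataset III is linear apart from one outlier, and Dataset IV has a single high-leverage point, so the quartet shows clearly where a straight line is and is not an adequate model.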

Loading and Exploring the Dataset

First, let's load the Anscombe dataset and examine its structure:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load Anscombe's dataset
data = sb.load_dataset('anscombe')
print(data.head())
print("\nDataset groups:", data['dataset'].unique())

Output:

   dataset     x     y
0        I  10.0   8.04
1        I   8.0   6.95
2        I  13.0   7.58
3        I   9.0   8.81
4        I  11.0   8.33

Dataset groups: ['I' 'II' 'III' 'IV']

Fitting Linear Models to Different Datasets

Let's create regression plots for each group to see how linear models fit different data patterns:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load the dataset
data = sb.load_dataset('anscombe')

# Create subplots for all four datasets
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
datasets = ['I', 'II', 'III', 'IV']

for i, dataset in enumerate(datasets):
    row, col = i // 2, i % 2
    sb.regplot(x="x", y="y", data=data.query(f"dataset == '{dataset}'"), 
               ax=axes[row, col])
    axes[row, col].set_title(f'Dataset {dataset}')

plt.tight_layout()
plt.show()
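The plots show the fitted lines, but regplot() does not return their coefficients. As a rough check that all four groups really share almost the same line, here is a minimal sketch using numpy.polyfit. It assumes the quartet values are typed in by hand (so it runs without downloading the dataset); the variable names quartet and fits are illustrative only.

```python
import numpy as np

# Anscombe's quartet, transcribed by hand so the sketch runs offline.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polyfit returns [slope, intercept] of the least-squares line
    slope, intercept = np.polyfit(x, y, deg=1)
    # Pearson correlation between x and y
    r = np.corrcoef(x, y)[0, 1]
    fits[name] = (slope, intercept, r)
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}")
```

The four lines come out almost identical, which is why the plots, not the numbers, reveal which groups a straight line actually describes.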

Handling Non-Linear Relationships

For truly non-linear data, we can fit a polynomial instead of a straight line by passing the order parameter to regplot(). In Anscombe's quartet the curved group is Dataset II (Dataset III is linear apart from a single outlier), so we fit a second-order polynomial to Dataset II:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load the dataset
data = sb.load_dataset('anscombe')

# Fit a second-order polynomial to the curved Dataset II
plt.figure(figsize=(8, 6))
sb.regplot(x="x", y="y", data=data.query("dataset == 'II'"),
           order=2, scatter_kws={'s': 50})
plt.title('Non-linear Data Fit (Dataset II) - Polynomial Regression')
plt.show()
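Since regplot() only draws the curve, we need another tool if we want the polynomial coefficients or a goodness-of-fit number. Here is a minimal sketch with numpy.polyfit on Dataset II, the curved group of the quartet. The values are typed in by hand so the example runs offline, and r_squared is a small helper defined here, not a Seaborn or NumPy function.

```python
import numpy as np

# Dataset II of Anscombe's quartet (the curved group), transcribed by hand
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Coefficients are returned highest degree first
linear_coefs = np.polyfit(x, y, deg=1)
quadratic_coefs = np.polyfit(x, y, deg=2)

r2_linear = r_squared(y, np.polyval(linear_coefs, x))
r2_quadratic = r_squared(y, np.polyval(quadratic_coefs, x))

print("Linear coefficients:   ", linear_coefs)
print("Quadratic coefficients:", quadratic_coefs)
print(f"R^2 linear = {r2_linear:.3f}, R^2 quadratic = {r2_quadratic:.3f}")
```

The quadratic term is negative (the curve opens downward), and the second-order fit explains far more of the variance than the straight line does.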

Key Parameters for Non-Linear Fitting

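Parameter   Description                  Use Case
order       Degree of polynomial         Curved relationships
robust      Reduces outlier influence    Data with outliers
logistic    Logistic regression          Binary outcomes

Both robust and logistic rely on the statsmodels package being installed. These options are mutually exclusive with each other and with an order greater than 1, so only one of them can be used per regplot() call.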

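Comparison of Fitting Methods

The following example fits Dataset III, which is linear apart from a single outlier, in three ways so the fits can be compared side by side:
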
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

data = sb.load_dataset('anscombe')
dataset_iii = data.query("dataset == 'III'")

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Linear fit
sb.regplot(x="x", y="y", data=dataset_iii, ax=axes[0])
axes[0].set_title('Linear Fit')

# Polynomial fit (order=2)
sb.regplot(x="x", y="y", data=dataset_iii, order=2, ax=axes[1])
axes[1].set_title('Polynomial Fit (order=2)')

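# Robust fit (requires statsmodels; the bootstrapped confidence band can be slow,
# so consider passing ci=None or a smaller n_boot)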
sb.regplot(x="x", y="y", data=dataset_iii, robust=True, ax=axes[2])
axes[2].set_title('Robust Linear Fit')

plt.tight_layout()
plt.show()
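To see numerically how much a single outlier pulls an ordinary least-squares line, here is a minimal sketch on Dataset III. Note that this is not the robust regression that robust=True performs; it simply refits the line after dropping the point with the largest residual. The data is typed in by hand, and names such as slope_trim are illustrative.

```python
import numpy as np

# Dataset III of Anscombe's quartet (linear apart from one outlier)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

# Ordinary least-squares line through all eleven points
slope_all, intercept_all = np.polyfit(x, y, deg=1)

# Flag the point with the largest absolute residual as the outlier
residuals = y - (slope_all * x + intercept_all)
outlier_idx = int(np.argmax(np.abs(residuals)))

# Refit without that point
mask = np.arange(len(x)) != outlier_idx
slope_trim, intercept_trim = np.polyfit(x[mask], y[mask], deg=1)

print(f"Outlier: x = {x[outlier_idx]}, y = {y[outlier_idx]}")
print(f"All points:      y = {intercept_all:.2f} + {slope_all:.3f}x")
print(f"Without outlier: y = {intercept_trim:.2f} + {slope_trim:.3f}x")
```

The slope drops noticeably once the outlier is removed, which is the kind of distortion a robust fit is meant to limit without discarding data by hand.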

Conclusion

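Use seaborn.regplot() with the order parameter to fit a polynomial to curved data. For data that is linear apart from a few outliers, robust=True (which requires statsmodels) reduces the outliers' influence on the fitted line. Plot the data before choosing a model, because groups with nearly identical summary statistics can call for very different fits.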

---
Updated on: 2026-03-25T13:22:11+05:30
