How can non-linear data be fit to a model in Python?

When building regression models, we often have to handle data that does not follow a straight-line relationship. Python's Seaborn library can plot such data and overlay a fitted model with regplot(), which supports polynomial, robust and logistic fits in addition to ordinary linear regression.

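We'll use Anscombe's quartet to demonstrate fitting non-linear data. This classic dataset contains four groups of 11 points each that share nearly identical summary statistics (means and variances of x and y, correlation, and least-squares line) but look completely different when plotted: group I is roughly linear with some scatter, group II follows a smooth curve, group III lies on a straight line except for one outlier, and group IV has every x value equal to 8 except one. That makes it a good test case for seeing when a straight line is the wrong model.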
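The shared statistics are easy to check numerically. The sketch below uses only NumPy, with the quartet's values typed in by hand (Anscombe's published numbers, assumed to match seaborn's copy), and computes each group's means, sample variances and Pearson correlation. Every group should come out at mean x = 9, mean y ≈ 7.50, variance of x = 11, variance of y ≈ 4.125 and correlation ≈ 0.816.

```python
import numpy as np

# Anscombe's quartet typed in by hand; groups I-III share one x column
x_123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x_4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
quartet = {
    "I": (x_123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x_123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x_123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x_4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}

summary = {}
for name, (x, y) in quartet.items():
    summary[name] = {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "var_x": x.var(ddof=1),           # sample variance (n - 1 denominator)
        "var_y": y.var(ddof=1),
        "corr": np.corrcoef(x, y)[0, 1],  # Pearson correlation of x and y
    }
    print(name, {k: round(float(v), 3) for k, v in summary[name].items()})
```

Four very different point clouds producing the same numbers is exactly why the quartet is worth plotting before choosing a model.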

Loading and Exploring the Dataset

First, let's load the Anscombe dataset and examine its structure:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load Anscombe's dataset
data = sb.load_dataset('anscombe')
print(data.head())
print("\nDataset groups:", data['dataset'].unique())

   dataset     x     y
0        I  10.0   8.04
1        I   8.0   6.95
2        I  13.0   7.58
3        I   9.0   8.81
4        I  11.0   8.33

Dataset groups: ['I' 'II' 'III' 'IV']
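sb.load_dataset() downloads the CSV from seaborn's online data repository, so it needs internet access. As a minimal offline sketch, the same long-format table can be built by hand with pandas. The values are Anscombe's published numbers and are assumed to match seaborn's copy (the first five rows agree with the output above); the resulting `data` can stand in for the loaded one in the plotting code.

```python
import numpy as np
import pandas as pd

# Anscombe's published values (assumed to match seaborn's anscombe.csv)
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by groups I-III
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
xs = {"I": x_123, "II": x_123, "III": x_123, "IV": x_4}
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# Long format: one row per point, with the group label in a "dataset" column
data = pd.DataFrame({
    "dataset": np.repeat(list(ys), [len(v) for v in ys.values()]),
    "x": np.concatenate([xs[k] for k in ys]).astype(float),
    "y": np.concatenate([ys[k] for k in ys]),
})

print(data.head())
print("\nDataset groups:", data["dataset"].unique())
```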

Fitting Linear Models to Different Datasets

Let's create regression plots for each group to see how a linear model fits different data patterns:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load the dataset
data = sb.load_dataset('anscombe')

# Create subplots for all four datasets
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
datasets = ['I', 'II', 'III', 'IV']

for i, dataset in enumerate(datasets):
    row, col = i // 2, i % 2
    sb.regplot(x="x", y="y", data=data.query(f"dataset == '{dataset}'"), 
               ax=axes[row, col])
    axes[row, col].set_title(f'Dataset {dataset}')

plt.tight_layout()
plt.show()
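The four panels draw essentially the same line, and the numbers confirm it. This NumPy sketch (data typed in by hand rather than loaded from seaborn) fits a straight line to each group with np.polyfit(x, y, 1), which returns the coefficients highest power first, i.e. slope then intercept.

```python
import numpy as np

# Anscombe's quartet typed in by hand; groups I-III share one x column
x_123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
quartet = {
    "I": (x_123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x_123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x_123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
           np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}

# Least-squares straight line per group: [slope, intercept]
lines = {name: np.polyfit(x, y, 1) for name, (x, y) in quartet.items()}
for name, (slope, intercept) in lines.items():
    print(f"Dataset {name}: y = {intercept:.2f} + {slope:.3f}x")
```

Each group comes out at roughly y = 3.00 + 0.500x, so the fitted line alone cannot tell you whether a straight line was the right model.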

Handling Non-Linear Relationships

All four panels show essentially the same fitted line, yet it only describes group I well. Group II follows a smooth curve, so a polynomial fit is more appropriate; passing order=2 to regplot() fits and draws a quadratic:

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Load the dataset
data = sb.load_dataset('anscombe')

# Fit a quadratic (order=2) to dataset II, the curved group
plt.figure(figsize=(8, 6))
sb.regplot(x="x", y="y", data=data.query("dataset == 'II'"),
           order=2, scatter_kws={'s': 50})
plt.title('Non-linear Data Fit (Dataset II) - Polynomial Regression')
plt.show()
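regplot() draws the quadratic but does not hand back its coefficients. According to the seaborn documentation, order > 1 is estimated with numpy.polyfit, so calling it directly gives the fitted model. The sketch below types in group II's values by hand and compares the sum of squared residuals (SSE) of a straight line and a quadratic; `sse` is a helper defined here for illustration only.

```python
import numpy as np

# Anscombe group II, typed in by hand (the curved group)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def sse(coeffs):
    """Sum of squared residuals for polynomial coefficients (highest power first)."""
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

linear = np.polyfit(x, y, 1)     # [b1, b0]
quadratic = np.polyfit(x, y, 2)  # [b2, b1, b0]

print("linear:   ", np.round(linear, 4), "SSE =", round(sse(linear), 4))
print("quadratic:", np.round(quadratic, 4), "SSE =", round(sse(quadratic), 6))
```

The quadratic comes out at roughly y = -0.127x² + 2.78x - 6.00 with an SSE close to zero, while the straight line leaves an SSE of about 13.8: group II is essentially an exact parabola.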

Key Parameters for Non-Linear Fitting

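| Parameter | Description | Use case |
|---|---|---|
| `order` | Degree of the polynomial to fit (e.g. `order=2` for a quadratic) | Curved relationships |
| `robust` | Robust regression that down-weights outliers (requires statsmodels) | Data with outliers |
| `logistic` | Logistic regression; `y` must be binary (requires statsmodels) | Binary outcomes |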
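For reference, `order=k` fits the polynomial model below by least squares, with k = 2 for `order=2`. The curve is non-linear in x, but the model is linear in the coefficients β_0, ..., β_k, which is why ordinary least squares can solve it directly.

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k
\qquad
\min_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{k} \beta_j x_i^{\,j} \right)^2
```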

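Comparison of Fitting Methods

The following compares three fits on dataset III, which lies almost exactly on a straight line except for one outlier at x = 13. The robust fit should track the other ten points, while the linear and quadratic fits are pulled toward the outlier:
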
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

data = sb.load_dataset('anscombe')
# Dataset III: linear except for one outlier at x = 13
dataset_iii = data.query("dataset == 'III'")

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Linear fit
sb.regplot(x="x", y="y", data=dataset_iii, ax=axes[0])
axes[0].set_title('Linear Fit')

# Polynomial fit (order=2)
sb.regplot(x="x", y="y", data=dataset_iii, order=2, ax=axes[1])
axes[1].set_title('Polynomial Fit (order=2)')

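# Robust fit: down-weights the outlier (requires statsmodels).
# Bootstrapping a robust fit is slow, so the confidence band is turned off.
sb.regplot(x="x", y="y", data=dataset_iii, robust=True, ci=None, ax=axes[2])
axes[2].set_title('Robust Linear Fit')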

plt.tight_layout()
plt.show()
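robust=True delegates the fitting to statsmodels. To show what down-weighting an outlier means without that dependency, here is a minimal NumPy sketch of one common approach, iteratively reweighted least squares (IRLS) with Huber weights, applied to group III typed in by hand. It assumes the usual tuning constant c = 1.345 and re-estimates the scale from the median absolute residual at each step; it is an illustration, not seaborn's or statsmodels' exact implementation, and `huber_line` is a hypothetical helper.

```python
import numpy as np

# Anscombe group III, typed in by hand: a straight line plus one outlier at x = 13
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

def huber_line(x, y, c=1.345, n_iter=50):
    """Fit y = b0 + b1*x by IRLS with Huber weights (illustrative sketch)."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)  # first pass is ordinary least squares
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(weight), then solve ordinary LS
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        # Robust scale estimate from the median absolute residual (MAD / 0.6745)
        scale = max(np.median(np.abs(r)) / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, c*scale/|r| for large ones
        w = np.minimum(1.0, c * scale / np.maximum(np.abs(r), 1e-12))
    return beta

ols_intercept, ols_slope = np.polyfit(x, y, 1)[::-1]
rob_intercept, rob_slope = huber_line(x, y)
print(f"OLS:   y = {ols_intercept:.3f} + {ols_slope:.3f}x")
print(f"Huber: y = {rob_intercept:.3f} + {rob_slope:.3f}x")
```

The ordinary least-squares slope is about 0.50, pulled up by the point at x = 13. The Huber fit should end close to the slope of the other ten points, roughly 0.35, because the outlier's weight shrinks toward zero as the fit settles onto the remaining points.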

Conclusion

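Use seaborn.regplot() with order=2 (or higher) to fit and draw a polynomial trend through curved data, and robust=True (which requires statsmodels) to down-weight outliers so the fit follows the bulk of the points. regplot() only draws the fitted model; when you need the coefficients themselves, fit the model directly with numpy.polyfit().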

---

AmitDiwan

Updated on: 2026-03-25T13:22:11+05:30
