How can non-linear data be fit to a model in Python?
Regression models often have to handle data that does not follow a straight-line relationship. Python's Seaborn library can visualize and fit such data using regression plots.
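We'll use Anscombe's quartet to demonstrate. This well-known dataset contains four groups with nearly identical summary statistics (means, variances, correlation, and least-squares line) but very different distributions: Dataset I is roughly linear, Dataset II follows a curve, Dataset III is linear with one outlier, and Dataset IV has a single high-leverage point. That makes it a good test case for seeing where a straight-line fit breaks down.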
Loading and Exploring the Dataset
First, let's load the Anscombe dataset and examine its structure:
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load Anscombe's dataset
data = sb.load_dataset('anscombe')
print(data.head())
print("\nDataset groups:", data['dataset'].unique())
  dataset     x     y
0       I  10.0  8.04
1       I   8.0  6.95
2       I  13.0  7.58
3       I   9.0  8.81
4       I  11.0  8.33

Dataset groups: ['I' 'II' 'III' 'IV']
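The claim that the four groups share their summary statistics can be checked numerically. The following is a minimal sketch using only NumPy; it assumes the values typed in below (the standard published quartet) match what `sb.load_dataset('anscombe')` returns, and the names `quartet` and `summary` are illustrative.

```python
import numpy as np

# Anscombe's quartet, typed in from the standard published values
# (assumption: identical to the data seaborn downloads).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of x, mean of y, and Pearson correlation for each group
summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    summary[name] = {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "corr": np.corrcoef(x, y)[0, 1],
    }
    s = summary[name]
    print(f"{name:>3}: mean_x={s['mean_x']:.2f}  mean_y={s['mean_y']:.2f}  corr={s['corr']:.3f}")
```

The printed rows should agree to about two decimal places, even though the scatter plots of the four groups look nothing alike.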
Fitting Linear Models to Different Datasets
Let's create regression plots for each group to see how linear models fit different data patterns:
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the dataset
data = sb.load_dataset('anscombe')
# Create subplots for all four datasets
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
datasets = ['I', 'II', 'III', 'IV']
for i, dataset in enumerate(datasets):
    row, col = i // 2, i % 2
    sb.regplot(x="x", y="y", data=data.query(f"dataset == '{dataset}'"),
               ax=axes[row, col])
    axes[row, col].set_title(f'Dataset {dataset}')
plt.tight_layout()
plt.show()
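`regplot` draws the fitted line but does not report its coefficients. As a rough cross-check, the sketch below fits a degree-1 polynomial to each group with `numpy.polyfit`. The data is hard-coded from the standard published quartet (an assumption, as is the name `lines`), so the block runs without downloading anything.

```python
import numpy as np

# Standard published values of Anscombe's quartet (assumed to match seaborn's copy)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Ordinary least-squares line per group; polyfit returns [slope, intercept] for deg=1
lines = {}
for name, (x, y) in quartet.items():
    slope, intercept = np.polyfit(x, y, deg=1)
    lines[name] = (slope, intercept)
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.3f} * x")
```

All four groups give close to the same line, which is why a straight-line fit alone cannot tell them apart and why plotting the data matters.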
Handling Non-Linear Relationships
For truly non-linear data, we can use polynomial regression or other curve-fitting techniques:
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the dataset
data = sb.load_dataset('anscombe')
# Fit a second-order polynomial to Dataset II, the curved group
# (Dataset III is linear with a single outlier, so it is not the non-linear case)
plt.figure(figsize=(8, 6))
sb.regplot(x="x", y="y", data=data.query("dataset == 'II'"),
           order=2, scatter_kws={'s': 50})
plt.title('Non-linear Data Fit (Dataset II) - Polynomial Regression')
plt.show()
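To put a number on how much a second-order term helps, the sketch below compares a linear and a quadratic fit on the curved group, Dataset II, using `numpy.polyfit` and a hand-written R² helper. The hard-coded values and the helper name `r_squared` are assumptions made for illustration; this is not what `regplot` computes internally, only an equivalent least-squares polynomial fit.

```python
import numpy as np

# Dataset II of Anscombe's quartet (standard published values, hard-coded)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Coefficients are returned highest degree first
linear = np.polyfit(x, y, deg=1)
quadratic = np.polyfit(x, y, deg=2)

r2_linear = r_squared(y, np.polyval(linear, x))
r2_quadratic = r_squared(y, np.polyval(quadratic, x))

print(f"linear    R^2 = {r2_linear:.3f}")
print(f"quadratic R^2 = {r2_quadratic:.3f}")
print("quadratic coefficients (x^2, x, const):", np.round(quadratic, 4))
```

The quadratic fit should explain nearly all of the variance while the straight line leaves about a third unexplained, which matches what the `order=2` plot shows visually.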
Key Parameters for Non-Linear Fitting
| Parameter | Description | Use Case |
|---|---|---|
| order | Degree of polynomial | Curved relationships |
| robust | Reduces outlier influence | Data with outliers |
| logistic | Logistic regression | Binary outcomes |
Comparison of Fitting Methods
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
data = sb.load_dataset('anscombe')
dataset_iii = data.query("dataset == 'III'")
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Linear fit
sb.regplot(x="x", y="y", data=dataset_iii, ax=axes[0])
axes[0].set_title('Linear Fit')
# Polynomial fit (order=2)
sb.regplot(x="x", y="y", data=dataset_iii, order=2, ax=axes[1])
axes[1].set_title('Polynomial Fit (order=2)')
# Robust fit
sb.regplot(x="x", y="y", data=dataset_iii, robust=True, ax=axes[2])
axes[2].set_title('Robust Linear Fit')
plt.tight_layout()
plt.show()
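The effect of the single outlier in Dataset III can also be seen numerically. The sketch below is not the robust estimator that `robust=True` uses (seaborn delegates that to statsmodels). It is a simpler illustration, under the assumption of hard-coded data, that fits ordinary least squares twice: once on all points and once with the largest-residual point dropped.

```python
import numpy as np

# Dataset III of Anscombe's quartet (standard published values, hard-coded)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

# Ordinary least squares on every point
slope_all, intercept_all = np.polyfit(x, y, deg=1)

# Treat the point with the largest absolute residual as the outlier
residuals = y - (slope_all * x + intercept_all)
outlier = int(np.argmax(np.abs(residuals)))

# Refit without that point
keep = np.arange(len(x)) != outlier
slope_trim, intercept_trim = np.polyfit(x[keep], y[keep], deg=1)

print(f"outlier at x={x[outlier]:.0f}, y={y[outlier]:.2f}")
print(f"all points:      y = {intercept_all:.2f} + {slope_all:.3f} * x")
print(f"outlier removed: y = {intercept_trim:.2f} + {slope_trim:.3f} * x")
```

One point is enough to pull the slope noticeably upward. A robust fit aims for a line close to the second one without requiring the outlier to be identified and removed by hand.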
Conclusion
Use seaborn.regplot() with the order parameter to fit a polynomial to curved data. For data with outliers, robust=True reduces their influence on the fitted line; it requires the statsmodels package to be installed.
