Graph k-NN decision boundaries in Matplotlib
k-NN (k-Nearest Neighbors) decision boundaries show how a k-NN classifier divides the feature space into regions for different classes. We can visualize these boundaries using matplotlib with contour plots and scatter plots.
Understanding k-NN Decision Boundaries
A decision boundary is the surface that separates different classes in the feature space. For k-NN, the boundary is determined by the majority vote of the k nearest neighbors for each point in the space.
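The majority vote can be sketched in a few lines. This is a minimal, from-scratch illustration on a tiny hypothetical dataset (not part of the Iris example below): it finds the k nearest training points by Euclidean distance and returns the most common label among them.

```python
import numpy as np
from collections import Counter

# Toy training set: four points, two classes (hypothetical data)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(query, X, y, k=3):
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X - query, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([1.2, 1.5]), X_train, y_train))  # → 0
```

Evaluating this vote at every point of a dense grid, as the full example below does, is exactly what produces the colored decision regions.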
Complete Example
Here's a complete example that plots k-NN decision boundaries using the Iris dataset:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
# Set up the plot parameters
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Parameters
n_neighbors = 15
h = 0.02 # step size in the mesh
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Use only first two features
y = iris.target
# Define color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ['red', 'green', 'blue']
# Create and train the k-NN classifier
clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors, weights='uniform')
clf.fit(X, y)
# Create a mesh grid
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
# Predict on the mesh grid
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Create the plot
plt.figure()
# Plot decision boundaries
plt.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
# Plot the data points
scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(cmap_bold),
                      edgecolors='black', s=50)
# Set plot properties
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title(f"k-NN Decision Boundaries (k = {n_neighbors})")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
# Add legend
plt.legend(scatter.legend_elements()[0], iris.target_names,
           title="Classes", loc="upper right")
plt.show()
Key Components Explained
Mesh Grid Creation
The mesh grid creates a fine grid of points across the entire feature space:
import numpy as np
# Example of mesh grid creation
x_min, x_max = 0, 5
y_min, y_max = 0, 3
h = 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
print("X coordinates shape:", xx.shape)
print("Y coordinates shape:", yy.shape)
print("Sample X coordinates:")
print(xx)
X coordinates shape: (6, 10)
Y coordinates shape: (6, 10)
Sample X coordinates:
[[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]]
Effect of k Value
Different k values create different decision boundaries:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
# Load iris data
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# Different k values
k_values = [1, 5, 15]
h = 0.02
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
for idx, k in enumerate(k_values):
    # Train classifier
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, y)
    # Create mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot
    axes[idx].contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
    axes[idx].scatter(X[:, 0], X[:, 1], c=y, edgecolors='black')
    axes[idx].set_title(f'k = {k}')
    axes[idx].set_xlabel('Sepal Length')
    axes[idx].set_ylabel('Sepal Width')
plt.tight_layout()
plt.show()
Comparison of k Values
| k Value | Decision Boundary | Characteristics |
|---|---|---|
| k = 1 | Very complex, jagged | High variance, overfitting |
| k = 5-15 | Smooth, balanced | Good bias-variance tradeoff |
| large k | Very smooth, simple | High bias, underfitting |
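Rather than judging the boundaries by eye, you can pick k by cross-validated accuracy. A minimal sketch using `cross_val_score` on the same two Iris features (the candidate range 1-29 is an illustrative choice, not a rule):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Mean 5-fold accuracy for each candidate k (odd values avoid vote ties)
scores = {}
for k in range(1, 31, 2):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, mean accuracy = {scores[best_k]:.3f}")
```

Plotting the decision boundary for the selected `best_k` with the code above then shows whether the chosen value lands in the smooth, balanced regime described in the table.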
Conclusion
k-NN decision boundaries visualize how the classifier partitions the feature space based on the nearest neighbors. Use contour plots to show decision regions and experiment with different k values to find the optimal balance between overfitting and underfitting.