Graph k-NN decision boundaries in Matplotlib
k-NN (k-Nearest Neighbors) decision boundaries show how a k-NN classifier divides the feature space into regions for different classes. We can visualize these boundaries using matplotlib with contour plots and scatter plots.
Understanding k-NN Decision Boundaries
A decision boundary is the surface that separates different classes in the feature space. For k-NN, the boundary is determined by the majority vote of the k nearest neighbors for each point in the space.
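The majority vote can be sketched in a few lines. This is a minimal, from-scratch illustration on a tiny hypothetical dataset (not part of the Iris example below): it finds the k nearest training points by Euclidean distance and returns the most common label among them.

```python
import numpy as np
from collections import Counter

# Toy training set: four points, two classes (hypothetical data)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(query, X, y, k=3):
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X - query, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([1.2, 1.5]), X_train, y_train))  # → 0
```

Evaluating this vote at every point of a dense grid, as the full example below does, is exactly what produces the colored decision regions.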
Complete Example
Here's a complete example that plots k-NN decision boundaries using the Iris dataset:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
# Set up the plot parameters
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Parameters
n_neighbors = 15
h = 0.02 # step size in the mesh
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Use only first two features
y = iris.target
# Define color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ['red', 'green', 'blue']
# Create and train the k-NN classifier
clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors, weights='uniform')
clf.fit(X, y)
# Create a mesh grid
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
# Predict on the mesh grid
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Create the plot
plt.figure()
# Plot decision boundaries
plt.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
# Plot the data points
scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(cmap_bold),
                      edgecolors='black', s=50)
# Set plot properties
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title(f"k-NN Decision Boundaries (k = {n_neighbors})")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
# Add legend
plt.legend(scatter.legend_elements()[0], iris.target_names,
           title="Classes", loc="upper right")
plt.show()
Key Components Explained
Mesh Grid Creation
The mesh grid creates a fine grid of points across the entire feature space:
import numpy as np
# Example of mesh grid creation
x_min, x_max = 0, 5
y_min, y_max = 0, 3
h = 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
print("X coordinates shape:", xx.shape)
print("Y coordinates shape:", yy.shape)
print("Sample X coordinates:")
print(xx)
X coordinates shape: (6, 10)
Y coordinates shape: (6, 10)
Sample X coordinates:
[[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]]
Effect of k Value
Different k values create different decision boundaries:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
# Load iris data
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# Different k values
k_values = [1, 5, 15]
h = 0.02
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
for idx, k in enumerate(k_values):
    # Train classifier
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, y)
    # Create mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot
    axes[idx].contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
    axes[idx].scatter(X[:, 0], X[:, 1], c=y, edgecolors='black')
    axes[idx].set_title(f'k = {k}')
    axes[idx].set_xlabel('Sepal Length')
    axes[idx].set_ylabel('Sepal Width')
plt.tight_layout()
plt.show()
Comparison of k Values
| k Value | Decision Boundary | Characteristics |
|---|---|---|
| k = 1 | Very complex, jagged | High variance, overfitting |
| k = 5-15 | Smooth, balanced | Good bias-variance tradeoff |
| large k | Very smooth, simple | High bias, underfitting |
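Rather than judging the boundaries by eye, you can pick k by cross-validated accuracy. A minimal sketch using `cross_val_score` on the same two Iris features (the candidate range 1-29 is an illustrative choice, not a rule):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Mean 5-fold accuracy for each candidate k (odd values avoid vote ties)
scores = {}
for k in range(1, 31, 2):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, mean accuracy = {scores[best_k]:.3f}")
```

Plotting the decision boundary for the selected `best_k` with the code above then shows whether the chosen value lands in the smooth, balanced regime described in the table.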
Conclusion
k-NN decision boundaries visualize how the classifier partitions the feature space based on the nearest neighbors. Use contour plots to show decision regions and experiment with different k values to find the optimal balance between overfitting and underfitting.