How to transform the sklearn DIGITS dataset into 2- and 3-feature datasets in Python?

The sklearn DIGITS dataset contains 64 features as each handwritten digit image is 8×8 pixels. We can use Principal Component Analysis (PCA) to reduce dimensionality and transform this dataset into 2 or 3-feature datasets. While this significantly reduces data size, it also loses some information and may impact ML model accuracy.

Transform DIGITS Dataset to 2 Features

We can reduce the 64-dimensional DIGITS dataset to 2 dimensions using PCA. This creates a simplified representation suitable for visualization and faster processing:

# Import necessary packages
from sklearn import datasets
from sklearn.decomposition import PCA

# Load DIGITS dataset
digits = datasets.load_digits()
X_digits, y_digits = digits.data, digits.target
print('Original DIGITS Dataset Size:', X_digits.shape, y_digits.shape)

# Initialize PCA with 2 components
pca_2 = PCA(n_components=2)
pca_2.fit(X_digits)

# Transform to 2 dimensions
X_digits_2d = pca_2.transform(X_digits)
print('New Dataset size after PCA transformation:', X_digits_2d.shape)

# Check explained variance ratio
print('Explained variance ratio:', pca_2.explained_variance_ratio_)
print('Total variance explained:', sum(pca_2.explained_variance_ratio_))
Output

Original DIGITS Dataset Size: (1797, 64) (1797,)
New Dataset size after PCA transformation: (1797, 2)
Explained variance ratio: [0.14890594 0.13618771]
Total variance explained: 0.28509365061189297
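Since only about 28.5% of the variance survives the 2-component reduction, it can be instructive to measure the information loss directly. The sketch below (an addition, not part of the original tutorial) uses PCA's `inverse_transform` to map the 2-D points back into 64-D pixel space and computes the mean squared reconstruction error:

```python
# Reconstruct the 64-feature data from the 2 components to
# quantify how much information the reduction discards.
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

digits = datasets.load_digits()
X = digits.data

pca_2 = PCA(n_components=2)
X_2d = pca_2.fit_transform(X)

# Map the 2-D points back to the original 64-D pixel space
X_back = pca_2.inverse_transform(X_2d)
print('Reconstructed shape:', X_back.shape)  # (1797, 64)

# Mean squared reconstruction error per pixel (non-zero because
# the discarded 62 components carried ~71.5% of the variance)
mse = np.mean((X - X_back) ** 2)
print('Mean squared reconstruction error:', round(mse, 2))
```

A lower error here would mean the components retained more of the original structure; increasing `n_components` shrinks this error toward zero.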

Transform DIGITS Dataset with Limited Classes

You can also transform a subset of the DIGITS dataset by loading only specific digit classes. This is useful when working with fewer categories:

# Import necessary packages
from sklearn import datasets
from sklearn.decomposition import PCA

# Load DIGITS dataset with only first 6 classes (digits 0-5)
digits_6 = datasets.load_digits(n_class=6)
X_digits_6, y_digits_6 = digits_6.data, digits_6.target
print('DIGITS Dataset Size (6 classes):', X_digits_6.shape, y_digits_6.shape)

# Apply PCA transformation
pca_2 = PCA(n_components=2)
X_digits_6_2d = pca_2.fit_transform(X_digits_6)
print('Transformed Dataset size:', X_digits_6_2d.shape)
Output

DIGITS Dataset Size (6 classes): (1083, 64) (1083,)
Transformed Dataset size: (1083, 2)
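A common next step with a 2-D projection is to plot it. The following sketch (assuming matplotlib is installed; the filename `digits_pca_2d.png` is arbitrary) scatter-plots the 6-class subset, colored by digit label:

```python
# Visualize the 2-D PCA projection of the 6-class subset.
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for saving to file
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

digits_6 = datasets.load_digits(n_class=6)
X_2d = PCA(n_components=2).fit_transform(digits_6.data)

plt.figure(figsize=(6, 5))
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1],
                      c=digits_6.target, cmap='tab10', s=10)
plt.colorbar(scatter, label='Digit class')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('DIGITS (classes 0-5) projected to 2 PCA features')
plt.savefig('digits_pca_2d.png')
print('Saved digits_pca_2d.png')
```

With only six classes, the clusters in this plot tend to separate more cleanly than with all ten digits.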

Transform DIGITS Dataset to 3 Features

A 3-dimensional transformation retains more information than 2D while still achieving significant dimensionality reduction:

# Import necessary packages
from sklearn import datasets
from sklearn.decomposition import PCA

# Load DIGITS dataset
digits = datasets.load_digits()
X_digits, y_digits = digits.data, digits.target
print('Original DIGITS Dataset Size:', X_digits.shape, y_digits.shape)

# Initialize PCA with 3 components
pca_3 = PCA(n_components=3)
X_digits_3d = pca_3.fit_transform(X_digits)
print('New Dataset size after PCA transformation:', X_digits_3d.shape)

# Compare variance explained
print('Explained variance ratio (3D):', pca_3.explained_variance_ratio_)
print('Total variance explained:', sum(pca_3.explained_variance_ratio_))
Output

Original DIGITS Dataset Size: (1797, 64) (1797,)
New Dataset size after PCA transformation: (1797, 3)
Explained variance ratio (3D): [0.14890594 0.13618771 0.11794594]
Total variance explained: 0.4030395874652833
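If neither 2 nor 3 components retains enough variance for your task, you can fit PCA with all components and read off the cumulative explained variance to choose a cutoff. A sketch (the 90% threshold is an illustrative choice, not from the original tutorial):

```python
# Choose n_components from the cumulative explained variance curve.
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_digits().data

# Fit PCA with all 64 components and accumulate the variance ratios
pca_full = PCA().fit(X)
cumvar = np.cumsum(pca_full.explained_variance_ratio_)

for k in (2, 3, 10, 20):
    print(f'{k} components explain {cumvar[k - 1]:.1%} of the variance')

# Smallest number of components that reaches 90% variance
k_90 = int(np.argmax(cumvar >= 0.90)) + 1
print('Components needed for 90% variance:', k_90)
```

The first two entries of `cumvar` reproduce the ~28.5% and ~40.3% figures printed above.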

Comparison of Dimensionality Reduction

Components    | Dataset Shape | Variance Explained | Use Case
2             | (1797, 2)     | ~28.5%             | 2D visualization, simple models
3             | (1797, 3)     | ~40.3%             | 3D visualization, balanced reduction
64 (original) | (1797, 64)    | 100%               | Full information, complex models
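The accuracy impact mentioned earlier can be checked empirically. The following sketch (the logistic regression classifier and 70/30 split are illustrative choices, not part of the original tutorial) trains the same model on the 2-, 3-, and 64-feature versions of the data:

```python
# Compare classifier accuracy on 2-, 3-, and 64-feature DIGITS data.
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

accs = {}
for n in (2, 3, None):  # None keeps all 64 features
    if n is not None:
        # Fit PCA on the training split only, then project both splits
        pca = PCA(n_components=n).fit(X_train)
        Xtr, Xte = pca.transform(X_train), pca.transform(X_test)
    else:
        Xtr, Xte = X_train, X_test
    clf = LogisticRegression(max_iter=5000).fit(Xtr, y_train)
    accs[n or 64] = accuracy_score(y_test, clf.predict(Xte))
    print(f'{n or 64} features -> accuracy {accs[n or 64]:.3f}')
```

As expected, accuracy drops as components are removed, which is the trade-off the table above summarizes.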

Conclusion

PCA effectively reduces the DIGITS dataset from 64 to 2 or 3 features while preserving the most important variance. Use 2D for visualization and simple models, or 3D when you need slightly better information retention with manageable complexity.

Updated on: 2026-03-26T22:14:28+05:30
