Handwritten Digit Recognition using Neural Network
Handwritten digit recognition is a fundamental task in computer vision and deep learning. It demonstrates how neural networks can classify images into multiple categories, making it an excellent introduction to multiclass image classification using convolutional neural networks.
Binary vs Multiclass Image Classification
Before diving into digit recognition, let's understand the classification types:
Binary Image Classification
In binary classification, the model predicts between two classes. For example, classifying images as either cats or dogs.
Multiclass Image Classification
In multiclass classification, the model predicts among more than two classes. Handwritten digit recognition is a perfect example, where we classify digits from 0 to 9 (10 classes).
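To make the distinction concrete, here is a minimal NumPy sketch (not part of the article's Keras code) contrasting a binary sigmoid output with a multiclass softmax output; the logit values are illustrative:

```python
import numpy as np

# Binary classification: one sigmoid output gives P(class 1)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

binary_logit = 1.2
p_dog = sigmoid(binary_logit)  # probability of "dog"; "cat" is simply 1 - p_dog

# Multiclass classification: softmax turns 10 logits into a distribution over digits 0-9
def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

digit_logits = np.array([0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 0.2, 0.3, 0.1])
probs = softmax(digit_logits)           # ten probabilities that sum to 1
predicted_digit = int(np.argmax(probs)) # index of the highest probability

print(predicted_digit)  # → 2
```

This is exactly why the network below ends in a 10-unit softmax layer: one probability per digit class.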
The MNIST Dataset
The MNIST dataset is the standard benchmark for handwritten digit recognition. It contains:
- 60,000 training images
- 10,000 test images
- Each image is 28×28 pixels in grayscale
- Digits labeled from 0 to 9
Implementation using CNN
Here's a complete implementation using a Convolutional Neural Network with Keras:
```python
# Handwritten Digit Recognition using CNN
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.models import Sequential
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt

# Configuration parameters
batch_size = 128
num_classes = 10
epochs = 10
input_shape = (28, 28, 1)

# Load and explore the MNIST dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print("Training data shape: {}, Test data shape: {}".format(X_train.shape, X_test.shape))

# Visualize a sample digit
plt.figure(figsize=(6, 4))
plt.imshow(X_train[0], cmap='gray')
plt.title(f'Sample digit: {Y_train[0]}')
plt.show()

# Preprocessing: reshape to add the single grayscale channel dimension
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# Convert labels to categorical (one-hot encoding)
Y_train = to_categorical(Y_train, num_classes)
Y_test = to_categorical(Y_test, num_classes)

# Normalize pixel values to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

print('Preprocessed shapes:')
print('X_train shape:', X_train.shape)
print('Training samples:', X_train.shape[0])
print('Test samples:', X_test.shape[0])

# Build the CNN model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Display model architecture
model.summary()

# Train the model
print("Training the model...")
history = model.fit(
    X_train, Y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_test, Y_test)
)

# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(X_test, Y_test, verbose=0)
print(f'\nTest Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')
```
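Once trained, `model.predict` returns one probability vector per image, and the predicted digit is the index of the largest entry. A NumPy-only sketch of that decoding step (the probability vector below is illustrative, not real model output):

```python
import numpy as np

# One row of illustrative model output: ten class probabilities summing to 1
probabilities = np.array([0.01, 0.02, 0.01, 0.05, 0.01,
                          0.02, 0.01, 0.84, 0.02, 0.01])

predicted_digit = int(np.argmax(probabilities))          # index of the largest probability
confidence = float(probabilities[predicted_digit])       # how sure the model is

print(predicted_digit, confidence)  # → 7 0.84
```

With a real model, the same decoding applies row-wise: `np.argmax(model.predict(X_test), axis=1)` recovers one digit label per test image.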
Model Architecture Breakdown
| Layer Type | Purpose | Parameters |
|---|---|---|
| Conv2D (32 filters) | Feature extraction | 3×3 kernel, ReLU activation |
| Conv2D (64 filters) | Higher-level features | 3×3 kernel, ReLU activation |
| MaxPooling2D | Dimensionality reduction | 2×2 pool size |
| Dropout | Prevent overfitting | 25% dropout rate |
| Dense | Classification | 256 neurons, ReLU activation |
| Output Dense | Final prediction | 10 neurons, Softmax activation |
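The parameter counts reported by `model.summary()` can be verified by hand. A quick sketch of the arithmetic for this architecture (assuming 'valid' padding, the Keras default, so each 3×3 convolution shrinks the spatial dimensions by 2):

```python
# Conv2D params = (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv1_params = (3 * 3 * 1 + 1) * 32    # 28x28x1 -> 26x26x32
conv2_params = (3 * 3 * 32 + 1) * 64   # 26x26x32 -> 24x24x64

# MaxPooling2D halves spatial dims: 24x24x64 -> 12x12x64; Flatten gives a vector
flat_size = 12 * 12 * 64               # 9216 values per image

# Dense params = (inputs + 1 bias) * units
dense1_params = (flat_size + 1) * 256
output_params = (256 + 1) * 10

total = conv1_params + conv2_params + dense1_params + output_params
print(conv1_params, conv2_params, dense1_params, output_params, total)
```

Note that almost all of the model's parameters sit in the first Dense layer, which is typical for small CNNs that flatten directly into a wide fully connected layer.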
Key Preprocessing Steps
- Reshaping: Convert 28×28 images to 28×28×1 format for CNN input
- Normalization: Scale pixel values from [0, 255] to [0, 1]
- One-hot encoding: Convert labels to categorical format
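As an illustration of the one-hot step, `to_categorical` maps each digit label to a length-10 indicator vector; the same mapping can be written with plain NumPy:

```python
import numpy as np

labels = np.array([3, 0, 7])           # raw digit labels, as in Y_train
num_classes = 10

# Each row of the identity matrix is a one-hot vector; index rows by label
one_hot = np.eye(num_classes)[labels]

print(one_hot[0])  # label 3 -> [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

This encoding is what lets the categorical cross-entropy loss compare the softmax output distribution against the true class.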
Expected Results
With this architecture, you can expect:
- Training accuracy: ~95-98%
- Test accuracy: ~98-99%
- Training time: 2-3 minutes on modern hardware
Conclusion
Handwritten digit recognition using CNNs demonstrates the power of deep learning for image classification. The MNIST dataset provides an excellent starting point for understanding multiclass classification, and the CNN architecture effectively captures spatial patterns in handwritten digits to achieve high accuracy.
