Handwritten Digit Recognition using Neural Network
Handwritten digit recognition is a fundamental task in computer vision and deep learning. It demonstrates how neural networks can classify images into multiple categories, making it an excellent introduction to multiclass image classification using convolutional neural networks.
Binary vs Multiclass Image Classification
Before diving into digit recognition, let's understand the classification types:
Binary Image Classification
In binary classification, the model predicts between two classes. For example, classifying images as either cats or dogs.
Multiclass Image Classification
In multiclass classification, the model predicts among more than two classes. Handwritten digit recognition is a perfect example, where we classify digits from 0 to 9 (10 classes).
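To make the distinction concrete, here is a minimal NumPy sketch (not part of the article's Keras code) contrasting a binary sigmoid output with a multiclass softmax output; the logit values are illustrative:

```python
import numpy as np

# Binary classification: one sigmoid output gives P(class 1)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

binary_logit = 1.2
p_dog = sigmoid(binary_logit)  # probability of "dog"; "cat" is simply 1 - p_dog

# Multiclass classification: softmax turns 10 logits into a distribution over digits 0-9
def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

digit_logits = np.array([0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 0.2, 0.3, 0.1])
probs = softmax(digit_logits)           # ten probabilities that sum to 1
predicted_digit = int(np.argmax(probs)) # index of the highest probability

print(predicted_digit)  # → 2
```

This is exactly why the network below ends in a 10-unit softmax layer: one probability per digit class.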
The MNIST Dataset
The MNIST dataset is the standard benchmark for handwritten digit recognition. It contains:
- 60,000 training images
- 10,000 test images
- Each image is 28×28 pixels in grayscale
- Digits labeled from 0 to 9
Implementation using CNN
Here's a complete implementation using a Convolutional Neural Network with Keras:
```python
# Handwritten Digit Recognition using CNN
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.models import Sequential
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt

# Configuration parameters
batch_size = 128
num_classes = 10
epochs = 10
input_shape = (28, 28, 1)

# Load and explore the MNIST dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print("Training data shape: {}, Test data shape: {}".format(X_train.shape, X_test.shape))

# Visualize a sample digit
plt.figure(figsize=(6, 4))
plt.imshow(X_train[0], cmap='gray')
plt.title(f'Sample digit: {Y_train[0]}')
plt.show()

# Preprocessing: reshape to add the single grayscale channel dimension
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# Convert labels to categorical (one-hot encoding)
Y_train = to_categorical(Y_train, num_classes)
Y_test = to_categorical(Y_test, num_classes)

# Normalize pixel values to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

print('Preprocessed shapes:')
print('X_train shape:', X_train.shape)
print('Training samples:', X_train.shape[0])
print('Test samples:', X_test.shape[0])

# Build the CNN model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Display model architecture
model.summary()

# Train the model
print("Training the model...")
history = model.fit(
    X_train, Y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_test, Y_test)
)

# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(X_test, Y_test, verbose=0)
print(f'\nTest Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')
```
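Once trained, `model.predict` returns one probability vector per image, and the predicted digit is the index of the largest entry. A NumPy-only sketch of that decoding step (the probability vector below is illustrative, not real model output):

```python
import numpy as np

# One row of illustrative model output: ten class probabilities summing to 1
probabilities = np.array([0.01, 0.02, 0.01, 0.05, 0.01,
                          0.02, 0.01, 0.84, 0.02, 0.01])

predicted_digit = int(np.argmax(probabilities))          # index of the largest probability
confidence = float(probabilities[predicted_digit])       # how sure the model is

print(predicted_digit, confidence)  # → 7 0.84
```

With a real model, the same decoding applies row-wise: `np.argmax(model.predict(X_test), axis=1)` recovers one digit label per test image.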
Model Architecture Breakdown
| Layer Type | Purpose | Parameters |
|---|---|---|
| Conv2D (32 filters) | Feature extraction | 3×3 kernel, ReLU activation |
| Conv2D (64 filters) | Higher-level features | 3×3 kernel, ReLU activation |
| MaxPooling2D | Dimensionality reduction | 2×2 pool size |
| Dropout | Prevent overfitting | 25% dropout rate |
| Dense | Classification | 256 neurons, ReLU activation |
| Output Dense | Final prediction | 10 neurons, Softmax activation |
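The parameter counts reported by `model.summary()` can be verified by hand. A quick sketch of the arithmetic for this architecture (assuming 'valid' padding, the Keras default, so each 3×3 convolution shrinks the spatial dimensions by 2):

```python
# Conv2D params = (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv1_params = (3 * 3 * 1 + 1) * 32    # 28x28x1 -> 26x26x32
conv2_params = (3 * 3 * 32 + 1) * 64   # 26x26x32 -> 24x24x64

# MaxPooling2D halves spatial dims: 24x24x64 -> 12x12x64; Flatten gives a vector
flat_size = 12 * 12 * 64               # 9216 values per image

# Dense params = (inputs + 1 bias) * units
dense1_params = (flat_size + 1) * 256
output_params = (256 + 1) * 10

total = conv1_params + conv2_params + dense1_params + output_params
print(conv1_params, conv2_params, dense1_params, output_params, total)
```

Note that almost all of the model's parameters sit in the first Dense layer, which is typical for small CNNs that flatten directly into a wide fully connected layer.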
Key Preprocessing Steps
- Reshaping: Convert 28×28 images to 28×28×1 format for CNN input
- Normalization: Scale pixel values from [0, 255] to [0, 1]
- One-hot encoding: Convert labels to categorical format
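As an illustration of the one-hot step, `to_categorical` maps each digit label to a length-10 indicator vector; the same mapping can be written with plain NumPy:

```python
import numpy as np

labels = np.array([3, 0, 7])           # raw digit labels, as in Y_train
num_classes = 10

# Each row of the identity matrix is a one-hot vector; index rows by label
one_hot = np.eye(num_classes)[labels]

print(one_hot[0])  # label 3 -> [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

This encoding is what lets the categorical cross-entropy loss compare the softmax output distribution against the true class.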
Expected Results
With this architecture, you can expect:
- Training accuracy: ~95-98%
- Test accuracy: ~98-99%
- Training time: 2-3 minutes on modern hardware
Conclusion
Handwritten digit recognition using CNNs demonstrates the power of deep learning for image classification. The MNIST dataset provides an excellent starting point for understanding multiclass classification, and the CNN architecture effectively captures spatial patterns in handwritten digits to achieve high accuracy.
