A Beginner’s Guide to Image Classification using CNN (Python implementation)
Convolutional Neural Networks (CNNs) are specialized neural networks designed to process grid-like data such as images. CNNs automatically extract features through convolutional and pooling layers, then use fully connected layers for classification. This makes them ideal for image recognition tasks where important features may not be known beforehand.
In this guide, we'll explore CNN architecture and implement a complete image classification model using Python and Keras on the MNIST handwritten digits dataset.
CNN Architecture
CNNs consist of three main layer types that work together to extract and classify image features:
Convolutional Layers
Convolutional layers apply filters (kernels) to input data by sliding small matrices across the image and computing dot products. This creates feature maps that capture spatial relationships. Key parameters include:
- Kernel size − Dimensions of the filter (e.g., 3×3)
- Stride − Step size by which the kernel moves across the input
- Padding − Pixels (typically zeros) added around the border to control the output size
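To make these parameters concrete, here is a numpy-only sketch (independent of Keras) that computes the spatial output size of a convolution and slides a small kernel across an image by hand:

```python
import numpy as np

def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1

# A 28x28 MNIST image with a 3x3 kernel, stride 1, no padding -> 26x26
print(conv_output_size(28, 3))  # 26

def convolve2d(image, kernel):
    """Naive valid-mode 2D convolution: slide the kernel and take dot products."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0          # simple averaging filter
feature_map = convolve2d(image, kernel)
print(feature_map.shape)  # (2, 2)
```

The same size formula explains the shapes we will see later in the model summary: 28×28 input → 26×26 after a 3×3 convolution.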
Pooling Layers
Pooling layers downsample feature maps by applying operations like max or average pooling. This reduces spatial dimensions, decreases parameters, and improves generalization by providing translation invariance.
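Max pooling can be sketched in a few lines of numpy (an illustrative implementation, not what Keras uses internally): each non-overlapping 2×2 block is replaced by its maximum value.

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """Non-overlapping max pooling: keep the max of each pool x pool block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % pool, :w - w % pool]
    blocks = trimmed.reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 6, 5, 1],
               [7, 2, 9, 3],
               [0, 8, 4, 2]], dtype=float)
pooled = max_pool2d(fm)
print(pooled)
# [[6. 5.]
#  [8. 9.]]
```

Note how a small shift of a feature within a 2×2 block leaves the pooled output unchanged, which is where the translation invariance comes from.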
Fully Connected Layers
These layers classify the extracted features by connecting every neuron to all neurons in the previous layer. The final layer outputs class probabilities using activation functions like softmax.
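The softmax function mentioned above can be written out in plain numpy (a minimal sketch of the math, not the Keras implementation): it exponentiates each logit and normalizes so the outputs sum to 1.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.sum())       # 1.0
print(np.argmax(probs))  # 0 (the largest logit gets the highest probability)
```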
Implementing CNN for MNIST Classification
Let's build a complete CNN to classify handwritten digits using TensorFlow and Keras:
Installing Required Libraries
pip install tensorflow matplotlib numpy
Importing Libraries and Loading Data
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# Load and preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
Training data shape: (60000, 28, 28)
Training labels shape: (60000,)
Test data shape: (10000, 28, 28)
Data Preprocessing
# Normalize pixel values to 0-1 range
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
# Add channel dimension for CNN input
X_train = np.expand_dims(X_train, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)
# Convert labels to one-hot encoded format
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
print(f"Processed training data shape: {X_train.shape}")
print(f"Processed training labels shape: {y_train.shape}")
Processed training data shape: (60000, 28, 28, 1)
Processed training labels shape: (60000, 10)
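To see what `to_categorical` actually produces, here is a numpy-only equivalent (an illustrative sketch, not Keras's implementation): each integer label becomes a length-10 vector with a single 1 at the label's index.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One-hot encode an array of integer class labels."""
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out

labels = np.array([5, 0, 4])   # first three MNIST training labels
encoded = one_hot(labels)
print(encoded.shape)        # (3, 10)
print(encoded[0].argmax())  # 5
```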
Building the CNN Model
# Create the CNN architecture
model = Sequential([
Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=(2, 2)),
Dropout(0.25),
Conv2D(64, kernel_size=(3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Dropout(0.25),
Flatten(),
Dense(128, activation='relu'),
Dropout(0.5),
Dense(10, activation='softmax')
])
# Display model architecture
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                   Output Shape              Param #
=================================================================
 conv2d (Conv2D)                (None, 26, 26, 32)        320
 max_pooling2d (MaxPooling2D)   (None, 13, 13, 32)        0
 dropout (Dropout)              (None, 13, 13, 32)        0
 conv2d_1 (Conv2D)              (None, 11, 11, 64)        18496
 max_pooling2d_1 (MaxPooling2D) (None, 5, 5, 64)          0
 dropout_1 (Dropout)            (None, 5, 5, 64)          0
 flatten (Flatten)              (None, 1600)              0
 dense (Dense)                  (None, 128)               204928
 dropout_2 (Dropout)            (None, 128)               0
 dense_1 (Dense)                (None, 10)                1290
=================================================================
Total params: 225,034
Trainable params: 225,034
Non-trainable params: 0
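The parameter counts in the summary can be verified by hand. A Conv2D layer has (kernel_height × kernel_width × input_channels + 1 bias) parameters per filter, and a Dense layer has (inputs + 1 bias) parameters per unit:

```python
# Conv2D: (kernel_h * kernel_w * in_channels + 1) * filters
conv1 = (3 * 3 * 1 + 1) * 32       # 320
conv2 = (3 * 3 * 32 + 1) * 64      # 18496

# Flatten yields 5 * 5 * 64 = 1600 features
# Dense: (inputs + 1) * units
dense1 = (1600 + 1) * 128          # 204928
dense2 = (128 + 1) * 10            # 1290

total = conv1 + conv2 + dense1 + dense2
print(total)  # 225034 -- matches the model summary
```

Pooling, dropout, and flatten layers contribute no trainable parameters, which is why they show 0 in the summary.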
Training the Model
# Compile the model
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
# Train the model
history = model.fit(
X_train, y_train,
epochs=5,
batch_size=128,
validation_data=(X_test, y_test),
verbose=1
)
Epoch 1/5
469/469 [==============================] - 15s 32ms/step - loss: 0.2615 - accuracy: 0.9196 - val_loss: 0.0537 - val_accuracy: 0.9827
Epoch 2/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0856 - accuracy: 0.9745 - val_loss: 0.0388 - val_accuracy: 0.9878
Epoch 3/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0650 - accuracy: 0.9802 - val_loss: 0.0331 - val_accuracy: 0.9889
Epoch 4/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0547 - accuracy: 0.9835 - val_loss: 0.0305 - val_accuracy: 0.9896
Epoch 5/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0463 - accuracy: 0.9854 - val_loss: 0.0280 - val_accuracy: 0.9906
Evaluating Model Performance
# Evaluate on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
plt.tight_layout()
plt.show()
Test Loss: 0.0280
Test Accuracy: 0.9906
Making Predictions
# Make predictions on test samples
predictions = model.predict(X_test[:5])
predicted_classes = np.argmax(predictions, axis=1)
actual_classes = np.argmax(y_test[:5], axis=1)
print("Predictions vs Actual:")
for i in range(5):
print(f"Sample {i+1}: Predicted={predicted_classes[i]}, Actual={actual_classes[i]}")
1/1 [==============================] - 0s 89ms/step
Predictions vs Actual:
Sample 1: Predicted=7, Actual=7
Sample 2: Predicted=2, Actual=2
Sample 3: Predicted=1, Actual=1
Sample 4: Predicted=0, Actual=0
Sample 5: Predicted=4, Actual=4
Key CNN Concepts Summary
| Component | Purpose | Parameters |
|---|---|---|
| Convolutional Layer | Feature extraction | Filters, kernel size, stride, padding |
| Pooling Layer | Dimensionality reduction | Pool size, stride |
| Fully Connected Layer | Classification | Units, activation function |
