A Beginner’s Guide to Image Classification using CNN (Python implementation)

Convolutional Neural Networks (CNNs) are specialized neural networks designed to process grid-like data such as images. CNNs automatically extract features through convolutional and pooling layers, then use fully connected layers for classification. This makes them ideal for image recognition tasks where important features may not be known beforehand.

In this guide, we'll explore CNN architecture and implement a complete image classification model using Python and Keras on the MNIST handwritten digits dataset.

CNN Architecture

CNNs consist of three main layer types that work together to extract and classify image features:

Convolutional Layers

Convolutional layers apply filters (kernels) to input data by sliding small matrices across the image and computing dot products. This creates feature maps that capture spatial relationships. Key parameters include:

  • Kernel size: Dimensions of the filter (e.g., 3×3)
  • Stride: Step size the kernel moves at each slide
  • Padding: Extra border pixels (typically zeros) added so the output can retain the input's spatial size
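To make the sliding-window idea concrete, here is a minimal NumPy sketch of a single-channel "valid" convolution (stride 1, no padding). The conv2d_valid helper and the averaging kernel are illustrative, not part of Keras:

```python
import numpy as np

# Minimal sketch: slide a 3x3 kernel over the image and take the
# dot product of each patch with the kernel ("valid" padding).
def conv2d_valid(image, kernel, stride=1):
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1   # output height
    ow = (image.shape[1] - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise product, summed
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0                 # simple averaging filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)                       # (4, 4): (6 - 3)/1 + 1 = 4
```

Note how the output shrinks from 6×6 to 4×4 without padding, which is exactly why the Conv2D layers later in this guide reduce 28×28 inputs to 26×26 feature maps.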

Pooling Layers

Pooling layers downsample feature maps by applying operations like max or average pooling. This reduces spatial dimensions, decreases parameters, and improves generalization by providing translation invariance.
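The downsampling step can be sketched in a few lines of NumPy. The max_pool2d helper below is illustrative only: it shows how 2×2 max pooling with stride 2 keeps the largest activation in each window:

```python
import numpy as np

# Sketch of 2x2 max pooling with stride 2: reshape the array into
# non-overlapping 2x2 blocks, then take the max of each block.
def max_pool2d(x, pool=2):
    h, w = x.shape[0] // pool, x.shape[1] // pool
    return x[:h*pool, :w*pool].reshape(h, pool, w, pool).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [4, 1, 3, 5]], dtype=float)
print(max_pool2d(x))
# [[6. 4.]
#  [7. 9.]]
```

Each 4×4 input collapses to 2×2, halving both spatial dimensions while preserving the strongest responses.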

Fully Connected Layers

These layers classify the extracted features by connecting every neuron to all neurons in the previous layer. The final layer outputs class probabilities using activation functions like softmax.
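Softmax itself is simple enough to sketch in NumPy. The function below illustrates the standard formula (it is not the Keras implementation): exponentiate each score and normalize so the outputs sum to 1:

```python
import numpy as np

# Sketch of the softmax activation used by the output layer:
# it turns raw scores (logits) into a probability distribution.
def softmax(logits):
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.round(3))   # largest logit gets the highest probability
print(probs.sum())      # probabilities sum to 1.0
```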

CNN Architecture: Input (28×28) → Conv2D (32 filters, 3×3) → MaxPool (2×2) → Conv2D (64 filters, 3×3) → MaxPool (2×2) → Flatten → Dense (128 units) → Output (10 classes)

Implementing CNN for MNIST Classification

Let's build a complete CNN to classify handwritten digits using TensorFlow and Keras:

Installing Required Libraries

pip install tensorflow matplotlib numpy

Importing Libraries and Loading Data

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Load and preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
Training data shape: (60000, 28, 28)
Training labels shape: (60000,)
Test data shape: (10000, 28, 28)

Data Preprocessing

# Normalize pixel values to 0-1 range
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0

# Add channel dimension for CNN input
X_train = np.expand_dims(X_train, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)

# Convert labels to one-hot encoded format
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print(f"Processed training data shape: {X_train.shape}")
print(f"Processed training labels shape: {y_train.shape}")
Processed training data shape: (60000, 28, 28, 1)
Processed training labels shape: (60000, 10)
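The one_hot helper below is a hypothetical NumPy sketch of what to_categorical does: each class index becomes a vector of length num_classes with a single 1 at that index:

```python
import numpy as np

# Sketch of one-hot encoding (what to_categorical does): place a 1
# at each label's index in a row of zeros.
def one_hot(labels, num_classes):
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0  # fancy indexing: one 1 per row
    return out

print(one_hot(np.array([5, 0]), 10))
```

One-hot labels are what the categorical_crossentropy loss used later expects.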

Building the CNN Model

# Create the CNN architecture
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Display model architecture
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)       320       
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)       0         
dropout (Dropout)            (None, 13, 13, 32)       0         
conv2d_1 (Conv2D)            (None, 11, 11, 64)       18496     
max_pooling2d_1 (MaxPooling2D (None, 5, 5, 64)        0         
dropout_1 (Dropout)          (None, 5, 5, 64)         0         
flatten (Flatten)            (None, 1600)             0         
dense (Dense)                (None, 128)              204928    
dropout_2 (Dropout)          (None, 128)              0         
dense_1 (Dense)              (None, 10)               1290      
=================================================================
Total params: 225,034
Trainable params: 225,034
Non-trainable params: 0
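The parameter counts in the summary can be verified by hand: a Conv2D layer has kernel_height × kernel_width × input_channels × filters weights plus one bias per filter, and a Dense layer has inputs × units weights plus one bias per unit:

```python
# Checking the summary's parameter counts with plain arithmetic.
conv1 = 3 * 3 * 1 * 32 + 32     # 320
conv2 = 3 * 3 * 32 * 64 + 64    # 18496
dense1 = 1600 * 128 + 128       # 204928 (5*5*64 = 1600 flattened inputs)
dense2 = 128 * 10 + 10          # 1290
print(conv1 + conv2 + dense1 + dense2)  # 225034
```

Pooling, dropout, and flatten layers contribute no parameters, which is why they show 0 in the summary.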

Training the Model

# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=128,
    validation_data=(X_test, y_test),
    verbose=1
)
Epoch 1/5
469/469 [==============================] - 15s 32ms/step - loss: 0.2615 - accuracy: 0.9196 - val_loss: 0.0537 - val_accuracy: 0.9827
Epoch 2/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0856 - accuracy: 0.9745 - val_loss: 0.0388 - val_accuracy: 0.9878
Epoch 3/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0650 - accuracy: 0.9802 - val_loss: 0.0331 - val_accuracy: 0.9889
Epoch 4/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0547 - accuracy: 0.9835 - val_loss: 0.0305 - val_accuracy: 0.9896
Epoch 5/5
469/469 [==============================] - 14s 30ms/step - loss: 0.0463 - accuracy: 0.9854 - val_loss: 0.0280 - val_accuracy: 0.9906

Evaluating Model Performance

# Evaluate on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Plot training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()

plt.tight_layout()
plt.show()
Test Loss: 0.0280
Test Accuracy: 0.9906

Making Predictions

# Make predictions on test samples
predictions = model.predict(X_test[:5])
predicted_classes = np.argmax(predictions, axis=1)
actual_classes = np.argmax(y_test[:5], axis=1)

print("Predictions vs Actual:")
for i in range(5):
    print(f"Sample {i+1}: Predicted={predicted_classes[i]}, Actual={actual_classes[i]}")
1/1 [==============================] - 0s 89ms/step
Predictions vs Actual:
Sample 1: Predicted=7, Actual=7
Sample 2: Predicted=2, Actual=2
Sample 3: Predicted=1, Actual=1
Sample 4: Predicted=0, Actual=0
Sample 5: Predicted=4, Actual=4
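For clarity, here is how np.argmax turns the model's probability vectors into class labels. The probability values below are made up for illustration, not actual model output:

```python
import numpy as np

# Two hypothetical softmax outputs (10 classes each); argmax along
# axis 1 picks the index of the highest probability per row.
probs = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.02, 0.01, 0.01, 0.80, 0.03, 0.03],
    [0.02, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
])
print(np.argmax(probs, axis=1))  # [7 2]
```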

Key CNN Concepts Summary

Component               Purpose                     Key Parameters
Convolutional Layer     Feature extraction          Filters, kernel size, stride, padding
Pooling Layer           Dimensionality reduction    Pool size, stride
Fully Connected Layer   Classification              Number of units, activation function
Updated on: 2026-03-27T13:10:12+05:30
