Computer Vision - Image Classification



What is Image Classification?

Image classification is the process of categorizing and labeling groups of pixels or vectors within an image based on specific rules.

It involves assigning a label or class to an entire image, such as identifying whether an image contains a cat, dog, or any other object.

Importance of Image Classification

Image classification is important for various applications, such as −

  • Healthcare: Classifying medical images to detect diseases.
  • Security: Recognizing faces or objects in surveillance footage.
  • Retail: Sorting products and automating inventory management.
  • Autonomous Vehicles: Identifying traffic signs, pedestrians, and other objects on the road.

Image Classification Techniques

There are various techniques for image classification −

  • Traditional Methods
  • Machine Learning-Based Methods
  • Deep Learning-Based Methods

Traditional Methods

Traditional methods for image classification rely on image processing techniques and custom-built (hand-crafted) features.

These methods are less accurate than modern machine learning-based approaches but are simpler and faster.

Following are the commonly used traditional methods for image classification −

  • Template Matching: Compares the input image with a set of template images. This method is simple but not very effective for complex images (see the sketch after this list).
  • Feature Extraction + Classifier: Involves extracting features from images and using a classifier to categorize them. For example, using edge detection and texture analysis followed by a decision tree classifier.
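
As a rough illustration of template matching, the following sketch uses OpenCV's cv2.matchTemplate (this assumes OpenCV is installed; 'scene.png' and 'template.png' are placeholder file names for your own images) −

import cv2

# Load the search image and the template in grayscale
# ('scene.png' and 'template.png' are placeholder file names)
image = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)

# Slide the template over the image and compute a similarity map
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

# The location with the highest score is the best match
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
print("Best match score:", max_val, "at location:", max_loc)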

Machine Learning-Based Methods

Machine learning-based methods use algorithms that learn from data to classify images. These methods often involve extracting features from images and training classifiers on labeled datasets.

Following are the commonly used machine learning methods for image classification −

  • Support Vector Machine (SVM): It is a supervised learning model that finds the best line (or hyperplane) to separate different groups in the data.
  • k-Nearest Neighbors (k-NN): It is a simple method that classifies an image by looking at its closest k neighbors and choosing the most common category among them.

Following is an example of how to classify images using a machine learning-based method (k-NN on the scikit-learn digits dataset) −

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Deep Learning-Based Methods

Deep learning methods have changed image classification by making it more accurate and capable of dealing with complex images.

These methods use convolutional neural networks (CNNs) to learn features automatically and classify images.

Following are the common deep learning models for image classification −

  • LeNet: It is one of the earliest CNN architectures, designed to recognize handwritten digits.
  • AlexNet: It is a deeper CNN that won the ImageNet competition in 2012, bringing significant improvements in image classification.
  • ResNet (Residual Networks): It uses residual connections to train very deep networks, achieving top performance.
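
As a rough illustration of using one of these architectures, the sketch below loads a ResNet50 pretrained on ImageNet through tf.keras.applications (this assumes TensorFlow is installed and the pretrained weights can be downloaded; a random array stands in for a real 224x224 RGB image) −

import numpy as np
import tensorflow as tf

# Load ResNet50 with pretrained ImageNet weights
model = tf.keras.applications.ResNet50(weights='imagenet')

# A random array stands in for a real 224x224 RGB image
image = np.random.rand(1, 224, 224, 3) * 255.0

# Apply ResNet50's expected preprocessing and predict
x = tf.keras.applications.resnet50.preprocess_input(image)
preds = model.predict(x)

# Decode the top-3 predicted ImageNet classes
print(tf.keras.applications.resnet50.decode_predictions(preds, top=3)[0])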

Example with CNNs

CNNs, or Convolutional Neural Networks, are a kind of deep neural network designed to process images. They have several layers that learn increasingly complex features of images step-by-step, without needing hand-crafted features.

You can go through the steps below to use CNNs −

  • Step 1: Build the CNN model.
import tensorflow as tf
from tensorflow.keras import layers, models

# Build the CNN model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
  • Step 2: Compile the model.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

  • Step 3: Train the model.
# Load dataset
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

# Expand dimensions to match the input shape of the model
X_train = X_train[..., tf.newaxis]
X_test = X_test[..., tf.newaxis]

# Train the model
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

  • Step 4: Evaluate the model.
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print("Test accuracy:", test_acc)
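
As an optional follow-up (not part of the steps above), you can use the trained model to predict the label of a single test image; this short sketch assumes the model and data from the previous steps are still in memory −

import numpy as np

# Predict class probabilities for the first test image
probs = model.predict(X_test[:1])

# The predicted digit is the class with the highest probability
print("Predicted digit:", np.argmax(probs, axis=1)[0])
print("Actual digit:", y_test[0])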