Computer Vision - Object Detection



What is Object Detection?

Object detection is a computer vision technique for locating instances of objects within images or videos.

The goal is to identify the presence of objects and draw bounding boxes around them to indicate their position. Object detection combines image classification and object localization.
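
As a rough illustration of what "classification plus localization" means in practice, the sketch below models a single detection as a class label, a confidence score, and a bounding box. The `Detection` class, its field names, and the sample values are hypothetical and chosen only for this example; real libraries return detections in their own formats.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a class label, a confidence score, and a box."""
    label: str         # classification result, e.g. "person"
    confidence: float  # score in the range 0-1
    x: int             # left edge of the bounding box, in pixels
    y: int             # top edge of the bounding box, in pixels
    w: int             # box width, in pixels
    h: int             # box height, in pixels

    def corners(self):
        """Return the box as (left, top, right, bottom)."""
        return (self.x, self.y, self.x + self.w, self.y + self.h)


# Hypothetical detector output for an image containing two objects
detections = [
    Detection("person", 0.91, 40, 60, 120, 300),
    Detection("dog", 0.78, 200, 220, 150, 110),
]
for d in detections:
    print(d.label, d.confidence, d.corners())
```

The (x, y, w, h) layout matches the box format used by the OpenCV examples later in this chapter.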

Importance of Object Detection

Object detection is important for various real-world applications, such as −

  • Autonomous Vehicles: Detecting pedestrians, vehicles, and obstacles on the road.
  • Surveillance: Monitoring activities and identifying suspicious objects.
  • Healthcare: Detecting abnormalities in medical images.
  • Robotics: Enabling robots to interact with objects in their environment.

Object Detection Techniques

Object detection techniques fall into three broad categories −

  • Traditional Methods
  • Machine Learning-Based Methods
  • Deep Learning-Based Methods

Traditional Methods

Traditional methods for object detection rely on image processing techniques and hand-crafted features. They are usually less accurate than modern learning-based approaches, but they are simpler and typically cheaper to run.

A commonly used traditional method is the Haar cascade. It applies a cascade of classifiers, trained on positive and negative images, to detect objects. The following example uses the pre-trained frontal face cascade that ships with OpenCV −

import cv2

# Load the pre-trained Haar Cascade classifier for face detection
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Read the input image
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
# Draw bounding boxes around detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
cv2.imshow('Detected Faces', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
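
The "cascade" idea can be sketched without OpenCV. The classifier is a sequence of stages, and a candidate window is rejected as soon as any stage fails, so most background windows cost only one or two cheap checks. The function, stage definitions, and feature names below are hypothetical and only illustrate the control flow; they are not how OpenCV stores or evaluates its trained cascades.

```python
def cascade_accepts(window_features, stages):
    """Return True only if the window passes every stage, in order.

    Each stage is a (score_function, threshold) pair. Evaluation stops at
    the first failing stage, so later stages never run for that window.
    """
    for score_fn, threshold in stages:
        if score_fn(window_features) < threshold:
            return False  # rejected early
    return True


# Hypothetical stages that score a window from precomputed feature values
stages = [
    (lambda f: f["eye_band_contrast"], 0.3),                      # cheap first test
    (lambda f: f["eye_band_contrast"] + f["nose_bridge"], 0.9),
    (lambda f: f["nose_bridge"] * f["symmetry"], 0.2),
]

face_like = {"eye_band_contrast": 0.6, "nose_bridge": 0.5, "symmetry": 0.8}
background = {"eye_band_contrast": 0.1, "nose_bridge": 0.4, "symmetry": 0.9}

print(cascade_accepts(face_like, stages))   # passes all three stages
print(cascade_accepts(background, stages))  # fails the first stage
```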

Machine Learning-Based Methods

Machine learning-based methods use algorithms that learn from data to detect objects. These methods often involve training classifiers on labeled datasets.

A widely used machine learning-based method is Histogram of Oriented Gradients (HOG) combined with a Support Vector Machine (SVM). HOG features are extracted from the image, and the SVM classifies them as object or background.

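import cv2
import joblib
from skimage.feature import hog

# Load a previously trained HOG + SVM model (for example, a scikit-learn
# LinearSVC saved with joblib); 'hog_svm_model.pkl' is a placeholder path
model = joblib.load('hog_svm_model.pkl')
# Read the input image as grayscale and resize it to the window size used
# during training (64x128 is an assumption; use your model's size)
image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
image = cv2.resize(image, (64, 128))
# Extract HOG features from the input image
features = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), block_norm='L2-Hys')
# Predict the presence of objects using the trained SVM model
prediction = model.predict([features])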
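
To give a feel for what a HOG feature contains, the sketch below computes the orientation histogram for a single cell in plain Python. It is a simplified illustration, assuming hard binning and no block normalization, so its values will not match the output of `skimage.feature.hog`. The `cell_histogram` function and the toy cell are hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram for one cell (a 2-D list of gray values).

    Gradients use centered differences on interior pixels. Each pixel
    votes for its unsigned orientation bin (0-180 degrees), weighted by
    its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Toy 8x8 cell with a vertical edge: dark left half, bright right half
cell = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = cell_histogram(cell)
print(hist)
```

Because the toy cell contains only a vertical edge, all gradient energy falls into the first bin (horizontal gradient direction). A full HOG descriptor concatenates such histograms over all cells after normalizing them in blocks.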

Deep Learning-Based Methods

Deep learning-based methods have transformed object detection with their high accuracy and ability to handle complex images. These methods use convolutional neural networks (CNNs) to learn features and perform detection.

Common deep learning-based methods include −

  • R-CNN (Region-Based Convolutional Neural Networks): Proposes candidate regions and uses a CNN to classify them.
  • YOLO (You Only Look Once): Divides the image into a grid and predicts bounding boxes and class probabilities directly.
  • SSD (Single Shot MultiBox Detector): Like YOLO, predicts boxes and classes in a single pass, but does so from feature maps at several scales using a set of default boxes.

YOLO (You Only Look Once)

YOLO is a popular and efficient object detection model. It divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
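
The grid idea can be illustrated with a few lines of arithmetic. In the original YOLO formulation, the grid cell that contains an object's center is the one responsible for predicting it. The sketch below finds that cell; the `responsible_cell` function is hypothetical, and the 416x416 input with a 13x13 grid is an assumed example.

```python
def responsible_cell(center_x, center_y, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell containing a box center."""
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col


# Assumed example: 416x416 input, 13x13 grid (each cell covers 32x32 pixels)
print(responsible_cell(208, 100, 416, 416, 13))
```

The `min(...)` clamp keeps a center that lies exactly on the right or bottom image border inside the last cell.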

The following steps run a pre-trained YOLOv3 model through OpenCV's dnn module −

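  • Step 1: Load the pre-trained YOLO model.
import cv2
import numpy as np

# Load YOLO (the weights and config files must be downloaded separately)
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
# Names of the output layers. Indexing the result of
# getUnconnectedOutLayers() with i[0] fails on newer OpenCV versions,
# so ask the network for the names directly.
output_layers = net.getUnconnectedOutLayersNames()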
  • Step 2: Prepare the input image.
# Load the input image
image = cv2.imread('image.jpg')
height, width, channels = image.shape
# Prepare the image for YOLO: scale pixel values by 0.00392 (about 1/255),
# resize to 416x416, and swap BGR to RGB (the fifth argument, swapRB)
blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)

  • Step 3: Run the model and get predictions.
# Run a forward pass and collect the outputs of the detection layers
outs = net.forward(output_layers)

  • Step 4: Process the output and draw bounding boxes.
class_ids = []
confidences = []
boxes = []

for out in outs:
    for detection in out:
        # Each row is [center_x, center_y, w, h, objectness, class scores...],
        # with box values normalized to the range 0-1
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected: scale the box back to image coordinates
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Top-left corner of the rectangle
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(int(class_id))

# Apply Non-Maximum Suppression (score threshold 0.5, overlap threshold 0.4)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Draw bounding boxes for the detections that survived suppression.
# NMSBoxes returns a flat or Nx1 array depending on the OpenCV version,
# so flatten it before iterating.
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y + 30), cv2.FONT_HERSHEY_PLAIN, 3, (0, 255, 0), 3)

cv2.imshow('Object Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
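
Non-Maximum Suppression, used in Step 4, removes duplicate boxes that cover the same object. The sketch below shows the usual greedy idea in plain Python: keep the highest-scoring box, then drop any remaining box that overlaps a kept box too much. The `iou` and `nms` functions and the sample boxes are hypothetical; this is an illustration of the technique, and details may differ from what `cv2.dnn.NMSBoxes` does internally.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS: return the indices of the boxes to keep."""
    # Consider only confident boxes, best score first
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box
boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 200, 40, 60]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, 0.5, 0.4)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.76, which is above the 0.4 threshold, so it is suppressed and only the first and third boxes are kept.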
    