Computer Vision - Object Detection
What is Object Detection?
Object detection is a computer vision technique for locating instances of objects within images or videos.
The goal is to identify the presence of objects and draw bounding boxes around them to indicate their position. Object detection combines image classification and object localization.
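Every detection pairs a class label (classification) with a box (localization), so it is convenient to store each result as a single record. The sketch below is illustrative only. The `Detection` class and its fields are hypothetical names. The box convention is assumed to be `(x, y, w, h)`, with `(x, y)` as the top-left corner, which is the convention the OpenCV examples in this tutorial use.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: what it is (classification) and where it is (localization)."""
    label: str    # predicted class name (hypothetical field name)
    score: float  # classifier confidence in [0, 1]
    x: int        # top-left corner of the box, in pixels
    y: int
    w: int        # box width and height, in pixels
    h: int

    def corners(self):
        """Return ((x1, y1), (x2, y2)), the two points cv2.rectangle takes as pt1 and pt2."""
        return (self.x, self.y), (self.x + self.w, self.y + self.h)

det = Detection(label="pedestrian", score=0.91, x=40, y=25, w=60, h=120)
print(det.corners())  # ((40, 25), (100, 145))
```

With OpenCV, such a record could be drawn with `cv2.rectangle(image, *det.corners(), (0, 255, 0), 2)`.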
Importance of Object Detection
Object detection is important for various real-world applications, such as −
- Autonomous Vehicles: Detecting pedestrians, vehicles, and obstacles on the road.
- Surveillance: Monitoring activities and identifying suspicious objects.
- Healthcare: Detecting abnormalities in medical images.
- Robotics: Enabling robots to interact with objects in their environment.
Object Detection Techniques
Object detection techniques fall into three broad groups −
- Traditional Methods
- Machine Learning-Based Methods
- Deep Learning-Based Methods
Traditional Methods
Traditional methods for object detection rely on image processing techniques and hand-crafted features. They are usually less accurate than more recent learning-based approaches, but they are simpler and faster.
The most commonly used traditional method is the Haar cascade classifier (the Viola-Jones detector). It applies a cascade of classifiers, trained on positive images (containing the object) and negative images (without it), to detect objects.
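```python
import cv2

# Load the pre-trained Haar cascade for frontal faces that ships with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Read the input image and convert it to grayscale (cascades work on intensity only)
image = cv2.imread('image.jpg')
if image is None:
    raise FileNotFoundError('Could not read image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces: scaleFactor is the image-pyramid step between scales, minNeighbors is how
# many overlapping candidate hits are needed to keep a detection, minSize is the smallest face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

# Draw a blue (BGR order) bounding box around each detected face
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imshow('Detected Faces', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```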
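`detectMultiScale` hides how the cascade works internally. In the Viola-Jones approach, each stage evaluates Haar-like features. These features are differences between pixel sums over adjacent rectangles, and an integral image (summed-area table) lets each one be computed in constant time. A window that fails any stage is rejected immediately, so most background windows cost only a few feature evaluations. The NumPy sketch below shows the integral image, one two-rectangle feature, and early rejection. It is a simplification: real stages are boosted combinations of many weak classifiers, collapsed here into one feature and one threshold. All function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero row/column, so ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half sum minus right half sum (responds to vertical edges)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def run_cascade(ii, stages):
    """Evaluate (feature_fn, threshold) stages in order and stop at the first failure.

    Simplified: each stage here is a single feature test, not a boosted ensemble.
    """
    for feature_fn, threshold in stages:
        if feature_fn(ii) < threshold:
            return False  # early rejection
    return True

# A 24x24 window with a bright left half and a dark right half (a vertical edge)
window = np.zeros((24, 24), dtype=np.uint8)
window[:, :12] = 200
ii = integral_image(window)
print(two_rect_feature(ii, 0, 0, 24, 24))  # 57600, i.e. 200 * 12 * 24

# Hypothetical one-stage cascade that accepts windows with a strong vertical edge
stages = [(lambda t: two_rect_feature(t, 0, 0, 24, 24), 10000)]
print(run_cascade(ii, stages))  # True
```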
Machine Learning-Based Methods
Machine learning-based methods use algorithms that learn from data to detect objects. These methods often involve training classifiers on labeled datasets.
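The most commonly used machine learning-based method is Histogram of Oriented Gradients (HOG) + Support Vector Machine (SVM). HOG features are extracted from an image window, and an SVM classifies the window as containing the object or not. To locate objects, the classifier is applied to many windows across the image.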
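HOG describes a patch by the direction and strength of its intensity changes. The window is divided into small cells (`pixels_per_cell=(8, 8)` in the code that follows). Each cell receives a histogram of gradient orientations, weighted by gradient magnitude (`orientations=9` bins over 0 to 180 degrees). The NumPy sketch below computes that histogram for a single cell. It is deliberately simplified: each pixel votes into one bin only (no interpolation between bins), and block normalization (`cells_per_block`, `block_norm`) is omitted. It illustrates the idea rather than reimplementing `skimage.feature.hog`. The function name is hypothetical.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(np.float64)
    # np.gradient returns derivatives along rows (gy) then columns (gx)
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees: opposite directions share a bin
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    bins = (orientation // bin_width).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # each pixel votes with its magnitude
    return hist

# An 8x8 cell whose intensity rises by 10 per column (a horizontal gradient)
ramp = np.tile(np.arange(8) * 10, (8, 1))
hist = cell_orientation_histogram(ramp)
print(hist[0])  # 640.0: all 64 pixels vote into the 0-degree bin with magnitude 10
```

Transposing the ramp (`ramp.T`) moves all votes into the 90-degree bin (index 4).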
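```python
import cv2
import joblib
from skimage.feature import hog

# Load a previously trained HOG + SVM model (e.g. a sklearn LinearSVC saved with joblib)
model = joblib.load('hog_svm_model.pkl')

# Read the image as grayscale and resize it to the training window size; 64x128
# (width x height) is an assumed size that must match the windows used in training
image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
window = cv2.resize(image, (64, 128))

# Extract HOG features with the same parameters used at training time
features = hog(window, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), block_norm='L2-Hys')

# Predict whether the window contains the object
prediction = model.predict([features])
```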
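The snippet above classifies one image or window, but it does not localize anything. A common way to turn a HOG + SVM classifier into a detector is to slide a fixed-size window across the image and score each position. In practice this is repeated over several rescaled copies of the image. Windows scoring above a threshold are kept. In the sketch below, `classify` stands in for "extract HOG features, then call the SVM's `decision_function`". The default window of 128x64 (height x width, the classic pedestrian size), the step, and the threshold are assumed values. The function name is hypothetical.

```python
import numpy as np

def sliding_window_detect(image, classify, window=(128, 64), step=16, threshold=0.0):
    """Score every window position and keep those above `threshold`.

    `classify` maps a 2-D patch to a score (e.g. HOG features fed to an SVM).
    `window` is (height, width). Returns a list of (x, y, w, h, score) tuples.
    """
    win_h, win_w = window
    detections = []
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            score = classify(image[y:y + win_h, x:x + win_w])
            if score > threshold:
                detections.append((x, y, win_w, win_h, float(score)))
    return detections

# Toy example: a bright 16x16 square in a dark 64x64 image and a brightness-based "classifier"
image = np.zeros((64, 64), dtype=np.uint8)
image[16:32, 32:48] = 255
dets = sliding_window_detect(image, lambda p: p.mean() / 255 - 0.5, window=(16, 16), step=16)
print(dets)  # [(32, 16, 16, 16, 0.5)]
```

Overlapping windows usually fire around the same object, so a real detector follows this step with non-maximum suppression.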
Deep Learning-Based Methods
Deep learning-based methods have transformed object detection with their high accuracy and ability to handle complex images. These methods use convolutional neural networks (CNNs) to learn features and perform detection.
Common deep learning-based methods include −
- R-CNN (Region-Based Convolutional Neural Networks): Proposes candidate regions and uses a CNN to classify them.
- YOLO (You Only Look Once): Divides the image into a grid and predicts bounding boxes and class probabilities directly.
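- SSD (Single Shot MultiBox Detector): Like YOLO, a single-stage detector, but it predicts boxes from several feature maps of different resolutions, which helps it detect objects of different sizes (in its original evaluation it was both faster and more accurate than the first version of YOLO).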
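SSD's multi-scale behavior comes from predicting offsets relative to a fixed set of default (anchor) boxes tiled over feature maps of decreasing resolution. Fine maps carry small boxes and coarse maps carry large ones. The sketch below follows the linear scale rule of the original SSD design (`s_min=0.2`, `s_max=0.9`). It is simplified: it uses only a subset of the aspect ratios and omits the extra square box of scale sqrt(s_k * s_{k+1}) per location. The function names are hypothetical.

```python
import math

def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Default-box scale for each of `num_maps` feature maps, spaced linearly."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for a square feature map.

    Simplified: a subset of aspect ratios and no extra sqrt(s_k * s_{k+1}) box.
    """
    boxes = []
    for i in range(fmap_size):          # feature-map row
        for j in range(fmap_size):      # feature-map column
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # center of the cell
            for ar in aspect_ratios:
                # Keeps the box area at scale**2 while making w / h equal to ar
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

scales = ssd_scales(6)          # roughly 0.2, 0.34, 0.48, 0.62, 0.76, 0.9
boxes = default_boxes(2, 0.9)   # a coarse 2x2 map paired with the largest scale
print(len(boxes))  # 12: 2 * 2 locations * 3 aspect ratios
```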
YOLO (You Only Look Once)
YOLO is a popular and efficient object detection model. It divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
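In the original YOLO (v1) formulation, the image is split into an S x S grid, and the cell containing an object's center is responsible for predicting it. The box center is predicted as an offset inside that cell, while width and height are predicted relative to the whole image. The sketch below illustrates that assignment and decoding. The helper names are hypothetical, and the 7x7 grid on a 448x448 image follows YOLOv1. YOLOv3, which the steps below use, adds anchor boxes and three output scales. OpenCV's dnn module also returns centers and sizes that are already decoded and normalized to the image, which is why the code below multiplies them by the image width and height directly.

```python
def responsible_cell(cx, cy, grid_size):
    """(row, col) of the grid cell containing a normalized box center (cx, cy in [0, 1])."""
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col

def decode_cell_prediction(row, col, tx, ty, w, h, grid_size, img_w, img_h):
    """Convert a YOLOv1-style cell prediction into an (x, y, w, h) pixel box.

    tx, ty: center offset inside the cell, in [0, 1]; w, h: size relative to the image.
    """
    cx = (col + tx) * img_w / grid_size
    cy = (row + ty) * img_h / grid_size
    bw, bh = w * img_w, h * img_h
    return int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)

print(responsible_cell(0.5, 0.3, 7))  # (2, 3)
print(decode_cell_prediction(2, 3, 0.5, 0.5, 0.25, 0.5, 7, 448, 448))  # (168, 48, 112, 224)
```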
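Using a pre-trained YOLOv3 model with OpenCV's dnn module involves the following steps −
- Step 1: Load the pre-trained YOLO model (weights and configuration files).
- Step 2: Read the input image and convert it into a blob the network accepts.
- Step 3: Run a forward pass through the output layers.
- Step 4: Keep detections above a confidence threshold and convert them to pixel boxes.
- Step 5: Apply non-maximum suppression to remove overlapping boxes.
- Step 6: Draw the remaining boxes.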
```python
import cv2
import numpy as np

# Step 1: Load the pre-trained YOLOv3 network (Darknet weights + config file)
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
# Names of the unconnected output (YOLO detection) layers
output_layers = net.getUnconnectedOutLayersNames()

# Step 2: Load the input image and convert it into a 416x416 blob
image = cv2.imread('image.jpg')
if image is None:
    raise FileNotFoundError('Could not read image.jpg')
height, width, channels = image.shape
# Scale pixels to [0, 1], resize to 416x416, swap BGR to RGB, no cropping
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)

# Step 3: Run the forward pass through the output layers
outs = net.forward(output_layers)

# Step 4: Collect detections whose best class score exceeds 0.5
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # Each row: [center_x, center_y, w, h, objectness, class scores...], normalized to [0, 1]
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Convert from center format to top-left corner format
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Step 5: Non-maximum suppression (score threshold 0.5, overlap threshold 0.4)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Step 6: Draw the boxes kept by NMS, labeled with their numeric class ID
# (flatten() handles both the Nx1 and flat index arrays returned by different OpenCV versions)
for i in np.array(indexes, dtype=int).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y + 30), cv2.FONT_HERSHEY_PLAIN, 3, (0, 255, 0), 3)

cv2.imshow('Object Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
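The YOLO code keeps every box above 0.5 confidence, so a single object often produces several overlapping boxes. `cv2.dnn.NMSBoxes` removes these duplicates. The plain-Python sketch below shows the greedy procedure it is based on. It sorts candidates by confidence, keeps the best one, and drops any later box whose intersection over union (IoU) with an already-kept box exceeds the overlap threshold (0.4 in the call above). The function names are hypothetical. The sketch explains the idea and is not guaranteed to match OpenCV's output in every edge case.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    # Candidates above the score threshold, highest confidence first
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any higher-scoring kept box
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [[10, 10, 100, 100], [15, 15, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.82
```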