
Computer Vision - Object Detection
What is Object Detection?
Object detection is a computer vision technique for locating instances of objects within images or videos.
The goal is to identify the presence of objects and draw bounding boxes around them to indicate their position. Object detection combines image classification and object localization.
Importance of Object Detection
Object detection is important for various real-world applications, such as −
- Autonomous Vehicles: Detecting pedestrians, vehicles, and obstacles on the road.
- Surveillance: Monitoring activities and identifying suspicious objects.
- Healthcare: Detecting abnormalities in medical images.
- Robotics: Enabling robots to interact with objects in their environment.
Object Detection Techniques
Object detection techniques fall into three broad groups −
- Traditional Methods
- Machine Learning-Based Methods
- Deep Learning-Based Methods
Traditional Methods
Traditional methods for object detection rely on classical image processing and hand-crafted features. They are usually less accurate than modern learning-based approaches, but they are simple and cheap to run.
A commonly used traditional method is the Haar cascade, which applies a cascade of classifiers trained on positive and negative images to detect objects.
import cv2

# Load the pre-trained Haar Cascade classifier for face detection
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Read the input image and convert it to grayscale
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

# Draw bounding boxes around detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imshow('Detected Faces', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
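To illustrate what "cascade" means here, the following is a minimal sketch of early rejection: a window is tested by a sequence of stages and discarded as soon as one stage fails. The stage functions and thresholds are made up for illustration and are not the features OpenCV's trained cascades use.

```python
from typing import Callable, Sequence, Tuple

# A stage is a (score_function, threshold) pair; both are hypothetical here.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def passes_cascade(window: Sequence[float], stages: Sequence[Stage]) -> bool:
    """Return True only if the window passes every stage, in order."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early; later stages are never evaluated
    return True


# Toy stages (illustrative only): mean brightness, then contrast.
stages = [
    (lambda w: sum(w) / len(w), 0.3),
    (lambda w: max(w) - min(w), 0.5),
]

print(passes_cascade([0.1, 0.2, 0.1, 0.2], stages))  # rejected by the first stage
print(passes_cascade([0.1, 0.9, 0.4, 0.8], stages))  # passes both stages
```

Because most windows in an image contain no object, rejecting them in the first cheap stages is what keeps a cascade fast.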
Machine Learning-Based Methods
Machine learning-based methods use algorithms that learn from data to detect objects. These methods often involve training classifiers on labeled datasets.
A widely used machine learning-based method is Histogram of Oriented Gradients (HOG) + SVM. It extracts HOG features from an image window and uses a Support Vector Machine (SVM) to classify the window as object or background.
from skimage.feature import hog
from skimage.io import imread
import joblib

# Load the pre-trained HOG + SVM model
model = joblib.load('hog_svm_model.pkl')

# Read the input image as grayscale; it must have the same size as the
# training windows so that the feature vector length matches the model
image = imread('image.jpg', as_gray=True)

# Extract HOG features from the input image
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

# Predict the presence of objects using the trained SVM model
prediction = model.predict([features])
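To give a feel for what a HOG feature contains, here is a simplified sketch that builds the orientation histogram for a single cell in plain Python. It assumes unsigned orientations (0 to 180 degrees) and magnitude-weighted votes, and it omits block normalization and other refinements that a library implementation such as `skimage.feature.hog` applies. The function name is hypothetical.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell.

    `cell` is a 2-D list of grayscale values. Gradients use central
    differences on interior pixels; each pixel votes with its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: intensity changes only along x, so all votes land in the first bin.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
print(cell_orientation_histogram(vertical_edge))

# A horizontal edge: intensity changes only along y, so votes land in the middle bin.
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
print(cell_orientation_histogram(horizontal_edge))
```

Concatenating such histograms over all cells of a window yields the feature vector that the SVM classifies.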
Deep Learning-Based Methods
Deep learning-based methods have transformed object detection with their high accuracy and ability to handle complex images. These methods use convolutional neural networks (CNNs) to learn features and perform detection.
Common deep learning-based methods include −
- R-CNN (Region-Based Convolutional Neural Networks): Proposes candidate regions and uses a CNN to classify them.
- YOLO (You Only Look Once): Divides the image into a grid and predicts bounding boxes and class probabilities directly.
- SSD (Single Shot MultiBox Detector): Like YOLO, a single-stage detector, but it predicts boxes from feature maps at several scales using default boxes of different aspect ratios.
YOLO (You Only Look Once)
YOLO is a popular and efficient object detection model. It divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
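As a small illustration of the grid idea, the sketch below maps an object's center point to the grid cell that contains it; in the original YOLO formulation, that cell is the one responsible for predicting the object. The function name and the grid size of 7 are illustrative assumptions, and later YOLO versions predict on several grids of different sizes.

```python
def responsible_cell(center_x, center_y, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell containing the point (center_x, center_y)."""
    # Clamp so that a point on the right or bottom border stays inside the grid.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(208, 104, 416, 416))  # (1, 3)
```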
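You can go through the following steps to use YOLO −
- Step 1: Load the pre-trained YOLO model.
- Step 2: Load the input image and convert it into a network input blob.
- Step 3: Run the model.
- Step 4: Filter the detections, apply non-maximum suppression, and draw the bounding boxes.
The code for these steps follows in order.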
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Names of the output layers; this call works regardless of whether
# getUnconnectedOutLayers() returns a 1-D or 2-D array in your OpenCV version
output_layers = net.getUnconnectedOutLayersNames()
# Load the input image
image = cv2.imread('image.jpg')
height, width, channels = image.shape

# Prepare the image for YOLO: scale pixels by 0.00392 (about 1/255),
# resize to 416x416, and swap BGR to RGB
blob = cv2.dnn.blobFromImage(image, scalefactor=0.00392, size=(416, 416),
                             mean=(0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
# Run the model
outs = net.forward(output_layers)
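The post-processing code that follows indexes each output row as `[center_x, center_y, width, height, objectness, class scores...]`, with the box values normalized to the range 0 to 1. The sketch below decodes one made-up row in plain Python to show the arithmetic; the function name and the sample values are hypothetical, and the row layout is an assumption based on how the tutorial code reads it.

```python
def decode_detection(row, image_width, image_height, threshold=0.5):
    """Convert one output row into (class_id, confidence, [x, y, w, h]) or None."""
    scores = row[5:]  # class scores start after the four box values and objectness
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[class_id]
    if confidence <= threshold:
        return None
    # Scale the normalized center and size back to pixel units.
    center_x = int(row[0] * image_width)
    center_y = int(row[1] * image_height)
    w = int(row[2] * image_width)
    h = int(row[3] * image_height)
    # Convert from center coordinates to the top-left corner.
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return class_id, confidence, [x, y, w, h]


sample_row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.1]  # made-up values, 3 classes
print(decode_detection(sample_row, 416, 416))  # (1, 0.8, [156, 104, 104, 208])
```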
class_ids = []
confidences = []
boxes = []

for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected: scale the normalized box to pixel units
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates (top-left corner)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(int(class_id))

# Apply Non-Maximum Suppression (score threshold 0.5, overlap threshold 0.4)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Draw bounding boxes for the detections that survived suppression;
# flatten() handles both the 1-D and the Nx1 return shapes
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y + 30), cv2.FONT_HERSHEY_PLAIN, 3, (0, 255, 0), 3)

cv2.imshow('Object Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
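Non-maximum suppression removes duplicate boxes that cover the same object. The following is a minimal sketch of the general greedy technique in plain Python, using boxes in the same `[x, y, w, h]` format as above. It is not OpenCV's implementation, and `cv2.dnn.NMSBoxes` may differ in details; the function names and sample boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much, repeat."""
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, which exceeds the 0.4 threshold, so only the higher-scoring of the two is kept.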