Building a Real-Time Object Detection System with YOLO Algorithm

Real-time object detection has become a cornerstone of modern computer vision applications. The YOLO (You Only Look Once) algorithm revolutionized this field by performing object detection in a single forward pass, making it ideal for real-time applications like autonomous vehicles and surveillance systems.

YOLO treats object detection as a regression problem, dividing input images into grids and predicting bounding boxes with class probabilities directly. This unified approach achieves impressive speed while maintaining accuracy.
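To make the regression formulation concrete, here is a minimal sketch of how one detection vector is decoded. The vector values below are made-up examples, but the layout (center x, center y, width, height, objectness, class scores) matches what a YOLOv3 output layer produces:

```python
import numpy as np

# One detection vector: [cx, cy, w, h, objectness, class scores...]
# Coordinates are normalized to [0, 1] relative to the image size.
detection = np.array([0.5, 0.5, 0.2, 0.4, 0.9, 0.05, 0.85, 0.10])
img_w, img_h = 640, 480

cx, cy = detection[0] * img_w, detection[1] * img_h   # box center in pixels
w, h = detection[2] * img_w, detection[3] * img_h     # box size in pixels
x, y = int(cx - w / 2), int(cy - h / 2)               # top-left corner

class_id = int(np.argmax(detection[5:]))              # best-scoring class

print([x, y, int(w), int(h)], class_id)  # [256, 144, 128, 192] 1
```

The same decoding appears again inside the full implementation below, applied to every detection the network emits.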

Prerequisites and Setup

Before building our detection system, we need to install the required libraries. OpenCV provides essential computer vision tools, while we'll use pre-trained YOLO weights for immediate implementation.

Installing Required Libraries

pip install opencv-python numpy

Downloading YOLO Files

You'll need three files for YOLO to work:

  • yolov3.weights – Pre-trained model weights

  • yolov3.cfg – Network configuration file

  • coco.names – Class names for the COCO dataset (80 classes)
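As a quick sanity check before loading the network, you can verify the downloads are in place. The paths below assume the files sit in the current working directory; adjust them if you saved the files elsewhere:

```python
import os

# The three YOLO files the tutorial depends on
required = ["yolov3.weights", "yolov3.cfg", "coco.names"]
missing = [name for name in required if not os.path.isfile(name)]

if missing:
    print("Missing YOLO files:", ", ".join(missing))
else:
    print("All YOLO files found - ready to load the network.")
```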

Understanding the Detection Pipeline

The YOLO detection process involves several key steps:

  • Image Preprocessing – Convert the input frame to a blob with the network's expected dimensions

  • Forward Pass – Run the blob through the neural network

  • Post-processing – Filter low-confidence detections and apply non-maximum suppression

  • Visualization – Draw bounding boxes and labels on detected objects

Complete Implementation

Real-Time Object Detection System

Here's a complete implementation that captures video from your webcam and performs real-time object detection:

import cv2
import numpy as np

# Load YOLO model
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load class names
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f]

# Get output layer names
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]  # flatten() handles both old and new OpenCV return shapes

# Generate random colors for each class
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Initialize video capture
cap = cv2.VideoCapture(0)

while True:
    # Read frame from camera
    ret, frame = cap.read()
    if not ret:
        break
    
    height, width, channels = frame.shape
    
    # Prepare input blob
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    
    # Run forward pass
    outs = net.forward(output_layers)
    
    # Initialize lists for detections
    class_ids = []
    confidences = []
    boxes = []
    
    # Process each output
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            
            if confidence > 0.5:
                # Calculate bounding box coordinates
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                
                # Rectangle coordinates
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    
    # Apply non-maximum suppression
    indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    
    # Draw bounding boxes and labels
    if len(indices) > 0:
        for i in np.array(indices).flatten():  # NMSBoxes may return a list or ndarray
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = confidences[i]
            color = colors[class_ids[i]]
            
            # Draw bounding box
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            
            # Draw label
            cv2.putText(frame, f"{label} {confidence:.2f}", 
                       (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    
    # Display frame
    cv2.imshow("YOLO Real-time Detection", frame)
    
    # Break on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

Key Parameters Explained

Parameter              Value      Purpose
Input Size             416×416    YOLO model input dimensions
Scale Factor           0.00392    Normalizes pixel values (1/255)
Confidence Threshold   0.5        Minimum detection confidence
NMS Threshold          0.4        Non-maximum suppression threshold

Performance Optimization Tips

  • GPU Acceleration – Use net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) together with net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA); this requires an OpenCV build compiled with CUDA support

  • Input Resolution – Lower resolutions (e.g., 320×320) increase speed but reduce accuracy

  • Confidence Threshold – Higher thresholds reduce false positives but may miss objects

Common Applications

  • Security Systems – Detect intruders or suspicious activity

  • Traffic Monitoring – Count vehicles and analyze traffic patterns

  • Retail Analytics – Track customer behavior and product interactions

  • Sports Analysis – Track players and analyze game dynamics

Conclusion

YOLO provides an excellent foundation for real-time object detection systems. The single-pass architecture ensures fast inference while maintaining good accuracy across 80 object classes. With proper setup and parameter tuning, you can deploy robust detection systems for various applications requiring real-time performance.

Updated on: 2026-03-27T14:17:32+05:30
