Building a Real-Time Object Detection System with YOLO Algorithm

In recent years, computer vision has advanced remarkably, and real-time object detection is one of its most exciting and impactful areas. Real-time object detection means locating and identifying objects in images or video fast enough to keep up with the incoming frames. This enables applications such as autonomous vehicles, surveillance systems, and augmented reality. In this tutorial, we will explore how to build a real-time object detection system using Python and the YOLO (You Only Look Once) algorithm.

The YOLO algorithm changed object detection by performing both object localization and classification in a single forward pass of one neural network. Traditional methods use multi-stage pipelines, for example first proposing candidate regions and then classifying each one. YOLO instead treats object detection as a single regression problem, which makes it fast while keeping its accuracy competitive. It divides the input image into a grid, and each grid cell directly predicts bounding boxes, their confidence scores, and class probabilities.

Python, with its simplicity and rich ecosystem of libraries, is a good fit for building real-time object detection systems. We will use the Darknet framework, an open-source neural network framework written in C and CUDA, to train YOLO models. We will then use OpenCV's dnn module to load a Darknet-format model in Python and detect and classify objects in live video streams or recorded videos.

Getting Started

To start building our real-time object detection system with Python and the YOLO algorithm, we need to set up our development environment and install the necessary libraries. The following steps will guide you through the installation process −

Step 1: Install OpenCV

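OpenCV is a popular computer vision library that provides essential tools for image and video processing, including the dnn module we will use to run the YOLO model. We can install OpenCV with pip, the Python package manager, by running the following command in the terminal (pip also installs NumPy, which opencv-python depends on and which our code uses) −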

pip install opencv-python

Step 2: Install Darknet

Darknet is the framework we will use to train our YOLO model. To install Darknet, open a terminal window and follow these steps −

Clone the Darknet Repository From GitHub

git clone https://github.com/AlexeyAB/darknet.git

Change Into the Darknet Directory

cd darknet

Build Darknet

make

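This step may take some time, as it compiles the framework's C code. Once the build completes, the darknet executable is created in the repository directory. Note that the default build is CPU-only. Training on a CPU is very slow, so you will usually want to set GPU=1 and CUDNN=1 at the top of the Makefile before running make (this requires the CUDA toolkit and cuDNN).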

Building a Real-Time Object Detection System with YOLO

With the development environment set up and the necessary libraries installed, we can build the real-time object detection system. We first outline each step of the pipeline and then present the complete code in one piece. This avoids the confusion of piecing the program together from many small fragments.

The main steps involved in building the system are as follows −

Preparing the Dataset − To train our YOLO model, we need a labeled dataset: images containing the objects we want to detect, each paired with annotations that give the class label and bounding-box coordinates of every object in the image.
Configuring the YOLO Model − The YOLO algorithm has several versions, such as YOLOv1, YOLOv2, YOLOv3, and YOLOv4. Each version has its own configuration file specifying the network architecture, hyperparameters, and training settings. We need to choose a suitable version and adapt its configuration to our dataset. At a minimum, this means setting the number of classes the model should detect.
Training the YOLO Model − With the dataset and configuration in place, we can train the model using the Darknet framework. Training feeds the labeled images through the network and uses backpropagation to update its weights. The goal is to minimize a loss that combines localization, objectness, and classification errors.
Testing and Evaluation − Once the model is trained, we can evaluate its performance by testing it on a separate set of images or videos. We measure metrics such as precision, recall, and mean average precision (mAP) to assess the accuracy and reliability of our object detection system.
Real-time Object Detection − After successfully training and evaluating the model, we can integrate it with a live video stream or recorded videos to perform real-time object detection. We will use OpenCV to capture video frames, apply the YOLO algorithm for object detection, and display the results in real-time.
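For the Preparing the Dataset step, Darknet expects YOLO-format labels. Each image gets a .txt file with the same base name in the same folder, containing one line per object of the form <class_id> <x_center> <y_center> <width> <height>. Class IDs start at 0, and the coordinates are normalized to [0, 1] by the image width and height. Many annotation tools export pixel corner coordinates instead, so a conversion like the hypothetical helper below is often needed.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to one Darknet/YOLO label line (hypothetical helper).

    Output format: "<class_id> <x_center> <y_center> <width> <height>",
    with every coordinate divided by the image width or height.
    """
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A hypothetical 640x480 image with one class-0 object spanning (100, 120)-(300, 360)
print(to_yolo_line(0, 100, 120, 300, 360, 640, 480))
```

For that example the line written to the label file is 0 0.312500 0.500000 0.312500 0.500000.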
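For the Configuring the YOLO Model step, AlexeyAB's Darknet README gives concrete rules for adapting a YOLOv3 or YOLOv4 .cfg file to a custom dataset with N classes:

- Set classes=N in every [yolo] layer.
- Set filters=(N + 5) x 3 in the [convolutional] layer just before each [yolo] layer. That is 4 box coordinates, 1 objectness score, and N class scores for each of the 3 anchors per scale.
- Set max_batches to N x 2000, but no less than 6000 or the number of training images.
- Set steps to 80% and 90% of max_batches.

The sketch below computes those values and produces the contents of an obj.data file. The helper names and the data/ paths are hypothetical.

```python
def training_settings(num_classes, num_train_images=0):
    """Compute the .cfg values to edit for a custom dataset (hypothetical helper)."""
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    return {
        "classes": num_classes,            # set in every [yolo] layer
        "filters": (num_classes + 5) * 3,  # set in the [convolutional] layer before each [yolo] layer
        "max_batches": max_batches,
        # learning-rate decay points at 80% and 90% of training (integer arithmetic)
        "steps": (max_batches * 8 // 10, max_batches * 9 // 10),
    }

def obj_data(num_classes, data_dir="data"):
    """Contents of an obj.data file; the paths are illustrative assumptions."""
    return "\n".join([
        f"classes = {num_classes}",
        f"train = {data_dir}/train.txt",  # text file listing training image paths
        f"valid = {data_dir}/test.txt",   # text file listing validation image paths
        f"names = {data_dir}/obj.names",  # one class name per line
        "backup = backup/",               # where Darknet saves weights during training
    ])

print(training_settings(3))
print(obj_data(3))
```

For a three-class dataset this gives filters=24, max_batches=6000 and steps=4800,5400.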
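For the Training the YOLO Model step, AlexeyAB's Darknet README trains from the repository directory with a command of the form ./darknet detector train <data file> <cfg file> <pretrained weights>. You can optionally add -map to periodically compute mAP on the validation set and -dont_show to suppress the chart window on headless machines. The hypothetical helper below only assembles that command from Python and runs it with subprocess. The file names data/obj.data, cfg/yolo-obj.cfg and yolov4.conv.137 are assumptions; the last is the pretrained YOLOv4 backbone file the README refers to, which is downloaded separately.

```python
import shlex
import subprocess

def darknet_train_cmd(data_file, cfg_file, pretrained_weights, compute_map=True, headless=True):
    """Assemble a Darknet training command, to be run from the darknet directory (hypothetical helper)."""
    cmd = ["./darknet", "detector", "train", data_file, cfg_file, pretrained_weights]
    if compute_map:
        cmd.append("-map")        # periodically evaluate mAP on the 'valid' list in the .data file
    if headless:
        cmd.append("-dont_show")  # do not open the loss-chart window
    return cmd

def run_training(darknet_dir, cmd):
    """Run the command inside the Darknet checkout; raises CalledProcessError on failure."""
    subprocess.run(cmd, cwd=darknet_dir, check=True)

cmd = darknet_train_cmd("data/obj.data", "cfg/yolo-obj.cfg", "yolov4.conv.137")
print(shlex.join(cmd))
# run_training("darknet", cmd)  # uncomment once the dataset, cfg and weights are in place
```

Building the argument list separately from running it makes it easy to print and check the exact command before starting a training run that may last hours.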
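For the Testing and Evaluation step, a predicted box usually counts as a true positive when its intersection over union (IoU) with an unmatched ground-truth box reaches a threshold such as 0.5. The sketch below computes IoU and then precision and recall for a single class in a single image, matching the highest-scoring predictions first. It is a simplified, hypothetical illustration. mAP additionally averages precision over recall levels and over classes, and AlexeyAB's Darknet can report mAP for a trained model with its detector map command.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds, truths, iou_thr=0.5):
    """preds: list of (box, score); truths: list of boxes. Each truth is matched at most once."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(truths):
            if j not in matched and iou(box, gt) >= best_iou:
                best_j, best_iou = j, iou(box, gt)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0  # tp / (tp + false positives)
    recall = tp / len(truths) if truths else 0.0   # tp / (tp + missed objects)
    return precision, recall

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
print(precision_recall(preds, truths))
```

In this example the first prediction overlaps the first ground-truth box with IoU 0.81 and counts as a hit. The second prediction matches nothing, and the second object is missed, so both precision and recall are 0.5.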

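The first four steps are carried out with Darknet's command-line tools and configuration files. The complete code below implements the final step. It loads a YOLOv3 model pretrained on the COCO dataset with OpenCV's dnn module and runs it on a webcam stream. To use a model you trained yourself, replace the weights, configuration, and class-names files with your own.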

Complete Code

Example

Here is the complete code −

import cv2
import numpy as np

# Load the pretrained YOLOv3 weights and network configuration (Darknet format).
# yolov3.cfg and coco.names ship with the Darknet repository (in cfg/ and data/);
# the yolov3.weights file is downloaded separately. All three must be in the working directory.
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
with open("coco.names", "r") as f:
   classes = [line.strip() for line in f.readlines()]

# Names of the YOLO output layers (one per detection scale)
output_layers = net.getUnconnectedOutLayersNames()

# One fixed color per class, generated once so boxes do not flicker between frames
colors = np.random.uniform(0, 255, size=(len(classes), 3))
font = cv2.FONT_HERSHEY_PLAIN

# Open the default webcam (pass a file path instead to process a recorded video)
cap = cv2.VideoCapture(0)

while True:
   # Read the next frame; stop when the stream ends
   ret, frame = cap.read()
   if not ret:
      break
   height, width = frame.shape[:2]

   # Preprocess: scale pixels to [0, 1], resize to 416x416, convert BGR to RGB
   blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)
   net.setInput(blob)
   outs = net.forward(output_layers)

   # Each detection row is [center_x, center_y, w, h, objectness, class scores...],
   # with box coordinates normalized to [0, 1]
   class_ids = []
   confidences = []
   boxes = []
   for out in outs:
      for detection in out:
         scores = detection[5:]
         class_id = int(np.argmax(scores))
         confidence = float(scores[class_id])
         if confidence > 0.5:
            # Convert to pixel coordinates: top-left corner plus width and height
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)

   # Apply non-maximum suppression (score threshold 0.5, IoU threshold 0.4)
   indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

   # Draw bounding boxes and labels; NMSBoxes may return an empty tuple or an array
   for i in np.array(indices).flatten():
      x, y, w, h = boxes[i]
      label = classes[class_ids[i]]
      color = [int(c) for c in colors[class_ids[i]]]
      cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
      cv2.putText(frame, f"{label} {confidences[i]:.2f}", (x, y - 5), font, 1, color, 2)

   # Display the resulting frame; press q to quit
   cv2.imshow("Real-time Object Detection", frame)
   if cv2.waitKey(1) & 0xFF == ord("q"):
      break

# Release resources
cap.release()
cv2.destroyAllWindows()
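The cv2.dnn.NMSBoxes call in the script removes duplicate detections of the same object. To show what that step does, here is a pure-Python sketch of greedy non-maximum suppression on boxes in the same [x, y, w, h] format the script builds. It first discards boxes scoring at or below the score threshold. It then keeps the highest-scoring remaining box, drops every box whose IoU with it exceeds the NMS threshold, and repeats. This illustrates the technique under those assumptions; it is not OpenCV's actual implementation, and the function names are hypothetical.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [x, y, w, h] with (x, y) the top-left corner."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thr=0.5, nms_thr=0.4):
    """Greedy NMS: return indices of the boxes kept, highest score first."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # drop boxes that overlap the kept box too much
        order = [i for i in order if iou_xywh(boxes[best], boxes[i]) <= nms_thr]
    return keep

boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The first two boxes overlap with an IoU of about 0.92, so only the higher-scoring one survives, while the distant third box is kept: the result is indices 0 and 2.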

Conclusion

In this tutorial, we have explored how to build a real-time object detection system using Python and the YOLO algorithm. We began by introducing real-time object detection and the significance of the YOLO algorithm in computer vision. We then covered the installation of the necessary tools: OpenCV and the Darknet framework.

Next, we outlined the essential steps in building such a system: preparing the dataset, configuring the YOLO model, training it, and testing and evaluating its performance. Finally, we provided a complete code example that runs a YOLO model on a live video stream using Python and OpenCV.

By following the steps outlined in this tutorial, you can create your own real-time object detection system that can detect and classify objects in live video streams or recorded videos. This opens up possibilities for a wide range of applications, including surveillance systems, autonomous vehicles, and augmented reality experiences.

Object detection is an exciting and rapidly evolving field, and the YOLO algorithm is just one of the many techniques available. As you further explore the world of computer vision, consider experimenting with other algorithms, datasets, and training strategies to enhance the accuracy and performance of your object detection systems.

S Vijay Balaji

Updated on: 31-Aug-2023
