Computer Vision for Real-World Apps
Build practical computer vision applications using modern deep learning techniques.
Introduction
Computer Vision (CV) enables computers to "see" and interpret the visual world. From self-driving cars to medical imaging, CV is transforming industries.
In this tutorial, we will build a real-time object detection system using YOLO (You Only Look Once) and OpenCV.
Prerequisites
- Python 3.8+
- Basic understanding of Convolutional Neural Networks (CNNs)
Setting Up
We'll use the ultralytics library for YOLOv8 and opencv-python for image processing.
pip install ultralytics opencv-python
Object Detection with YOLOv8
YOLO is known for combining high speed with competitive accuracy. It treats object detection as a single regression problem: one forward pass of the network maps image pixels directly to bounding box coordinates and class probabilities.
from ultralytics import YOLO
from PIL import Image  # needed for Image.fromarray below
import cv2

# Load a pretrained model
model = YOLO('yolov8n.pt')  # 'n' for nano, the smallest and fastest variant

# Run inference on an image
results = model('https://ultralytics.com/images/bus.jpg')  # returns a list of Results objects

# Show the results
for r in results:
    im_array = r.plot()  # BGR numpy array with the predictions drawn on it
    im = Image.fromarray(im_array[..., ::-1])  # reverse the channel order: BGR -> RGB PIL image
    im.show()  # open the image in the default viewer
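To make the "single regression" idea more concrete, here is a minimal sketch of how one prediction could be decoded into a box, a class name, and a score. The row layout `[cx, cy, w, h, score_0, score_1, ...]` and the helper name `decode_prediction` are assumptions for illustration only. The real YOLOv8 output tensor is laid out differently and goes through extra post-processing such as non-maximum suppression, all of which the ultralytics library handles for you.

```python
from typing import Sequence, Tuple


def decode_prediction(
    row: Sequence[float], class_names: Sequence[str]
) -> Tuple[Tuple[float, float, float, float], str, float]:
    """Decode one hypothetical prediction row into (xyxy box, class name, score).

    Assumed layout (not the actual YOLOv8 tensor format):
    [cx, cy, w, h, score_0, score_1, ...]
    """
    cx, cy, w, h = row[:4]
    scores = list(row[4:])
    if len(scores) != len(class_names):
        raise ValueError("expected exactly one score per class name")

    # Convert center/size format to corner format (x1, y1, x2, y2).
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

    # The predicted class is the one with the highest score.
    best = max(range(len(scores)), key=scores.__getitem__)
    return box, class_names[best], scores[best]


box, name, score = decode_prediction(
    [100.0, 80.0, 40.0, 20.0, 0.1, 0.7, 0.2],
    ["person", "bus", "car"],
)
print(box, name, score)  # (80.0, 70.0, 120.0, 90.0) bus 0.7
```

The corner format produced here is the same (x1, y1, x2, y2) convention that `box.xyxy` uses in the webcam code.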
Real-Time Detection on Webcam
Now let's hook this up to a webcam feed.
cap = cv2.VideoCapture(0)  # 0 selects the default camera

while True:
    ret, frame = cap.read()
    if not ret:
        break  # no frame available (camera missing or disconnected)

    # Run inference
    results = model(frame, stream=True)

    # Visualize results
    for r in results:
        boxes = r.boxes
        for box in boxes:
            # Bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

            # Class name
            cls = int(box.cls[0])
            name = model.names[cls]

            # Draw
            cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 255), 3)
            cv2.putText(frame, name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 255), 2)

    cv2.imshow('Webcam', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
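Since the goal is real-time detection, it helps to know how many frames per second the loop actually processes. The sketch below is one possible way to measure that with an exponentially smoothed estimate. The `FPSMeter` class and its `smoothing` parameter are hypothetical helpers, not part of OpenCV or ultralytics.

```python
import time
from typing import Optional


class FPSMeter:
    """Exponentially smoothed frames-per-second estimate (hypothetical helper)."""

    def __init__(self, smoothing: float = 0.9) -> None:
        if not 0.0 <= smoothing < 1.0:
            raise ValueError("smoothing must be in [0, 1)")
        self.smoothing = smoothing
        self.fps: Optional[float] = None
        self._last: Optional[float] = None

    def tick(self, now: Optional[float] = None) -> Optional[float]:
        """Record one frame; return the current estimate (None on the first frame)."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            elapsed = now - self._last
            if elapsed > 0:
                instant = 1.0 / elapsed
                if self.fps is None:
                    # First measurement: nothing to smooth against yet.
                    self.fps = instant
                else:
                    # Blend the previous estimate with the newest measurement.
                    self.fps = self.smoothing * self.fps + (1 - self.smoothing) * instant
        self._last = now
        return self.fps


# Example with explicit timestamps so the result does not depend on the machine.
meter = FPSMeter(smoothing=0.5)
meter.tick(0.0)
meter.tick(0.5)
print(meter.tick(0.75))  # 3.0
```

In the webcam loop, you would create the meter once before `while True:`, call `meter.tick()` once per frame, and draw the returned value on the frame with `cv2.putText` whenever it is not `None`.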
Advanced: Custom Training
To detect custom objects (for example, defects on a manufacturing line), you need to train YOLO on your own labeled dataset.
- Collect images.
- Annotate them using tools like LabelImg or Roboflow.
- Train the model:
model.train(data='custom_dataset.yaml', epochs=100)
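The steps above rely on two small text formats: one label line per annotated object, and the dataset YAML passed to `model.train`. The sketch below shows how both could be produced with plain Python. The helper names, the `datasets/defects` folder layout, and the class names `scratch` and `dent` are hypothetical; annotation tools such as Roboflow can usually export these files for you.

```python
from typing import Sequence


def yolo_label_line(
    class_id: int, x1: float, y1: float, x2: float, y2: float, img_w: int, img_h: int
) -> str:
    """Convert a pixel-space corner box into one YOLO-format label line.

    YOLO labels store: class x_center y_center width height,
    with all four box values normalized to the range 0..1.
    """
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def dataset_yaml(root: str, train_dir: str, val_dir: str, class_names: Sequence[str]) -> str:
    """Build the text of a dataset YAML file (paths here are placeholders)."""
    lines = [f"path: {root}", f"train: {train_dir}", f"val: {val_dir}", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# A 200x100 pixel box centered in a 400x200 image, labeled as class 0.
print(yolo_label_line(0, 100, 50, 300, 150, 400, 200))
# 0 0.500000 0.500000 0.500000 0.500000

# Hypothetical dataset layout and class names for a defect detector.
print(dataset_yaml("datasets/defects", "images/train", "images/val", ["scratch", "dent"]))
```

Saving the second string as `custom_dataset.yaml` gives a file in the shape the training call expects, with `train` and `val` resolved relative to `path`.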
Conclusion
Computer vision is more accessible than ever. With pretrained models like YOLO, you can build powerful applications with just a few lines of code.