This is the seventh and final post in the Object Detection with YOLO blog series. It performs inference on a video using the model trained in Part 5 Object Detection with Yolo using VOC 2012 data - training. I will use the model trained on PASCAL VOC2012 data. This post assumes that readers have read the previous posts: Part 1, Part 2, Part 3, Part 4, Part 5 and Part 6.
Andrew Ng's YOLO lecture¶
- Neural Networks - Bounding Box Predictions
- C4W3L06 Intersection Over Union
- C4W3L07 Nonmax Suppression
- C4W3L08 Anchor Boxes
- C4W3L09 YOLO Algorithm
Reference¶
Reference in my blog¶
- Part 1 Object Detection using YOLOv2 on Pascal VOC2012 - anchor box clustering
- Part 2 Object Detection using YOLOv2 on Pascal VOC2012 - input and output encoding
- Part 3 Object Detection using YOLOv2 on Pascal VOC2012 - model
- Part 4 Object Detection using YOLOv2 on Pascal VOC2012 - loss
- Part 5 Object Detection using YOLOv2 on Pascal VOC2012 - training
- Part 6 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on image
- Part 7 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on video
My GitHub repository¶
This repository contains all the IPython notebooks in this blog series and the functions (see backend.py).
import matplotlib.pyplot as plt
import numpy as np
import os, sys
print(sys.version)
%matplotlib inline
Read in the hyperparameters to define the YOLOv2 model used during training
train_image_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/JPEGImages/"
train_annot_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/Annotations/"
LABELS = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
'bus', 'car', 'cat', 'chair', 'cow',
'diningtable','dog', 'horse', 'motorbike', 'person',
'pottedplant','sheep', 'sofa', 'train', 'tvmonitor']
ANCHORS = np.array([1.07709888, 1.78171903, # anchor box 1, width , height
2.71054693, 5.12469308, # anchor box 2, width, height
10.47181473, 10.09646365, # anchor box 3, width, height
5.48531347, 8.11011331]) # anchor box 4, width, height
BOX = int(len(ANCHORS)/2)
TRUE_BOX_BUFFER = 50
IMAGE_H, IMAGE_W = 416, 416
GRID_H, GRID_W = 13 , 13
from backend import define_YOLOv2
CLASS = len(LABELS)
model, _ = define_YOLOv2(IMAGE_H,IMAGE_W,GRID_H,GRID_W,TRUE_BOX_BUFFER,BOX,CLASS,
trainable=False)
model.load_weights("weights_yolo_on_voc2012.h5")
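The ANCHORS array above is a flat list of (width, height) pairs, one pair per anchor box. As a rough illustration, the sketch below reshapes it into one row per box and converts the values to pixels. It assumes the anchors are expressed in grid-cell units (so one unit is IMAGE_W / GRID_W pixels wide); the names `anchor_pairs`, `cell_size` and `anchor_pixels` are mine and do not come from backend.py.

```python
import numpy as np

# Values copied from the hyperparameter cell above.
ANCHORS = np.array([1.07709888, 1.78171903,
                    2.71054693, 5.12469308,
                    10.47181473, 10.09646365,
                    5.48531347, 8.11011331])
IMAGE_H, IMAGE_W = 416, 416
GRID_H, GRID_W = 13, 13

# One row per anchor box: (width, height).
anchor_pairs = ANCHORS.reshape(-1, 2)

# Assumption: anchors are measured in grid cells, so one unit spans
# IMAGE_W / GRID_W pixels horizontally and IMAGE_H / GRID_H pixels vertically.
cell_size = np.array([IMAGE_W / GRID_W, IMAGE_H / GRID_H])
anchor_pixels = anchor_pairs * cell_size

for i, (w, h) in enumerate(anchor_pixels, start=1):
    print("anchor box {}: {:.0f} x {:.0f} pixels".format(i, w, h))
```

Under that assumption each grid cell covers 32 x 32 pixels of the 416 x 416 input, so the third anchor box spans most of the image.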
Read in the mp4 video¶
import cv2
video_inp = "beyonce.mp4"
video_out = "beyonce_yolo.mp4"
video_reader = cv2.VideoCapture(video_inp)
nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
print(nb_frames,frame_h,frame_w)
from backend import ImageReader # from part 2 blog
count = 0
min_count = 0#1000
max_count = 2200
X_test = []
# build the reader once instead of once per frame
imageReader = ImageReader(IMAGE_H,
                          IMAGE_W = IMAGE_W,
                          norm = lambda image : image / 255.)
while count < max_count:
    count += 1
    ret, _image = video_reader.read()
    if not ret:
        # no more frames to read
        break
    if (count < min_count):
        continue
    if count % 100 == 0:
        print(" {}/{}".format(count,nb_frames))
    # resize and normalize the frame (see Part 2)
    _image = imageReader.encode_core(_image)
    X_test.append(_image)
X_test = np.array(X_test)
video_reader.release()
For each video frame, detect objects with YOLO¶
X_test = np.array(X_test)
## model
dummy_array = np.zeros((len(X_test),1,1,1,TRUE_BOX_BUFFER,4))
y_pred = model.predict([X_test,dummy_array])
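Holding every frame in one array can be expensive. The sketch below estimates the footprint for roughly 2200 frames of 416 x 416 x 3 float64 values, and shows one way to process frames a chunk at a time so that reading and prediction could be interleaved. `predict_in_chunks` and its `predict_fn` argument are hypothetical helpers for illustration, not part of backend.py or Keras.

```python
import numpy as np

def predict_in_chunks(predict_fn, frames, chunk_size=64):
    """Apply predict_fn to slices of `frames` and stack the results."""
    outputs = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        outputs.append(predict_fn(chunk))
    return np.concatenate(outputs, axis=0)

# Rough memory footprint of the stacked frames.
# image / 255. on a uint8 frame yields float64, i.e. 8 bytes per value.
n_frames, h, w, c = 2200, 416, 416, 3
gigabytes = n_frames * h * w * c * np.dtype(np.float64).itemsize / 1e9
print("approx. {:.1f} GB".format(gigabytes))
```

With the model above, `predict_fn` could be something like `lambda chunk: model.predict([chunk, np.zeros((len(chunk),1,1,1,TRUE_BOX_BUFFER,4))])`. Casting the frames to float32 would halve the footprint.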
Create video writer¶
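The cv2.VideoWriter lines in this section are commented out in favour of saving PNG files. If you re-enable them, note that cv2.VideoWriter expects 8-bit BGR frames, while the frames here were divided by 255 during preprocessing. Here is a minimal sketch of the conversion, assuming each frame is a float RGB array in [0, 1]; `to_writer_frame` is a hypothetical helper, not a function from backend.py.

```python
import numpy as np

def to_writer_frame(image_rgb01):
    """Convert a float RGB image in [0, 1] to a uint8 BGR frame."""
    # Scale back to the 0-255 range and guard against values outside it.
    scaled = np.clip(image_rgb01 * 255.0, 0, 255)
    frame_uint8 = np.round(scaled).astype(np.uint8)
    # Reverse the channel axis (RGB -> BGR) and return a contiguous copy.
    return np.ascontiguousarray(frame_uint8[:, :, ::-1])

# Example: a single orange-ish pixel on a black 2 x 2 image.
example = np.zeros((2, 2, 3))
example[0, 0] = (1.0, 0.2, 0.0)
frame = to_writer_frame(example)
```

The frames also need to match the size passed to the writer's constructor, which would be (IMAGE_W, IMAGE_H) for the resized frames rather than (frame_w, frame_h).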
from backend import OutputRescaler, find_high_class_probability_bbox, draw_boxes,nonmax_suppression
obj_threshold = 0.03
dir_png = "pngfolder"
outputRescaler = OutputRescaler(ANCHORS=ANCHORS)
#video_writer = cv2.VideoWriter(video_out,
# cv2.VideoWriter_fourcc(*'mp4v'), # be sure to use lower case
# 20.0,
# (frame_w, frame_h))
# make sure the output folder exists before saving figures
os.makedirs(dir_png, exist_ok=True)
for iframe in range(len(y_pred)):
    netout = y_pred[iframe]
    image = X_test[iframe]
    # decoding YOLO output
    netout_scale = outputRescaler.fit(netout)
    boxes = find_high_class_probability_bbox(netout_scale,obj_threshold)
    if len(boxes) > 0:
        final_boxes = nonmax_suppression(boxes,
                                         iou_threshold = 0.3,
                                         obj_threshold = obj_threshold)
        # draw only when boxes survive nonmax suppression
        if len(final_boxes) > 0:
            image = draw_boxes(image,final_boxes,LABELS)
    #video_writer.write(np.uint8(image))
    plt.figure(figsize=(20,20))
    plt.subplots_adjust(hspace=0.02,wspace=0.01, left=0,right=1,bottom=0, top=1)
    plt.imshow(image)
    plt.savefig(dir_png + "/fig_{:04.0f}.png".format(iframe),bbox_inches='tight',pad_inches=0)
    plt.close()
#video_writer.release()
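The loop above relies on nonmax_suppression from backend.py (see Part 6). As a reminder of the idea, here is a minimal, class-agnostic sketch of intersection over union and greedy nonmax suppression. It is not the backend implementation: the box format (xmin, ymin, xmax, ymax) and the function names `iou` and `greedy_nms` are assumptions made for this example.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return the indices of the boxes that are kept, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(int(idx))
    return keep

# Two heavily overlapping boxes and one separate box.
boxes = np.array([[0.10, 0.10, 0.50, 0.50],
                  [0.12, 0.12, 0.52, 0.52],
                  [0.60, 0.60, 0.90, 0.90]])
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, above the 0.3 threshold, so it is suppressed, while the third box does not overlap anything and is kept.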
Use ffmpeg to convert pngs to the mp4 video¶
If you do not have ffmpeg, follow this tutorial to install it: ffmpeg installation.
Following the suggestion on Stack Overflow, run the following from the terminal, inside the folder that contains the PNG files:
ffmpeg -pattern_type glob -i "fig_*.png" -vcodec libx264 -s 640x480 -pix_fmt yuv420p movie.mp4
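If you prefer to stay inside the notebook, the same command can be launched from Python. The sketch below only wraps the command shown above with the standard library; `build_ffmpeg_command` and `run_ffmpeg` are hypothetical helper names, and the optional `fps` argument (passed to ffmpeg as `-framerate`) is my addition, not part of the original command.

```python
import shutil
import subprocess

def build_ffmpeg_command(pattern="fig_*.png", output="movie.mp4", fps=None):
    """Build the ffmpeg argument list used to turn PNG files into an mp4."""
    cmd = ["ffmpeg"]
    if fps is not None:
        # Optional input frame rate; omitted by default, as in the command above.
        cmd += ["-framerate", str(fps)]
    cmd += ["-pattern_type", "glob", "-i", pattern,
            "-vcodec", "libx264",
            "-s", "640x480",
            "-pix_fmt", "yuv420p",
            output]
    return cmd

def run_ffmpeg(png_dir="pngfolder", **kwargs):
    """Run ffmpeg inside png_dir so that the glob pattern finds the PNG files."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(**kwargs), cwd=png_dir, check=True)

print(" ".join(build_ffmpeg_command()))
```

Calling `run_ffmpeg()` would write movie.mp4 into pngfolder, and `run_ffmpeg(fps=20)` would set the frame rate explicitly.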
FairyOnIce/ObjectDetectionYolo contains this IPython notebook and all the functions that I defined in this notebook.