This is the seventh and final blog post of the Object Detection with YOLO blog series. This post runs inference on a video using the model trained in Part 5 Object Detection with Yolo using VOC 2012 data - training. The model was trained on PASCAL VOC2012 data. This blog assumes that readers have read the previous blog posts: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6.

Andrew Ng's YOLO lecture

Reference

Reference in my blog

My GitHub repository

This repository contains all the IPython notebooks in this blog series and the functions (see backend.py).

FairyOnIce/ObjectDetectionYolo

In [1]:

import matplotlib.pyplot as plt
import numpy as np
import os, sys
print(sys.version)
%matplotlib inline

3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

Define the same hyperparameters that were used to build the YOLOv2 model during training.

In [2]:

train_image_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/JPEGImages/"
train_annot_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/Annotations/"

LABELS = ['aeroplane',  'bicycle', 'bird',  'boat',      'bottle', 
          'bus',        'car',      'cat',  'chair',     'cow',
          'diningtable','dog',    'horse',  'motorbike', 'person',
          'pottedplant','sheep',  'sofa',   'train',   'tvmonitor']

ANCHORS = np.array([1.07709888,  1.78171903,  # anchor box 1, width , height
                    2.71054693,  5.12469308,  # anchor box 2, width,  height
                   10.47181473, 10.09646365,  # anchor box 3, width,  height
                    5.48531347,  8.11011331]) # anchor box 4, width,  height


BOX               = len(ANCHORS) // 2
TRUE_BOX_BUFFER   = 50
IMAGE_H, IMAGE_W  = 416, 416
GRID_H,  GRID_W   = 13 , 13
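
As a sanity check on these hyperparameters, the short sketch below recomputes what they imply. Each grid cell covers 416 / 13 = 32 pixels, and with 4 anchors and 20 classes each cell predicts 4 x (4 + 1 + 20) = 100 numbers. It assumes that the model's output layout is (GRID_H, GRID_W, BOX, 4 + 1 + CLASS) and that the anchor widths and heights are expressed in grid-cell units; both are assumptions worth checking against backend.py.

```python
import numpy as np

# Hyperparameters copied from the cell above
ANCHORS = np.array([1.07709888, 1.78171903,
                    2.71054693, 5.12469308,
                    10.47181473, 10.09646365,
                    5.48531347, 8.11011331])
IMAGE_H, IMAGE_W = 416, 416
GRID_H, GRID_W = 13, 13
CLASS = 20  # len(LABELS)

BOX = len(ANCHORS) // 2          # number of anchor boxes
cell_h = IMAGE_H // GRID_H       # pixels per grid cell, vertically
cell_w = IMAGE_W // GRID_W       # pixels per grid cell, horizontally

# Assumed output layout: (x, y, w, h, confidence, class scores...) per anchor per cell
output_shape = (GRID_H, GRID_W, BOX, 4 + 1 + CLASS)

# Assumption: anchors are in grid-cell units, so scale (width, height) by the cell size
anchors_px = ANCHORS.reshape(BOX, 2) * np.array([cell_w, cell_h])

print(BOX, cell_h, cell_w, output_shape)
print(np.round(anchors_px, 1))
```

Under these assumptions, the largest anchor (box 3) spans roughly 335 x 323 pixels, close to the full 416 x 416 input, while the smallest is about 34 x 57 pixels.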

Define model

Define the model and load the weights trained in Part 5.

In [3]:

from backend import define_YOLOv2

CLASS             = len(LABELS)
model, _          = define_YOLOv2(IMAGE_H,IMAGE_W,GRID_H,GRID_W,TRUE_BOX_BUFFER,BOX,CLASS, 
                                  trainable=False)
model.load_weights("weights_yolo_on_voc2012.h5")

Using TensorFlow backend.

Read in the mp4 video

In [4]:

import cv2
video_inp = "beyonce.mp4"
video_out = "beyonce_yolo.mp4"

video_reader = cv2.VideoCapture(video_inp)

nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h   = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w   = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
print(nb_frames,frame_h,frame_w)

8024 360 480

In [5]:

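from backend import ImageReader # from part 2 blog
min_count = 0     # frames numbered below this are skipped (e.g. set to 1000)
max_count = 2200  # stop after reading this many frames
count     = 0
X_test    = []
# create the preprocessor once instead of once per frame
imageReader = ImageReader(IMAGE_H,
                          IMAGE_W = IMAGE_W, 
                          norm    = lambda image : image / 255.)
while count < max_count:
    ret, _image = video_reader.read()
    if not ret:  # end of the video or a read error
        break
    count += 1
    if count < min_count:
        continue
    if count % 100 == 0:
        print(" {}/{}".format(count, nb_frames))
    # resize and normalize the frame as in Part 2
    _image = imageReader.encode_core(_image)
    X_test.append(_image)

X_test = np.array(X_test)

video_reader.release()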
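
The loop above relies on ImageReader.encode_core from Part 2. For readers without backend.py at hand, here is a minimal NumPy-only sketch of a comparable preprocessing step. The function names preprocess_frame and frames_nbytes are hypothetical, and the sketch uses nearest-neighbour resizing rather than OpenCV's interpolation, so it will not match encode_core pixel for pixel. It also flips OpenCV's BGR channel order to RGB (whether encode_core does this is an assumption to check in backend.py) and returns float32. That last choice matters here: 2200 frames of 416 x 416 x 3 take about 9.1 GB as float64, the default result of image / 255., but about 4.6 GB as float32.

```python
import numpy as np

def preprocess_frame(frame_bgr, out_h=416, out_w=416):
    """Hypothetical stand-in for ImageReader.encode_core: resize, BGR -> RGB, scale to [0, 1]."""
    in_h, in_w = frame_bgr.shape[:2]
    # Nearest-neighbour resize: pick one source row/column for each output row/column
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    resized = frame_bgr[rows[:, None], cols[None, :]]
    rgb = resized[:, :, ::-1]                # OpenCV decodes frames in BGR order
    return rgb.astype(np.float32) / 255.0    # float32 halves memory compared with float64

def frames_nbytes(n_frames, h=416, w=416, channels=3, dtype=np.float32):
    """Bytes needed to hold n_frames preprocessed frames in a single array."""
    return n_frames * h * w * channels * np.dtype(dtype).itemsize
```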

For each video frame, detect objects with YOLO

In [6]:

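# X_test is already a NumPy array from the previous cell.
# The model's second input holds ground-truth boxes, which only the training loss uses;
# at inference time an all-zero dummy array of the right shape is enough.
dummy_array    = np.zeros((len(X_test),1,1,1,TRUE_BOX_BUFFER,4))
y_pred         = model.predict([X_test,dummy_array])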
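
model.predict already splits its input into batches, but it still needs the whole X_test array and the dummy array in memory at once. If that is too much, one option is to predict chunk by chunk. The helper below, predict_in_chunks, is a hypothetical sketch that builds the zero-filled true-box input per chunk and concatenates the outputs; it only assumes that predict_fn accepts a list [images, dummy] the way model.predict does in the cell above.

```python
import numpy as np

def predict_in_chunks(predict_fn, frames, true_box_buffer, chunk_size=256):
    """Run predict_fn on consecutive chunks of frames and stack the results."""
    outputs = []
    for start in range(0, len(frames), chunk_size):
        chunk = np.asarray(frames[start:start + chunk_size], dtype=np.float32)
        # Zero-filled placeholder for the true-box input, shaped like dummy_array above
        dummy = np.zeros((len(chunk), 1, 1, 1, true_box_buffer, 4), dtype=np.float32)
        outputs.append(predict_fn([chunk, dummy]))
    return np.concatenate(outputs, axis=0)
```

Usage would be y_pred = predict_in_chunks(model.predict, X_test, TRUE_BOX_BUFFER). Because frames is sliced chunk by chunk, it can also be the plain Python list built in the reading loop.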

Decode the predictions, draw the boxes and save each frame as a PNG

In [7]:

from backend import OutputRescaler, find_high_class_probability_bbox, draw_boxes, nonmax_suppression
obj_threshold   = 0.03
dir_png         = "pngfolder"
os.makedirs(dir_png, exist_ok=True)  # make sure the output folder exists
outputRescaler  = OutputRescaler(ANCHORS=ANCHORS)
#video_writer   = cv2.VideoWriter(video_out,
#                                 cv2.VideoWriter_fourcc(*'mp4v'), # be sure to use lower case
#                                 20.0, 
#                                 (frame_w, frame_h))

for iframe in range(len(y_pred)):
    netout       = y_pred[iframe] 
    image        = X_test[iframe]
    # decode the YOLO output and keep boxes whose class probability exceeds obj_threshold
    netout_scale = outputRescaler.fit(netout)
    boxes        = find_high_class_probability_bbox(netout_scale,obj_threshold)
    if len(boxes) > 0:
        # remove overlapping duplicates with non-max suppression
        final_boxes = nonmax_suppression(boxes,
                                         iou_threshold = 0.3,
                                         obj_threshold = obj_threshold)
        if len(final_boxes) > 0: 
            image = draw_boxes(image,final_boxes,LABELS)
    #video_writer.write(np.uint8(image))
    plt.figure(figsize=(20,20))
    plt.subplots_adjust(hspace=0.02,wspace=0.01, left=0,right=1,bottom=0, top=1) 
    plt.imshow(image)
    plt.savefig(dir_png + "/fig_{:04.0f}.png".format(iframe),bbox_inches='tight',pad_inches=0)
    plt.close()
#video_writer.release()
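
outputRescaler.fit and find_high_class_probability_bbox come from backend.py. To illustrate what such a decoding step typically involves, the sketch below applies the standard YOLOv2 parameterisation to one raw output of shape (GRID_H, GRID_W, BOX, 4 + 1 + CLASS): a sigmoid for the x/y offsets within a cell, the anchor size times exp for width/height, a sigmoid for the objectness, and a softmax over the classes. The class score is objectness x class probability, and an anchor is kept when its best class score exceeds the threshold. The names decode_yolov2 and keep_confident are hypothetical, and whether backend.py computes exactly these quantities is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(netout, anchors):
    """Decode raw output (GRID_H, GRID_W, BOX, 4 + 1 + CLASS) into boxes relative to the
    image (x_center, y_center, w, h in 0..1 units), objectness and per-class scores."""
    grid_h, grid_w, n_box, _ = netout.shape
    anchors = np.asarray(anchors).reshape(n_box, 2)   # (width, height) in grid-cell units
    col = np.arange(grid_w).reshape(1, grid_w, 1)     # cell index along x
    row = np.arange(grid_h).reshape(grid_h, 1, 1)     # cell index along y

    x = (sigmoid(netout[..., 0]) + col) / grid_w
    y = (sigmoid(netout[..., 1]) + row) / grid_h
    w = anchors[:, 0] * np.exp(netout[..., 2]) / grid_w
    h = anchors[:, 1] * np.exp(netout[..., 3]) / grid_h
    objectness = sigmoid(netout[..., 4])

    logits = netout[..., 5:]
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    probs /= probs.sum(axis=-1, keepdims=True)
    class_scores = objectness[..., None] * probs

    return np.stack([x, y, w, h], axis=-1), objectness, class_scores

def keep_confident(boxes, class_scores, threshold):
    """Return boxes, label indices and scores for anchors whose best class score exceeds threshold."""
    best = class_scores.argmax(axis=-1)
    score = class_scores.max(axis=-1)
    mask = score > threshold
    return boxes[mask], best[mask], score[mask]
```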

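Likewise, nonmax_suppression comes from backend.py. The sketch below is a generic, class-by-class greedy non-maximum suppression in NumPy, not the backend implementation: within each class it repeatedly keeps the highest-scoring box and drops the remaining boxes whose intersection-over-union (IoU) with it exceeds iou_threshold (0.3 in the cell above). The names iou and greedy_nms are hypothetical, and boxes are assumed to be (x_center, y_center, width, height) in any consistent unit.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all given as (x_center, y_center, w, h)."""
    b_min, b_max = box[:2] - box[2:] / 2, box[:2] + box[2:] / 2
    o_min, o_max = others[:, :2] - others[:, 2:] / 2, others[:, :2] + others[:, 2:] / 2
    inter_wh = np.clip(np.minimum(b_max, o_max) - np.maximum(b_min, o_min), 0, None)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def greedy_nms(boxes, labels, scores, iou_threshold=0.3):
    """Return indices of the boxes kept by per-class greedy NMS."""
    boxes = np.asarray(boxes, dtype=float)
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    keep = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        idx = idx[np.argsort(-scores[idx])]        # this class, by descending score
        while idx.size > 0:
            best, rest = idx[0], idx[1:]
            keep.append(int(best))
            overlaps = iou(boxes[best], boxes[rest])
            idx = rest[overlaps <= iou_threshold]  # drop boxes overlapping the kept one too much
    return keep
```

Running suppression per class means that a person and a dog occupying the same region can both survive, which is usually what you want for detection.
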
Use ffmpeg to convert the PNGs to an mp4 video

If you do not have ffmpeg, follow this tutorial to install it: ffmpeg installation.

Following a suggestion on Stack Overflow, run the following from a terminal inside pngfolder:

 ffmpeg -pattern_type glob -i "fig_*.png" -vcodec libx264 -s 640x480 -pix_fmt yuv420p movie.mp4
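
The same conversion can also be launched from the notebook. The sketch below builds the argument list for the command above and runs it with subprocess.run inside pngfolder; the function names ffmpeg_command and pngs_to_mp4 are hypothetical. It adds two options that the original command does not use: -y, which overwrites an existing output file instead of prompting, and -framerate, an option of ffmpeg's image-sequence input that defaults to 25 frames per second. Passing the source video's rate, for example video_reader.get(cv2.CAP_PROP_FPS) read before the reader is released, should keep the output playing at the original speed.

```python
import shutil
import subprocess

def ffmpeg_command(pattern="fig_*.png", output="movie.mp4", framerate=25, size="640x480"):
    """Build the ffmpeg argument list for turning a PNG sequence into an H.264 mp4."""
    return ["ffmpeg", "-y",                    # overwrite the output file if it exists
            "-framerate", str(framerate),      # input frame rate of the PNG sequence
            "-pattern_type", "glob", "-i", pattern,
            "-vcodec", "libx264",
            "-s", size,                        # output size, as in the command above
            "-pix_fmt", "yuv420p",             # widely compatible pixel format
            output]

def pngs_to_mp4(png_dir="pngfolder", **kwargs):
    """Run ffmpeg inside png_dir so the glob matches the saved frames; the mp4 lands there too."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_command(**kwargs), cwd=png_dir, check=True)
```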

FairyOnIce/ObjectDetectionYolo contains this IPython notebook and all the functions used in it (see backend.py).

Yumi's Blog

Part 7 Object Detection with YOLOv2 using VOC 2012 data - inference on video