Yumi's Blog

Part 7 Object Detection with YOLOv2 using VOC 2012 data - inference on video

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os, sys
print(sys.version)
%matplotlib inline
3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

Read in the hyperparameters to define the YOLOv2 model used during training

In [2]:
train_image_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/JPEGImages/"
train_annot_folder = "../ObjectDetectionRCNN/VOCdevkit/VOC2012/Annotations/"

LABELS = ['aeroplane',  'bicycle', 'bird',  'boat',      'bottle', 
          'bus',        'car',      'cat',  'chair',     'cow',
          'diningtable','dog',    'horse',  'motorbike', 'person',
          'pottedplant','sheep',  'sofa',   'train',   'tvmonitor']

ANCHORS = np.array([1.07709888,  1.78171903,  # anchor box 1, width , height
                    2.71054693,  5.12469308,  # anchor box 2, width,  height
                   10.47181473, 10.09646365,  # anchor box 3, width,  height
                    5.48531347,  8.11011331]) # anchor box 4, width,  height


BOX               = int(len(ANCHORS)/2)
TRUE_BOX_BUFFER   = 50
IMAGE_H, IMAGE_W  = 416, 416
GRID_H,  GRID_W   = 13 , 13
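
To make the flat ANCHORS array and the grid settings easier to read, here is a minimal sketch that reshapes the anchors into (width, height) pairs and derives the pixel size of one grid cell. It assumes the anchor values are expressed in grid-cell units (all of them are below 13, which is consistent with that); `anchors_wh`, `cell_h`, `cell_w` and `anchors_px` are names introduced only for this illustration.

```python
import numpy as np

# Same values as in the cell above.
ANCHORS = np.array([1.07709888,  1.78171903,
                    2.71054693,  5.12469308,
                    10.47181473, 10.09646365,
                    5.48531347,  8.11011331])
IMAGE_H, IMAGE_W = 416, 416
GRID_H,  GRID_W  = 13, 13

# One row per anchor box: (width, height).
anchors_wh = ANCHORS.reshape(-1, 2)
BOX = anchors_wh.shape[0]

# Each grid cell covers this many pixels of the resized input image.
cell_h, cell_w = IMAGE_H // GRID_H, IMAGE_W // GRID_W

# Assumption: anchors are in grid-cell units, so multiplying by the
# cell size gives the anchor shapes in pixels of the 416 x 416 input.
anchors_px = anchors_wh * np.array([cell_w, cell_h])

print(BOX, cell_h, cell_w)
print(anchors_px.round(1))
```

Under that assumption, the largest anchor covers most of the 416 x 416 input, while the smallest is only slightly larger than a single grid cell.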

Define model

Load the weights trained in Part 5

In [3]:
from backend import define_YOLOv2

CLASS             = len(LABELS)
model, _          = define_YOLOv2(IMAGE_H,IMAGE_W,GRID_H,GRID_W,TRUE_BOX_BUFFER,BOX,CLASS, 
                                  trainable=False)
model.load_weights("weights_yolo_on_voc2012.h5")
/Users/yumikondo/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

Read in the mp4 video

In [4]:
import cv2
video_inp = "beyonce.mp4"
video_out = "beyonce_yolo.mp4"

video_reader = cv2.VideoCapture(video_inp)

nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h   = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w   = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
print(nb_frames,frame_h,frame_w)
8024 360 480
In [5]:
from backend import ImageReader # from part 2 blog
count    = 0
min_count = 0#1000
max_count = 2200
X_test   = []
while count < max_count:
    count += 1
    ret, _image = video_reader.read()
    if (count < min_count):
        continue
        
    if count % 100 == 0:
        print(" {}/{}".format(count,nb_frames))        
    imageReader = ImageReader(IMAGE_H,
                              IMAGE_W = IMAGE_W, 
                              norm    = lambda image : image / 255.)
    _image      = imageReader.encode_core(_image)
    X_test.append(_image)
    
X_test = np.array(X_test)

video_reader.release()  
 1000/8024
 1100/8024
 1200/8024
 1300/8024
 1400/8024
 1500/8024
 1600/8024
 1700/8024
 1800/8024
 1900/8024
 2000/8024
 2100/8024
 2200/8024
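
The loop above never checks the `ret` flag returned by `video_reader.read()`, so if `max_count` exceeded the number of frames in the video, it would keep going with invalid frames. Here is a minimal sketch of a guarded version of the same loop. `collect_frames` and `FakeReader` are hypothetical names; `FakeReader` only stands in for `cv2.VideoCapture` so that the sketch runs without a video file, and `preprocess` is where `imageReader.encode_core` would go.

```python
def collect_frames(reader, min_count, max_count, preprocess=lambda f: f):
    """Collect frames min_count..max_count (1-based, inclusive).

    `reader` is any object whose read() returns (ret, frame),
    as cv2.VideoCapture does. Stops early at the end of the stream.
    """
    frames = []
    count = 0
    while count < max_count:
        ret, frame = reader.read()
        if not ret:          # no more frames: leave the loop
            break
        count += 1
        if count < min_count:
            continue         # skip frames before the requested range
        frames.append(preprocess(frame))
    return frames


class FakeReader:
    """Hypothetical stand-in for cv2.VideoCapture with n_frames frames."""
    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.pos = 0

    def read(self):
        if self.pos >= self.n_frames:
            return False, None
        self.pos += 1
        return True, self.pos   # the "frame" is just its 1-based index


frames = collect_frames(FakeReader(5), min_count=2, max_count=10)
print(frames)
```

With a real video, `reader` would be the `video_reader` object and `preprocess` would be `imageReader.encode_core`, with the `ImageReader` created once before the loop rather than on every frame.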

For each video frame, detect objects with YOLO

In [6]:
X_test = np.array(X_test)
## model
dummy_array    = np.zeros((len(X_test),1,1,1,TRUE_BOX_BUFFER,4))
y_pred         = model.predict([X_test,dummy_array])
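
Because all frames are passed to `model.predict` in a single array, the whole of `X_test` has to fit in memory. Here is a rough sketch for estimating its size. It assumes 2200 frames (that is, `min_count = 0`) and float64 pixels, which is what NumPy produces when a uint8 image is divided by `255.`; `batch_nbytes` is a hypothetical helper.

```python
import numpy as np

def batch_nbytes(n_frames, height, width, channels=3, dtype=np.float64):
    """Bytes needed for an array of shape (n_frames, height, width, channels)."""
    return n_frames * height * width * channels * np.dtype(dtype).itemsize

# Assumption: 2200 frames of 416 x 416 RGB, as in the cells above.
for dtype in (np.float64, np.float32):
    gib = batch_nbytes(2200, 416, 416, dtype=dtype) / 2**30
    print("{}: {:.2f} GiB".format(np.dtype(dtype).name, gib))
```

If this is too much for your machine, lower `max_count` or cast the frames to float32 before stacking them.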

Draw the detected boxes and save each frame as a png

In [7]:
from backend import OutputRescaler, find_high_class_probability_bbox, draw_boxes,nonmax_suppression
obj_threshold   = 0.03
dir_png         = "pngfolder"
os.makedirs(dir_png, exist_ok=True) # create the output folder if it does not exist yet
outputRescaler  = OutputRescaler(ANCHORS=ANCHORS)
#video_writer   = cv2.VideoWriter(video_out,
#                                 cv2.VideoWriter_fourcc(*'mp4v'), # be sure to use lower case
#                                 20.0, 
#                                 (frame_w, frame_h))

for iframe in range(len(y_pred)):
        netout       = y_pred[iframe] 
        image        = X_test[iframe]
        # decoding YOLO output
        netout_scale = outputRescaler.fit(netout)
        boxes        = find_high_class_probability_bbox(netout_scale,obj_threshold)
        if len(boxes) > 0:
            final_boxes = nonmax_suppression(boxes,
                                             iou_threshold = 0.3,
                                             obj_threshold = obj_threshold)
            if len(final_boxes) > 0: 
                image = draw_boxes(image,final_boxes,LABELS)
        #video_writer.write(np.uint8(image))
        plt.figure(figsize=(20,20))
        plt.subplots_adjust(hspace=0.02,wspace=0.01, left=0,right=1,bottom=0, top=1) 
        plt.imshow(image)
        plt.savefig(dir_png + "/fig_{:04.0f}.png".format(iframe),bbox_inches='tight',pad_inches=0)
        plt.close()
#video_writer.release()
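
To illustrate what the `iou_threshold = 0.3` argument controls, here is a minimal sketch of greedy non-max suppression. It is a simplified, class-agnostic version written for this post and not the `nonmax_suppression` function in `backend`. The names `iou` and `greedy_nms` and the (xmin, ymin, xmax, ymax) box layout are assumptions made for the illustration.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return indices of the boxes to keep, highest score first.

    A box is kept only if its IoU with every box already kept
    is below iou_threshold.
    """
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(int(i))
    return keep

boxes = np.array([[0.10, 0.10, 0.50, 0.50],   # strong detection
                  [0.12, 0.12, 0.52, 0.52],   # near-duplicate of the first
                  [0.60, 0.60, 0.90, 0.90]])  # separate object
scores = np.array([0.9, 0.6, 0.8])
print(greedy_nms(boxes, scores))   # → [0, 2]
```

A lower `iou_threshold` removes more overlapping boxes, and a higher one lets near-duplicates through.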

Use ffmpeg to convert pngs to the mp4 video

If you do not have ffmpeg, follow this tutorial to install it: ffmpeg installation.

Following a suggestion on Stack Overflow, run this from the terminal, inside the folder that holds the pngs (pngfolder above):

 ffmpeg -pattern_type glob -i "fig_*.png" -vcodec libx264 -s 640x480 -pix_fmt yuv420p movie.mp4   
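
If you would rather stay in the notebook, the same command can be launched from Python. The sketch below only builds the argument list for the command above and prints it. The `subprocess.run` call is left commented out because it assumes ffmpeg is on your PATH and that the pngs are in `pngfolder`. `ffmpeg_command` is a hypothetical helper.

```python
import subprocess

def ffmpeg_command(pattern="fig_*.png", output="movie.mp4", size="640x480"):
    """Build the argument list for the ffmpeg call shown above."""
    return ["ffmpeg",
            "-pattern_type", "glob", "-i", pattern,  # ffmpeg expands the glob itself
            "-vcodec", "libx264",
            "-s", size,
            "-pix_fmt", "yuv420p",
            output]

cmd = ffmpeg_command()
print(" ".join(cmd))

# Uncomment to run. Assumes ffmpeg is installed and the pngs are in "pngfolder".
# subprocess.run(cmd, cwd="pngfolder", check=True)
```

Passing the arguments as a list means no shell is involved, so the `fig_*.png` pattern reaches ffmpeg unexpanded, just as the quotes ensure in the terminal version.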

FairyOnIce/ObjectDetectionYolo contains this ipython notebook and all the functions that I defined in this notebook.
