Yumi's Blog

Part 2 Object Detection using RCNN on Pascal VOC2012 - R-CNN overview

Figure: R-CNN overview, cited from the paper "Rich feature hierarchies for accurate object detection and semantic segmentation".

This is the second blog post of the "Object Detection with R-CNN" series.

In this post, I review the paper "Rich feature hierarchies for accurate object detection and semantic segmentation" to understand the Regions with CNN features (R-CNN) method. R-CNN is a successful object detection algorithm that returns the class labels of objects and their bounding boxes for a given image. The work was published in 2013, and many faster object detection algorithms have appeared since (e.g., Fast R-CNN, Faster R-CNN and YOLO). Nevertheless, R-CNN is simple to implement and serves as a strong benchmark for various object detection tasks, which is why this post reviews it.

This post only goes over the concepts; the actual implementations are discussed in Part 3, Part 4 and Part 5 of the "Object Detection with R-CNN" series.

Reference

Reference: "Object Detection with R-CNN" series in my blog

Reference: "Object Detection with R-CNN" series in my Github

Idea of R-CNN

Figure: image cited from Data Camp.

The clever idea of R-CNN lies in generalizing, or "transferring", CNN classification results on ImageNet to object detection on the PASCAL VOC challenge.

CNNs have been successful in image classification competitions such as ImageNet. The goal of image classification is to identify the class label of an image. Here, a single image is assumed to have a single label.

Unlike image classification, object detection requires localizing and identifying many objects within an image. Here, a single image may contain objects of multiple classes. So in some sense, object detection is a more complex task than image classification.

In short, R-CNN generates around 2,000 category-independent region proposals for the input image, extracts a fixed-length feature vector from each proposal using a CNN, and then classifies each region with category-specific linear SVMs. This process is summarized in Figure 1 of "Rich feature hierarchies for accurate object detection and semantic segmentation", shown below.

Figure: R-CNN overview, cited from the paper "Rich feature hierarchies for accurate object detection and semantic segmentation".

R-CNN

The R-CNN algorithm consists of three steps: (1) generate region proposals, (2) create CNN features and (3) classify each region into classes with SVMs.
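As a rough illustration of how the three steps chain together, here is a minimal sketch. The function name rcnn_detect, its three callable arguments, the box convention and the "background" label are all hypothetical names chosen for this example; they are not taken from the paper or from any library.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max)
Box = Tuple[int, int, int, int]


def rcnn_detect(
    image,
    propose_regions: Callable[[object], List[Box]],
    extract_features: Callable[[object, Box], Sequence[float]],
    classify: Callable[[Sequence[float]], Tuple[str, float]],
) -> List[Tuple[Box, str, float]]:
    """Chain the three steps and return (box, label, score) for kept regions."""
    detections = []
    for box in propose_regions(image):            # Step 1: region proposals
        features = extract_features(image, box)   # Step 2: CNN features
        label, score = classify(features)         # Step 3: per-class SVMs
        if label != "background":                 # assumed label for "no object"
            detections.append((box, label, score))
    return detections
```

Each stage is passed in as a function, so any proposal method, feature extractor or classifier can be plugged in without changing the loop.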

Step 1: Generate region proposals.

The first step is to generate category-independent region proposals. These proposals define the set of candidate object regions. Here, the R-CNN authors use Selective Search as the region proposal algorithm.

This step is discussed in detail, with implementation, in Part 3: Object Detection with Pascal VOC2012 - Selective Search.

Selective Search works in two stages:

Use Felzenszwalb’s efficient graph-based segmentation algorithm to create initial regions.

Merge the initial regions using diverse criteria to create region proposals.
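To make the merging stage more concrete, here is a simplified sketch of greedy region merging. It is not the full Selective Search: it assumes regions are given as a bounding box plus a pixel count, scores pairs with only a size term and a fill term (the full method also uses color and texture), and considers every pair rather than only neighboring regions. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive
Region = Tuple[Box, int]         # (bounding box, pixel count)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def box_area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def similarity(r1: Region, r2: Region, image_size: int) -> float:
    """Size + fill similarity: favors small regions that fit together tightly."""
    (b1, s1), (b2, s2) = r1, r2
    size_sim = 1.0 - (s1 + s2) / image_size
    fill_sim = 1.0 - (box_area(union_box(b1, b2)) - s1 - s2) / image_size
    return size_sim + fill_sim


def merge_regions(initial: List[Region], image_size: int) -> List[Box]:
    """Greedily merge the most similar pair until one region remains.

    Every initial and every merged bounding box is kept as a proposal.
    """
    regions = list(initial)
    proposals = [box for box, _ in regions]
    while len(regions) > 1:
        i, j = max(
            combinations(range(len(regions)), 2),
            key=lambda p: similarity(regions[p[0]], regions[p[1]], image_size),
        )
        (b1, s1), (b2, s2) = regions[i], regions[j]
        merged = (union_box(b1, b2), s1 + s2)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged[0])
    return proposals
```

Starting from n initial regions, this produces 2n - 1 proposals at different scales, which is how the merging yields both small and large candidate boxes.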

Step 2: Create CNN features

Figure: cited from VGG in TensorFlow.

The second step is to use affine image warping to create a fixed-size CNN input from each candidate region from Step 1, and to apply a large pre-trained CNN that extracts a fixed-length feature vector.

This step is discussed in detail, with implementation, in Part 4: Object Detection with Pascal VOC2012 - CNN feature extraction.

Step 3: Classify each region into classes by SVM

The third step is to apply a set of class-specific linear Support Vector Machines (SVMs) to classify each region into a class.

This step is also discussed in detail, with implementation, in Part 4: Object Detection with Pascal VOC2012 - CNN feature extraction.
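To illustrate what scoring with class-specific linear SVMs looks like, here is a minimal sketch. It assumes each class already has a trained weight vector and bias, and it treats a region as "background" when no class scores above zero; that threshold, the label name and the function names are assumptions for this example. Later steps such as non-maximum suppression are left out.

```python
from typing import Dict, Sequence, Tuple


def svm_score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify_region(
    features: Sequence[float],
    svms: Dict[str, Tuple[Sequence[float], float]],
) -> Tuple[str, float]:
    """Score the feature vector with every class SVM and keep the best class."""
    scores = {label: svm_score(w, b, features) for label, (w, b) in svms.items()}
    best = max(scores, key=scores.get)
    if scores[best] <= 0.0:  # assumed rule: no positive score means no object
        return ("background", scores[best])
    return (best, scores[best])
```

Each class has its own binary SVM, so one region gets one score per class, and the same feature vector from Step 2 is reused for all of them.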
