Yumi's Blog

Part 5 Object Detection using RCNN on Pascal VOC2012 - inference

(Figure: example image with predicted bounding boxes)

This is the last article of the blog series for Object Detection with R-CNN.

If you are reading this blog, congratulations on getting this far. Now you are ready to test the performance of your R-CNN classifier. I will use my own image to see whether the classifier can detect my face.

Reference: "Object Detection with R-CNN" series in my blog

Reference: "Object Detection with R-CNN" series in my Github

Read in an image

Here, I read in my picture, taken in gloomy San Francisco. Let's see whether the R-CNN can find me (and possibly the person behind me!). The images are saved in the image folder under the current working directory. This image can be downloaded at example_image_easy.JPG.

In [1]:
import matplotlib.pyplot as plt
import imageio, os 
import skimage.transform
import numpy as np

dir_image = "image"
img = imageio.imread(os.path.join(dir_image,"example_image_easy.JPG"))
## Resize the image because the original is a bit too large and slows down computation.
## I used the same resizing hack to train the classifier and to extract candidate regions.
newsize = (200,250)
img = skimage.transform.resize(img,newsize)
const = 4
plt.figure(figsize=(5*const,6*const))
plt.imshow(img)
plt.show()
/Users/yumikondo/anaconda3/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "

Get region proposals for this image

The code for selective search is discussed in Part 3: Object Detection with Pascal VOC2012 - Selective Search, and its functions are saved in my GitHub repository.
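If you prefer not to copy the module from my repository, here is a minimal sketch of an equivalent helper, assuming the third-party selectivesearch package (pip install selectivesearch), which returns region dictionaries in the same {'rect', 'size', 'labels'} format you see below; the scale and sigma values here are my own guesses and may need tuning to reproduce my module's output.

import selectivesearch

def get_region_proposal_sketch(img, min_size=50, scale=500, sigma=0.9):
    ## selective_search returns (labeled_image, regions); each region is a
    ## dict with keys 'rect' = (x, y, w, h), 'size', and 'labels'
    _, regions = selectivesearch.selective_search(
        img, scale=scale, sigma=sigma, min_size=min_size)
    ## drop duplicated rectangles while preserving the original order
    seen, proposals = set(), []
    for r in regions:
        if r["rect"] not in seen:
            seen.add(r["rect"])
            proposals.append(r)
    return proposals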

In [2]:
import selective_search as ss
regions = ss.get_region_proposal(img,min_size=50)
print("N candidate regions ={}".format(len(regions)))
print("_"*10)
print("print the first 10 regions")
for r in regions[:10]:
    print(r)
print("_"*10)
print("print the last 10 regions")    
for r in regions[-10:]:
    print(r)
N candidate regions =528
__________
print the first 10 regions
{'rect': (0, 0, 27, 15), 'size': 195, 'labels': [0.0]}
{'rect': (8, 0, 28, 9), 'size': 108, 'labels': [1.0]}
{'rect': (28, 0, 25, 8), 'size': 135, 'labels': [2.0]}
{'rect': (45, 0, 21, 4), 'size': 93, 'labels': [3.0]}
{'rect': (63, 0, 35, 16), 'size': 209, 'labels': [4.0]}
{'rect': (76, 0, 13, 10), 'size': 78, 'labels': [5.0]}
{'rect': (84, 0, 29, 10), 'size': 213, 'labels': [6.0]}
{'rect': (95, 0, 39, 22), 'size': 333, 'labels': [7.0]}
{'rect': (110, 0, 24, 10), 'size': 148, 'labels': [8.0]}
{'rect': (124, 0, 24, 6), 'size': 145, 'labels': [9.0]}
__________
print the last 10 regions
{'rect': (85, 104, 127, 95), 'size': 5040, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0]}
{'rect': (85, 78, 127, 121), 'size': 6699, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0]}
{'rect': (85, 78, 164, 121), 'size': 8370, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0]}
{'rect': (85, 78, 164, 121), 'size': 11241, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0]}
{'rect': (0, 0, 249, 199), 'size': 46411, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0]}
{'rect': (0, 0, 249, 199), 'size': 46538, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0, 145.0]}
{'rect': (0, 0, 249, 199), 'size': 46605, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0, 145.0, 136.0]}
{'rect': (0, 0, 249, 199), 'size': 46669, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0, 145.0, 136.0, 124.0]}
{'rect': (0, 0, 249, 199), 'size': 49493, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0, 145.0, 136.0, 124.0, 203.0, 216.0, 184.0, 199.0, 228.0, 158.0, 176.0, 217.0, 179.0, 168.0, 159.0, 193.0, 196.0]}
{'rect': (0, 0, 249, 199), 'size': 49629, 'labels': [241.0, 260.0, 233.0, 265.0, 248.0, 250.0, 234.0, 245.0, 223.0, 175.0, 160.0, 237.0, 153.0, 251.0, 262.0, 242.0, 243.0, 189.0, 117.0, 121.0, 131.0, 119.0, 122.0, 118.0, 128.0, 140.0, 148.0, 244.0, 254.0, 257.0, 232.0, 263.0, 266.0, 173.0, 197.0, 174.0, 165.0, 180.0, 200.0, 204.0, 201.0, 163.0, 190.0, 222.0, 225.0, 210.0, 218.0, 198.0, 202.0, 191.0, 194.0, 147.0, 156.0, 167.0, 170.0, 135.0, 137.0, 143.0, 157.0, 134.0, 150.0, 172.0, 142.0, 152.0, 162.0, 166.0, 178.0, 183.0, 187.0, 208.0, 215.0, 206.0, 192.0, 177.0, 182.0, 195.0, 188.0, 186.0, 181.0, 169.0, 185.0, 213.0, 139.0, 207.0, 205.0, 211.0, 212.0, 214.0, 219.0, 226.0, 221.0, 231.0, 227.0, 236.0, 239.0, 235.0, 224.0, 230.0, 220.0, 238.0, 246.0, 252.0, 247.0, 256.0, 258.0, 255.0, 259.0, 249.0, 261.0, 240.0, 229.0, 253.0, 146.0, 149.0, 171.0, 161.0, 209.0, 81.0, 87.0, 123.0, 127.0, 125.0, 112.0, 141.0, 132.0, 133.0, 154.0, 155.0, 164.0, 138.0, 144.0, 113.0, 93.0, 98.0, 115.0, 130.0, 99.0, 114.0, 120.0, 129.0, 111.0, 106.0, 116.0, 100.0, 95.0, 103.0, 89.0, 102.0, 90.0, 101.0, 92.0, 104.0, 105.0, 107.0, 110.0, 97.0, 108.0, 109.0, 126.0, 75.0, 83.0, 84.0, 151.0, 264.0, 267.0, 8.0, 9.0, 7.0, 35.0, 22.0, 11.0, 19.0, 79.0, 80.0, 76.0, 59.0, 78.0, 86.0, 82.0, 85.0, 46.0, 56.0, 44.0, 49.0, 48.0, 36.0, 40.0, 32.0, 45.0, 47.0, 31.0, 34.0, 38.0, 33.0, 43.0, 27.0, 28.0, 24.0, 69.0, 74.0, 62.0, 65.0, 54.0, 55.0, 50.0, 66.0, 64.0, 67.0, 73.0, 39.0, 41.0, 58.0, 68.0, 61.0, 60.0, 51.0, 52.0, 77.0, 14.0, 17.0, 15.0, 12.0, 13.0, 37.0, 25.0, 20.0, 23.0, 29.0, 30.0, 42.0, 71.0, 72.0, 53.0, 63.0, 57.0, 70.0, 0.0, 18.0, 4.0, 6.0, 5.0, 16.0, 21.0, 26.0, 2.0, 3.0, 1.0, 145.0, 136.0, 124.0, 203.0, 216.0, 184.0, 199.0, 228.0, 158.0, 176.0, 217.0, 179.0, 168.0, 159.0, 193.0, 196.0, 88.0]}

Visualize all the candidate regions

The following code simply visualizes all the candidate regions extracted by the selective search.

In [3]:
import seaborn as sns
def plt_rectangle(plt,label,x1,y1,x2,y2,color = "yellow", alpha=0.5):
    ## draw a rectangle with top-left corner (x1,y1) and bottom-right corner (x2,y2)
    linewidth = 3
    if type(label) == list:
        ## a region merged from many initial segments gets a thicker frame
        linewidth = len(label)*3 + 2
        label = ""
        
    plt.text(x1,y1,label,fontsize=20,backgroundcolor=color,alpha=alpha)
    plt.plot([x1,x1],[y1,y2], linewidth=linewidth,color=color, alpha=alpha)
    plt.plot([x2,x2],[y1,y2], linewidth=linewidth,color=color, alpha=alpha)
    plt.plot([x1,x2],[y1,y1], linewidth=linewidth,color=color, alpha=alpha)
    plt.plot([x1,x2],[y2,y2], linewidth=linewidth,color=color, alpha=alpha)
    
    
plt.figure(figsize=(20,20))    
plt.imshow(img)
for item, color in zip(regions,sns.xkcd_rgb.values()):
    x1, y1, width, height = item["rect"]
    label = item["labels"][:5]
    plt_rectangle(plt,label,
                  x1,
                  y1,
                  x2 = x1 + width,
                  y2 = y1 + height, 
                  color= color)
plt.show()

Warp the candidate regions

In [4]:
import numpy as np 

def warp_candidate_regions(img,regions):
    ## warp each candidate region to the fixed input size of the CNN
    ## (feature extraction itself happens in a later step)
    newsize_cnn = (224, 224)
    X = []
    for i, r in enumerate(regions):
        origx , origy , width, height = r["rect"]
        candidate_region = img[origy:origy + height,
                               origx:origx + width]
        img_resize = skimage.transform.resize(candidate_region,newsize_cnn)
        X.append(img_resize)

    X = np.array(X)
    print(X.shape)
    return(X)
X = warp_candidate_regions(img,regions)
/Users/yumikondo/anaconda3/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
(528, 224, 224, 3)
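Before moving on to feature extraction, one caveat worth flagging: skimage.transform.resize returns floats in [0, 1], while the stock Keras VGG16 weights were trained on mean-subtracted BGR inputs produced by keras.applications.vgg16.preprocess_input. What matters is that inference matches training; since the classifier in this series was trained on features extracted from images warped exactly as above, I keep the warped regions as they are. If you ever add the preprocessing, it must go into both pipelines; a minimal sketch:

from keras.applications.vgg16 import preprocess_input

## Only apply this if the downstream classifier was ALSO trained on
## features extracted from preprocessed images.
X_vgg = preprocess_input(X * 255.0)  # preprocess_input expects 0-255 RGB arrays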

Extract CNN features

In the previous blog post, Part 4: Object Detection with Pascal VOC2012 - CNN feature extraction, I used VGGNet to extract pre-trained CNN features. For inference, I will once again use the same CNN features.

Here are some explanations quoted from my previous blog post.

[R-CNN](https://arxiv.org/pdf/1311.2524.pdf) proposes to extract a 4096-dimensional feature vector from each region proposal using [Alex-Net](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012.

The ImageNet project is a large visual database designed for use in visual object recognition software research. The ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge.

Since the R-CNN paper was published, there have been quite a few improvements in the ILSVRC, and Alex-Net is now somewhat obsolete. In this blog post, I will instead use VGGNet to extract features. VGGNet, the runner-up of the ILSVRC 2014 competition, was developed by Simonyan and Zisserman and dubbed VGGNet by the community. VGG16 consists of 16 layers with trainable weights (13 convolutional and 3 fully connected) and is very appealing because of its very uniform architecture.

**VGGNet**

(Figure: VGG16 architecture, cited from [VGG in TensorFlow](https://www.cs.toronto.edu/~frossard/post/vgg16/).) Unlike AlexNet, VGGNet uses only 3x3 convolutions, but with many more filters. See the model architecture above: it contains 16 layers with trainable weights. It is currently one of the most preferred choices in the community for extracting features from images. The weight configuration of VGGNet is publicly available, including in [Keras](https://keras.io/applications/#vgg16), and has been used in many other applications and challenges as a baseline feature extractor. So let's get started by loading VGGNet.

In [5]:
from keras.applications import VGG16
modelvgg16 = VGG16(include_top=True,weights='imagenet')
modelvgg16.summary()
/Users/yumikondo/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
WARNING:tensorflow:From /Users/yumikondo/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1264: calling reduce_prod (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

As during training, we need to clip off the last two layers of the VGGNet. Here is the discussion quoted from my previous blog post, Part 4: Object Detection with Pascal VOC2012 - CNN feature extraction.

VGGNet was developed for the ILSVRC, so its network is designed to classify images into 1,000 different classes. As I am not using VGGNet for classification but only for extracting features, I will remove the last and the second-to-last layers from the network.

In "Performance layer-by-layer without fine-turning" section of R-CNN paper, there is some discussion on which layer to use in Alex-Net to extract CNN features. They mentioned that removing the final two fully connected layers and use only the pool layer as CNN features for object detection can yield as good performance. They say:

Much of the CNN's representational power comes from its convolutional layers, rather than from the much larger densely connected layers.

Nevertheless, I will remove the prediction layer and the last fully connected layer (fc2), and use the first fully connected layer (fc1) output as the CNN features. The next code removes these last two layers.

In [6]:
from keras import models
modelvgg = models.Model(inputs  = modelvgg16.inputs, 
                        outputs = modelvgg16.layers[-3].output)
## show the deep learning model
modelvgg.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
=================================================================
Total params: 117,479,232
Trainable params: 117,479,232
Non-trainable params: 0
_________________________________________________________________
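As an aside, if you wanted to follow the paper's pool5 suggestion instead of fc1, the same slicing trick works; here is a minimal sketch (my own variant, not used in the rest of this post) that taps the "flatten" layer shown in the summary above, i.e. the flattened block5_pool output, for 25088-dimensional features:

from keras import models

modelvgg_pool5 = models.Model(inputs  = modelvgg16.inputs,
                              outputs = modelvgg16.get_layer("flatten").output)
## modelvgg_pool5.predict(X) would return an array of shape (n_regions, 25088)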

Finally, extract the pretrained CNN features. This step takes some time.

In [7]:
import time
start   = time.time()
feature = modelvgg.predict(X)
end     = time.time()
print("TIME TOOK: {:5.4f}MIN".format((end-start)/60.0))
feature.shape
TIME TOOK: 7.2092MIN
Out[7]:
(528, 4096)
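If the several-minute runtime is a problem, one knob Keras exposes is the batch_size argument of predict (with verbose=1 for a progress bar); the value 32 below is an arbitrary choice that trades memory for speed:

## drop-in replacement for the predict call above; tune batch_size for your machine
feature = modelvgg.predict(X, batch_size=32, verbose=1)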

Prediction with the ANN classifier to find candidate regions containing a person object

The classifier trained in the previous blog post, Part 4: Object Detection with Pascal VOC2012 - CNN feature extraction, is saved in the output folder under the current directory.

In [8]:
from keras.models import load_model

dir_result = "output"
classifier = load_model(os.path.join(dir_result,"classifier.h5"))
classifier.summary()
y_pred = classifier.predict(feature)
WARNING:tensorflow:From /Users/yumikondo/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1349: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_4 (Dense)              (None, 32)                131104    
_________________________________________________________________
dense_5 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 33        
=================================================================
Total params: 132,193
Trainable params: 132,193
Non-trainable params: 0
_________________________________________________________________

Finally, plot the candidate regions that have the highest/lowest likelihood of containing a person object.

In [9]:
def plot_selected_regions_with_estimated_prob(y_pred,
                                              method="highest",
                                              upto=5):
    ## sort candidate regions by predicted probability (increasing order)
    irows = np.argsort(y_pred[:,0])
    if method == "highest":
        irows = irows[::-1]
    count = 1
    const = 4
    fig = plt.figure(figsize=(5*const,np.ceil(upto/5)*const))
    fig.subplots_adjust(hspace=0.13,wspace=0.0001,
                        left=0,right=1,bottom=0, top=1)
    for irow in irows:
        prob = y_pred[irow,0]
        r    = regions[irow]
        origx , origy , width, height = r["rect"]
        
        ax = fig.add_subplot(int(np.ceil(upto/5)),5,count)
        ax.imshow(img)
        ax.axis("off")
        plt_rectangle(ax,label="",
                      x1=origx,
                      y1=origy,
                      x2=origx + width,
                      y2=origy+height,color = "yellow", alpha=0.5)
        
        #candidate_region = img[origy:origy + height,
        #                      origx:origx + width]       
        #ax.imshow(candidate_region)
        ax.set_title("Prob={:4.3f}".format(prob))
        count += 1
        if count > upto:
            break
    plt.show()
print("The most likely candidate regions")    
plot_selected_regions_with_estimated_prob(y_pred,method="highest",upto=5)
print("The least likely candidate regions")   
plot_selected_regions_with_estimated_prob(y_pred,method="lowest",upto=5)
The most likely candidate regions
The least likely candidate regions

More examples!

OK, the results seem very reasonable. The highest probability is assigned to the candidate region that captures my entire face in the most compact way. On the other hand, the lowest probability is assigned to a homogeneous ocean region.

But this picture may be a bit too easy for person detection, as it only contains a single person and the background is very homogeneous. Let's try more difficult images. These pictures can be downloaded from my GitHub repository.

In [ ]:
dir_image = "image"
for myid in range(2,5):
    img = imageio.imread(os.path.join(dir_image,"example_id{}.JPG".format(myid)))
    img = skimage.transform.resize(img,newsize)

    regions = ss.get_region_proposal(img,min_size=50)
    X = warp_candidate_regions(img,regions)
    feature = modelvgg.predict(X)
    y_pred = classifier.predict(feature)

    plot_selected_regions_with_estimated_prob(y_pred,
                                              method="highest",
                                              upto=5)
/Users/yumikondo/anaconda3/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
(495, 224, 224, 3)
