Yumi's Blog

Driver's facial keypoint detection

Test subject gif

What you see above is the gif of a driver subject from test data with the estimated facial keypoints.

Evaluating driving performance is important for reducing road accident rates. There are many public datasets related to drivers' faces, including the one from the Kaggle competition State Farm Distracted Driver Detection. The technical tasks associated with these driver datasets range from classifying driver behaviors to detecting facial keypoints.

In this blog post, I will use the public driver dataset CVC11 to detect the driver's facial keypoints (right eye, left eye, nose, right mouth edge, and left mouth edge).

In my previous blog post, Achieving Top 23% in Kaggle's Facial Keypoints Detection with Keras + Tensorflow, I also performed facial keypoint detection using the Facial Keypoints Detection data, and the technical task is quite similar.

From this blog post, you will:

  • create a gif of the driver together with the estimated facial keypoints.
In [37]:
## Import usual libraries
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
import keras
import sys, time, os, warnings 
import numpy as np
import pandas as pd 
from collections import Counter 
warnings.filterwarnings("ignore")
print("python {}".format(sys.version))
print("keras version {}".format(keras.__version__)); del keras
print("tensorflow version {}".format(tf.__version__))
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
config.gpu_options.visible_device_list = "4"
set_session(tf.Session(config=config))

def set_seed(sd=123):
    from numpy.random import seed
    from tensorflow import set_random_seed
    import random as rn
    ## numpy random seed
    seed(sd)
    ## core python's random number 
    rn.seed(sd)
    ## tensor flow's random number
    set_random_seed(sd)
    
python 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
keras version 2.1.3
tensorflow version 1.5.0
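The set_seed helper is defined above but never called in this cell. For reproducible weights, it could be invoked once before building the model; a minimal usage sketch:

## optional: seed numpy, core python, and tensorflow via the helper defined above
set_seed(123)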

I downloaded the DrivFace zip file from CVC11 and saved it at:

In [38]:
dir_data = "DrivFace/"

Let's take a look at the readme file. The images come from 4 drivers. cat DrivFace/readme.txt returns the following info:

The DrivFace database contains image sequences of subjects while driving in real scenarios. It is composed of 606 samples of 640×480 pixels each, acquired over different days from 4 drivers (2 women and 2 men) with several facial features like glasses and beard.
The ground truth contains the annotation of the face bounding box and the facial key points (eyes, nose and mouth). A set of labels assigning each image into 3 possible gaze direction classes are given. The first class is the "looking-right" class and contains the head angles between -45° and -30°. The second one is the "frontal" class and contains the head angles between -15° and 15°. The last one is the "looking-left" class and contains the head angles between 30° and 45°.
ATTN: This database is free for academic usage. For other purposes, please contact PhD. Katerine Diaz (kdiaz@cvc.uab.es).
Files and scripts
• DrivImages.zip has the driver images. The image's name has the format: * YearMonthDay_subject_Driv_imNum_HeadPose.jpg
i.e. 20130529_01_Driv_011_f.jpg is a frame of the first driver corresponding to the 11th image of the sequence, and the head pose is frontal. subject = [1:4], imNum = [001:...], HeadPose = lr (looking-right), f (frontal) and lf (looking-left).
• drivPoints.txt contains the ground truth in table format, where the columns have the following information: * fileName is the image's name in DrivImages.zip * subject = [1:4] * imgNum = int * label = [1/2/3] (head pose class corresponding to [lr/f/lf], respectively) * ang = [-45, -30 / -15 0 15 / 30 45] (head pose angle) * [xF yF wF hF] = face position * [xRE yRE] = right eye position * [xLE yLE] = left eye position * [xN yN] = nose position * [xRM yRM] = right corner of mouth * [xLM yLM] = left corner of mouth
• read_drivPoints.m is a Matlab function to read the drivPoints file. You can also use: * Table = readtable('drivPoints.txt');
• DrivFace.zip has all the previous files

Citations

Katerine Diaz-Chito, Aura Hernández-Sabaté, Antonio M. López, A reduced feature set for driver head pose estimation, Applied Soft Computing, Volume 45, August 2016, Pages 98-107, ISSN 1568-4946, http://dx.doi.org/10.1016/j.asoc.2016.04.027.
Contact information: Questions? Email Katerine Diaz (kdiaz@cvc.uab.es)

Read in the annotation data

As the first step, I will read drivPoints.txt into a pandas DataFrame that contains the facial keypoints and driver information for each image.

The data contain not only the facial keypoints but also the bounding box.

In [39]:
labels = open(dir_data + "drivPoints.txt").read()
labels = labels.split("\r\n")

lines = [line.split(",") for line in labels]
df_label = pd.DataFrame(lines[1:],columns=lines[0])
cols = list(set(df_label.columns) - set(["fileName"]))
df_label[cols] = df_label[cols].apply(pd.to_numeric, errors='coerce')
## remove the rows with NA
print(df_label.shape)
df_label = df_label.dropna()
print(df_label.shape)
df_label.head(3)
(607, 19)
(606, 19)
Out[39]:
fileName subject imgNum label ang xF yF wF hF xRE yRE xLE yLE xN yN xRM yRM xLM yLM
0 20130529_01_Driv_001_f 1.0 1.0 2.0 0.0 292.0 209.0 100.0 112.0 323.0 232.0 367.0 231.0 353.0 254.0 332.0 278.0 361.0 278.0
1 20130529_01_Driv_002_f 1.0 2.0 2.0 0.0 286.0 200.0 109.0 128.0 324.0 235.0 366.0 235.0 353.0 258.0 333.0 281.0 361.0 281.0
2 20130529_01_Driv_003_f 1.0 3.0 2.0 0.0 290.0 204.0 105.0 121.0 325.0 240.0 367.0 239.0 351.0 260.0 334.0 282.0 362.0 282.0

Create a list object imgs; imgs[i] contains the image (as a numpy array) corresponding to df_label.iloc[i,:].

In [40]:
from keras.preprocessing.image import img_to_array, load_img

imgs = []
count = 0
for jpg in df_label["fileName"]:
    if count % 100 == 0:
        print(count)
    try:
        img = img_to_array(load_img(dir_data + "/DrivImages/" + jpg +".jpg"))
    except:
        ## the image file could not be read; keep an empty placeholder
        img = []
    imgs.append(img)
    count += 1

assert len(imgs) == df_label.shape[0]
0
100
200
300
400
500
600
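As a quick sanity check (a sketch), every loaded frame should have the 480 x 640 x 3 shape (height, width, channels) stated in the readme:

## collect the distinct shapes of the successfully loaded frames
shapes = set(tuple(img.shape) for img in imgs if len(img) > 0)
print(shapes)   ## expected: {(480, 640, 3)}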

Look at some example images

For each driver, we plot the first 4 images, together with the bounding box and the keypoints. I will use the first 3 drivers to create a model and use the last driver to test the model performance.

In [41]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np 
uni_subject = np.unique(df_label["subject"])

Nimage_subj = 4
count = 1 
fig = plt.figure(figsize=(25,18))
for subj in uni_subject:
    pick = df_label["subject"] == subj
    df_label_subj = df_label.loc[pick,:]
    start = np.min(np.where(pick)[0])
    imgs_subj = imgs[start:]
    for wh, jpg in enumerate(df_label_subj["fileName"].values[:Nimage_subj]):
        img = imgs_subj[wh]
        row = df_label_subj.loc[df_label_subj["fileName"] == jpg,:]

        
        ax = fig.add_subplot(len(uni_subject),Nimage_subj,count)
        
        ax.imshow(img/255)
        ## the bounding box
        ax.add_patch(
            patches.Rectangle(
                (row["xF"].values,row["yF"].values),   # (x,y)
                row["wF"].values,          # width
                row["hF"].values,          # height
                fill=False 
            )
        )
        ## facial key points
        ax.scatter(row["xRE"].values,row["yRE"].values,c="red")
        ax.scatter(row["xLE"].values,row["yLE"].values,c="red")
        ax.scatter(row["xN"].values,row["yN"].values,c="yellow")
        ax.scatter(row["xRM"].values,row["yRM"].values,c="blue")
        ax.scatter(row["xLM"].values,row["yLM"].values,c="blue")
        
        count +=1
plt.show()

Assume that we have a bounding box

Since the data contain bounding boxes, we can use them to crop out the driver's face.

However, relying on the bounding box may be problematic at test time, when no bounding box is available. In that case, we would need a face detection method to find the bounding box; such methods are discussed in various other blog posts.
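For completeness, here is a minimal sketch of how a face could be detected without annotations, using OpenCV's pre-trained Haar cascade. The cascade path via cv2.data is an assumption that holds for recent opencv-python builds; older installs need an explicit path to haarcascade_frontalface_default.xml.

## a sketch of face detection with OpenCV's Haar cascade (not used in this post)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(img):
    ## return a list of (x, y, w, h) boxes for the faces found in img
    gray = cv2.cvtColor(img.astype(np.uint8), cv2.COLOR_RGB2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    return(list(faces))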

Nevertheless, for this blog post, I will assume that the bounding box is given.

In [42]:
def adjust_loc(xys,x_topleft,y_topleft):
    out = []
    for xy in xys:
        out.append((xy[0] - x_topleft,
                    xy[1] - y_topleft))
    return(out)

def get_bounding_box(img,row=None,printing=False):
    '''
    return the face bounding box as a list [(x, y, w, h)], taken from the
    annotation row of the dataframe (this post assumes row is always provided)
    '''
    faces = [(int(row["xF"]),
              int(row["yF"]),
              int(row["wF"]),
              int(row["hF"]))]

    if printing:
        print('Faces found: ', len(faces))

    return(faces)

def adjust_xy(x,y,shape_orig,shape_new):
    xnew = x*shape_new[1]/float(shape_orig[1])
    ynew = y*shape_new[0]/float(shape_orig[0])
    return xnew,ynew


def adjust_xys(xys,img_small,img_small_resize):
    '''
    a vector containing:
    ['xRE', 'yRE', 'xLE', 'yLE', 'xN', 'yN', 'xRM', 'yRM', 'xLM', 'yLM']
    '''
    xys_resize = []
    for xy in xys:
        x, y = adjust_xy(xy[0],xy[1],
                         img_small.shape,
                         img_small_resize.shape)
       
        xys_resize.extend([x,y])
    return(xys_resize)
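A toy usage of these helpers (made-up numbers): a keypoint at (330, 250) in the full frame, inside a bounding box whose top-left corner is (290, 200) and whose crop is 130 x 120 pixels (height x width), resized to the 90 x 90 target:

## shift the keypoint into the crop's coordinate system
print(adjust_loc([(330, 250)], x_topleft=290, y_topleft=200))   ## [(40, 50)]
## rescale it to the resized crop: 40*90/120 = 30.0 and 50*90/130 ~ 34.6
print(adjust_xy(40, 50, shape_orig=(130, 120), shape_new=(90, 90)))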

Record the names of columns containing the facial landmarks

In [43]:
labels_to_keep = ["RE","LE","N","RM","LM"]
Ycolumns = []
for lab in labels_to_keep:
    Ycolumns.extend(["x" + lab, "y" + lab])
    
Ycolumns 
Out[43]:
['xRE', 'yRE', 'xLE', 'yLE', 'xN', 'yN', 'xRM', 'yRM', 'xLM', 'yLM']

Modeling strategy

The entire original image is fairly large: 640 x 480. We will crop each image to its bounding box and use the crops as training images.

Here I run into a problem: the bounding box sizes differ across images, while my CNN model ends with fully connected layers and therefore expects all input images to have the same size. So I will resize each cropped image to a pre-specified target shape and adjust the facial keypoint coordinates accordingly.

Set the target size:

In [44]:
target_shape = (90,90)

Here are example plots showing how an original image is converted to the target shape using its bounding box.

In [45]:
from copy import copy


X, Y = [], []
for i in [1,200,400,600]:
    row = df_label.iloc[i,:]
    img = imgs[i]
    

    xys = []
    for label in labels_to_keep:
        xys.append((row["x"+label], 
                    row["y"+label]))


    ## go over the list of faces and draw the bounding box as a rectangle on the original colored image
    img_copy = copy(img)
    faces = get_bounding_box(img,row)
    (x, y, w, h) = faces[0]     
    cv2.rectangle(img_copy, (x, y), (x+w, y+h), (0, 255, 0), 2)
    xys = adjust_loc(xys,x,y)
    
    ## 1. original image
    ## convert image to RGB and show image 
    fig = plt.figure(figsize=(10,10))
    ax = fig.add_subplot(1,3,1)
    ax.imshow(img_copy/255)
    ax.set_title("original\nshape={}".format(img_copy.shape))
    ax.grid(False)
    
    ## 2. image cropped to the bounding box
    ax = fig.add_subplot(1,3,2)
    img_small = img[y:(y+h),x:(x+w)]
    ax.imshow(img_small/255)
    for xy in xys:
        ax.scatter(xy[0],xy[1])
    ax.set_title("bounding box\nshape={}".format(img_small.shape))
    ax.grid(False)
    
    ## 3. resized cropped image
    ax = fig.add_subplot(1,3,3)
    img_small_resize = cv2.resize(img_small,target_shape)
    ax.imshow(img_small_resize/255)
    ax.set_title("resized image\nshape={}".format(img_small_resize.shape))
    ax.grid(False)
    
    xys_resize = adjust_xys(xys,img_small,img_small_resize)
    for i in range(0,len(xys_resize),2):
        ax.scatter(xys_resize[i],xys_resize[i+1])
    
    plt.show()
    

Now, I will create the training and testing images, each resized to target_shape. In df_label, I will record which images are used for training and which for testing.

In [46]:
df_label["testTF"] = np.NaN
df_label["testID"] = np.NaN
In [47]:
X, Y, testTF = [], [], []

count = 0
## I will use the last subject as the testing data
test_subjectID = [4]
for i in range(len(imgs)):
    row = df_label.iloc[i,:]
    img = imgs[i]
    
    if len(img) < 1:
        print("no image! {}".format(i))
        continue
    
    faces = get_bounding_box(img,row)
    
    if len(faces) < 1:
        ## no face is discovered
        print("no face discovered {}".format(i))
        continue
    
    ## shift and rescale the annotation to the cropped, resized image
    xys = []
    for label in labels_to_keep:
        xys.append((row["x"+label], 
                    row["y"+label]))

    (x, y, w, h) = faces[0]     
    xys = adjust_loc(xys,x,y)

    img_small = img[y:(y+h),x:(x+w)]    
    img_small_resize = cv2.resize(img_small,target_shape)

    xys_resize = adjust_xys(xys,img_small,img_small_resize)
    X.append(img_small_resize)
    Y.append(xys_resize)
    
    inTest = row["subject"] in test_subjectID
    
    testTF.extend([ inTest ])
    
    df_label["testTF"].iloc[i] = inTest
    if inTest:
        df_label["testID"].iloc[i] = count
        count += 1
X, Y, testTF = np.array(X), np.array(Y), np.array(testTF)


X_train0, y_train0 = X[~testTF],Y[~testTF]
X_test0,  y_test0  = X[ testTF],Y[ testTF]

Standardization

Standardize the keypoint coordinates using the means and standard deviations of the facial keypoints in the training set.

In [48]:
mY  = np.mean(y_train0,axis=0)
sdY =  np.std(y_train0,axis=0)
def standy(y, printTF=False):
    ## standardize keypoint coordinates with the training-set mean and SD
    if printTF:
        print("Range in original scale: [{:5.3f},{:5.3f})".format(
            np.min(y), np.max(y)))
    y = (y - mY)/sdY
    if printTF:
        print("Range in standardized scale: [{:5.3f},{:5.3f})".format(
            np.min(y), np.max(y)))
    return(y)


y_train, y_test = standy(y_train0), standy(y_test0)
X_train, X_test = X_train0/255.0, X_test0/255.0

print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)
((516, 90, 90, 3), (516, 10))
((90, 90, 90, 3), (90, 10))
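A quick consistency check (a sketch): the train/test split should cover every annotated frame, i.e. 516 + 90 = 606 rows.

## every row of df_label ended up in exactly one of the two splits
assert X_train.shape[0] + X_test.shape[0] == df_label.shape[0]   ## 516 + 90 == 606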

Baseline

In this analysis, we assume that the bounding box is provided. Given the bounding box and no model, the natural way to estimate the facial keypoints is to use the mean facial keypoints of the training data: no matter what the image looks like, the baseline always guesses the same coordinates within the resized bounding box (for the right eye, roughly (31.0, 18.8), as printed below). I will compare the model performance against this baseline.

In [49]:
y_pred_baseline = y_train0.mean(axis=0)
print("The baseline estimate for the x,y coordinate of right eye is at ({:4.3f},{:4.3f}) within the bounding box".format(y_pred_baseline[0],y_pred_baseline[1]))
The baseline estimate for the x,y coordinate of right eye is at (31.029,18.765) within the bounding box
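To make the later comparison explicit, the baseline "prediction" is simply this training mean repeated once per test image; a minimal sketch (coordinates are in the resized bounding-box scale, like y_train0 and y_test0):

## repeat the training-mean keypoints for every test image
y_pred_base = np.tile(y_pred_baseline, (y_test0.shape[0], 1))
print(y_pred_base.shape)   ## (90, 10)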

Define CNN model

I use a standard CNN. Notice the Dense layer at the end; it is the reason why all training images must have the same size.

In [50]:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dense, Flatten

batch_size = 64
num_channels = 3


model = Sequential()

# channels-last input of shape target_shape + (num_channels,) = (90, 90, 3);
# fixing the input size is required because of the Dense layer at the end
model.add(Conv2D(32, (3, 3),
                 padding='same',
                 name="conv2d",
                 input_shape=target_shape + (num_channels,)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(Y.shape[1],name="Dense_layer"))


model.compile(loss='mse', optimizer='sgd')
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 90, 90, 32)        896       
_________________________________________________________________
activation_5 (Activation)    (None, 90, 90, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 88, 88, 32)        9248      
_________________________________________________________________
activation_6 (Activation)    (None, 88, 88, 32)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 44, 44, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 44, 44, 32)        9248      
_________________________________________________________________
activation_7 (Activation)    (None, 44, 44, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 42, 42, 32)        9248      
_________________________________________________________________
activation_8 (Activation)    (None, 42, 42, 32)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 56448)             0         
_________________________________________________________________
Dense_layer (Dense)          (None, 10)                564490    
=================================================================
Total params: 593,130
Trainable params: 593,130
Non-trainable params: 0
_________________________________________________________________
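As a sanity check on the summary above, the Dense layer maps the 56448 flattened features to 10 outputs, so its parameter count is 56448 * 10 weights + 10 biases:

## Dense layer parameter count
print(56448 * 10 + 10)   ## 564490, matching the summary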

Training starts here

Here, I hold out 20% of the training data as validation data via Keras's validation_split (which takes the last 20% of the samples rather than a random sample). This is probably not ideal given the high correlation between adjacent frames of the same subject: ideally, training and validation data would not share subjects, so that the validation loss reflects performance on future, unseen drivers. However, since we only have 4 subjects, I keep this simple split; a subject-wise split is sketched after the training cell below.

In [51]:
hist = model.fit(X_train,y_train,
                 batch_size=256,
                 epochs=800,
                 verbose=False,
                 validation_split=0.2)
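For reference, here is what a subject-wise validation split could look like (a minimal sketch, not used above). It holds out one training subject (subject 3, an arbitrary choice) so the validation frames come from a driver the model has never seen; it assumes no frames were skipped in the preprocessing loop, so df_label rows stay aligned with X and Y (which holds here: 516 + 90 = 606 rows).

## subjects of the training frames, aligned with X_train / y_train
train_subjects = df_label["subject"].values[~testTF]
val_mask = (train_subjects == 3)
X_tr,  y_tr  = X_train[~val_mask], y_train[~val_mask]
X_val, y_val = X_train[val_mask],  y_train[val_mask]
## hist = model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
##                  batch_size=256, epochs=800, verbose=False)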

Save the model architecture and the weights

In [52]:
modelname = "model_facialkeypoints.h5"
model.save(modelname)
statinfo = os.path.getsize(modelname)
print("The model size: {}MB".format(statinfo/1000.0/1000))
The model size: 4.781528MB
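The saved .h5 file bundles the architecture, the weights, and the optimizer state, so it can be restored later with load_model (a minimal sketch):

from keras.models import load_model
restored_model = load_model(modelname)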

Plot training and validation loss

The training and validation losses appear to have converged.

In [67]:
for label in hist.history.keys():
    plt.plot(hist.history[label],label=label)
plt.xlabel("epochs")
plt.legend()
plt.title("The final val_loss={:4.3f}".format(hist.history["val_loss"][-1]))
plt.show()

Model performance on testing data

Un-standardize the estimated facial keypoints. The estimated (x,y) coordinates in y_pred correspond to the target_shape scale.

In [54]:
y_pred = model.predict(X_test)
y_pred = y_pred * sdY + mY
assert np.all(y_pred <= np.max(target_shape))
assert np.all(y_pred >=0)

Convert y_pred into the original scale

To visualize the model performance on the original image, we need to convert y_pred back to the original scale.

Currently, y_pred contains (x,y) coordinates in the target_shape scale. Recall how the original image was transformed into a training image:

  • original image (640, 480) ⇨
  • crop to the bounding box (xF, yF, wF, hF), whose values depend on the image ⇨
  • resize the crop to target_shape

Therefore, in order to convert the (x',y') coordinates of y_pred into the original image scale (x_orig, y_orig), we need:

  • x_orig = x' * wF/target_shape[0] + xF
  • y_orig = y' * hF/target_shape[1] + yF

In the following code, y_pred_orig contains the (x,y) coordinates of the estimated facial keypoints in the original image scale.

In [55]:
df_label_test = df_label.loc[df_label["testTF"]]

y_pred_orig = []
for irow in range(y_pred.shape[0]):
    yp = copy(y_pred[irow]) ## y_pred contains (x-RE, y-RE, x-LE, y-LE,....)
    row = df_label_test.iloc[irow,:]

    yp[0::2] = yp[0::2]*row["wF"]/float(target_shape[0]) + row["xF"]
    yp[1::2] = yp[1::2]*row["hF"]/float(target_shape[1]) + row["yF"]

    y_pred_orig.append(yp)
y_pred_orig = np.array(y_pred_orig)

assert y_pred_orig.shape[0] == np.sum(df_label["testTF"])
assert len(imgs) == df_label.shape[0]

Evaluate the model performance in terms of the normalized mean Euclidean distance between the true facial keypoints and the estimated ones.

Let ($x_i,y_i$) be the facial keypoint of interest, e.g., the nose or left-eye coordinate, and let ($\widehat{x}_i,\widehat{y}_i$) be its predicted value. Then for each landmark, I report the mean normalized Euclidean distance between the predicted and the true landmark, where the normalization is by the inter-pupil distance.

$$ \frac{1}{N}\sum_{i=1}^N \frac{ \sqrt{(x_i - \widehat{x}_i)^2 + (y_i - \widehat{y}_i)^2} }{ \textrm{inter pupil distance}_i } $$ where $$ \textrm{inter pupil distance}_i = \sqrt{ (x_i^{\textrm{Right}} - x_i^{\textrm{Left}} )^2 + (y_i^{\textrm{Right}} - y_i^{\textrm{Left}} )^2 } $$ You can think of the normalized ED as the average relative error and 100% means that the average error is as large as the distance between the eyes.
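Before computing this metric step by step below, here is a compact vectorized version (a sketch; it assumes the predictions, the ground truth, and the inter-pupil distances are all expressed in the same coordinate system):

def normalized_ed(y_true, y_pred, ipd):
    ## y_true, y_pred: (N, 10) arrays ordered as Ycolumns; ipd: (N,) distances
    dx = y_pred[:, 0::2] - y_true[:, 0::2]
    dy = y_pred[:, 1::2] - y_true[:, 1::2]
    ed = np.sqrt(dx**2 + dy**2)               ## (N, 5) per-landmark distances
    return((ed / ipd[:, None]).mean(axis=0))  ## mean normalized ED per landmark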

Calculate inter pupil distance for each image:

In [56]:
df_label["IPD"] = np.sqrt((df_label["xRE"] - df_label["xLE"])**2 + (df_label["yRE"] - df_label["yLE"])**2)

Calculate the Euclidean distance (ED) between the predicted and the true facial keypoint for RE, LE, N, RM and LM.

For the predicted facial keypoints, I consider both the model and the baseline.

In [57]:
for ii, facialkp in enumerate(labels_to_keep):
    i = ii*2
    ## Model prediction
    df_label["Model_" + facialkp] = np.NaN
    xterm = (y_pred[:,i] - y_test0[:,i])**2
    yterm = (y_pred[:,i+1] - y_test0[:,i+1])**2
    df_label["Model_" + facialkp].loc[df_label["testTF"].values] = np.sqrt(xterm + yterm)
    ## baseline 
    df_label["Baseline_" + facialkp] = np.NaN
    xterm = (y_pred_baseline[i] - y_test0[:,i])**2
    yterm = (y_pred_baseline[i+1] - y_test0[:,i+1])**2
    df_label["Baseline_" + facialkp].loc[df_label["testTF"].values] = np.sqrt(xterm + yterm)

Finally, I am ready to plot some results on the testing data. I will plot the predicted landmarks on the original image (640 x 480) and on the resized bounding box (target_shape), and report the normalized Euclidean distance between the true and predicted landmarks for all 5 facial keypoints.

In [58]:
def plot_test_subject(X_test_row,y_pred_row,row,img_row,y_pred_orig_row,figureimg=None):
    s = 100
    marker="X"
    fig = plt.figure(figsize=(20,10))
    fig.subplots_adjust ( hspace = 0, wspace = 0 )
    ax = plt.subplot2grid((2, 3), ## (Nrow, Ncol)
                          (0, 0), ## ( row, col)
                          colspan=2,rowspan=2)
    ax.grid(False)
    ax.imshow(img_row/255)
    for p in range(0,len(y_pred_orig_row),2):
        ax.scatter(y_pred_orig_row[p],
                   y_pred_orig_row[p+1],c="green",s=s,marker=marker)
    
    
    
    ax = plt.subplot2grid((2, 3), (0,2), colspan=1,rowspan=1)
    ax.imshow(X_test_row)
    ax.grid(False)
    for p in range(0,len(y_pred_row),2):
        ax.scatter(y_pred_row[p],
                   y_pred_row[p+1],c="green",s=s,marker=marker)
    

    
    ax = plt.subplot2grid((2, 3), (1,2), colspan=1,rowspan=1)
    ax.axis('off')
    ax.plot()
    ax.set_xlim(0,1)
    ax.set_ylim(0,len(labels_to_keep))
    for i, lab in enumerate(labels_to_keep):
        ax.text(0,i,"normalized ED {:3.2} {:4.2f}%".format(
            lab,row["Model_"+lab]/row["IPD"]*100),fontsize=30)
    
    if figureimg is not None:
        plt.savefig( figureimg + '.png',bbox_inches='tight',pad_inches=0)
    else:
        plt.show()   
            
    

    
    
for irow in [1,80]:
    irow_orig = np.where(df_label["testID"] == irow)[0][0]
    row = df_label.iloc[irow_orig,:]
    img_row = imgs[irow_orig]
    plot_test_subject(X_test[irow],y_pred[irow],row, 
                      img_row, y_pred_orig[irow],
                      figureimg=None)
 

Create gif for the test subject driver

Rather than plotting all the images in the notebook, let's create a gif out of them! To do this, we first need to save every frame as a .png file.

In [59]:
   
Nimage = y_pred.shape[0]


dir_test_subject = "test_subject/"
try:
    os.mkdir(dir_test_subject)
except:
    pass

for irow in range(Nimage):
    irow_orig = np.where(df_label["testID"] == irow)[0][0]
    row = df_label.iloc[irow_orig,:]
    img_row = imgs[irow_orig]
    plot_test_subject(X_test[irow],y_pred[irow],row, 
                      img_row, y_pred_orig[irow],
                      figureimg=dir_test_subject + "/fig{:05.0f}".format(irow))
print(Nimage)   
90

Check the number of figures saved in the test_subject folder. It has to equal the number of test images.

In [60]:
ls -lrth $dir_test_subject | grep ".png" | wc -l
90

Combine all the png images and create a single gif. This is the gif at the very beginning of this blog post.

In [61]:
import imageio
filenames = np.sort(os.listdir(dir_test_subject))
filenames = [ fnm for fnm in filenames if ".png" in fnm]

with imageio.get_writer(dir_test_subject + '/test_subject.gif', mode='I') as writer:
    for filename in filenames:
        image = imageio.imread(dir_test_subject + filename)
        writer.append_data(image)
        

Test subject gif

Box plot to compare the model performance with the baseline method

OK, the model performance looks reasonable for this test subject. But how much of it is attributable to the CNN model, and how much simply comes from having the bounding box?

In [62]:
collabels = []
for nm in labels_to_keep:
    collabels.extend(["Model_"+ nm,"Baseline_"+ nm])
    
df_eval = df_label[collabels] 
In [63]:
df_eval = df_eval.dropna() ## NA is recorded for the training image 
## un-pivot so that there is a single column containing the box plot values
## this un-pivot is necessary for seaborn boxplot
df_boxplot = pd.melt(df_eval,  value_vars=collabels)
v = np.array([ term.split("_") for term in df_boxplot["variable"]])
df_boxplot["procedure"] = v[:,0]
df_boxplot["keypoints"] = v[:,1]

The table below shows, for each landmark and each procedure, the median normalized Euclidean distance and the proportion of normalized Euclidean distances below 10%. The CNN model outperforms the baseline for every landmark except the left mouth edge.

Sadly, the left mouth edge is predicted slightly better by the baseline! More data augmentation may be needed to improve the model for this landmark.

In [64]:
def proplessthan(vec):
    values = [np.median(vec["value"]), 
              np.mean(vec["value"] < 10)*100]
    return(pd.Series(values,index=["Median normalized ED","% (normalized ED < 10%)"] ))
df_boxplot_summary  = df_boxplot.groupby(["keypoints","procedure"]).apply(proplessthan).reset_index()
df_boxplot_summary
Out[64]:
keypoints procedure Median normalized ED % (normalized ED < 10%)
0 LE Baseline 5.216987 80.000000
1 LE Model 4.184403 95.555556
2 LM Baseline 3.862965 97.777778
3 LM Model 4.128279 97.777778
4 N Baseline 7.346033 65.555556
5 N Model 5.725722 87.777778
6 RE Baseline 6.890649 78.888889
7 RE Model 3.794306 95.555556
8 RM Baseline 5.090090 97.777778
9 RM Model 3.349447 100.000000

Boxplot for assessing the model performance on the test subject

In [66]:
import seaborn as sns
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(1,2,1)

bp = sns.boxplot(x="keypoints", 
                 y="value", 
                 hue="procedure",
                 data=df_boxplot, palette="Set3",
                 ax=ax)

ax.set_title("Mean Square Error/ inter pupil range")

ax = fig.add_subplot(1,2,2)
ax.plot()
ax.set_xlim(0,1)
ax.axis("off")
ax.set_ylim(0,df_boxplot_summary.shape[0])
for i in range(df_boxplot_summary.shape[0]):
    ax.text(0,i,"{:8} {:10} Median normalized ED {:4.2f}%, %(normalized ED < 10%)={:4.2f}% ".format(
            *df_boxplot_summary.iloc[i,:].values)
           ,fontsize=20)
 

plt.show()
