Yumi's Blog

Saliency Map with keras-vis

Image-specific class saliency visualization gives a better understanding of why a model makes a particular classification decision. The goal of this post is to understand the concept and how to interpret a saliency map.

Reference

Reference in this blog

To set up the same conda environment as mine, follow:

Visualization of deep learning classification model using keras-vis

Setup

In [1]:
import keras
import tensorflow as tf
import vis ## keras-vis
import matplotlib.pyplot as plt
import numpy as np
print("keras      {}".format(keras.__version__))
print("tensorflow {}".format(tf.__version__))
Using TensorFlow backend.
keras      2.2.2
tensorflow 1.10.0

Read in pre-trained model

For this exercise, I will use VGG16.

In [2]:
from keras.applications.vgg16 import VGG16, preprocess_input
model = VGG16(weights='imagenet')
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

Download a json file containing ImageNet class names.

wget "https://raw.githubusercontent.com/raghakot/keras-vis/master/resources/imagenet_class_index.json"

Read in the class index json file

In [3]:
import json
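# Keys are class indices stored as strings; each value is [wordnet_id, class_name]
with open("imagenet_class_index.json") as f:
    CLASS_INDEX = json.load(f)
# Collect the class names in index order
classlabel = [CLASS_INDEX[str(i_dict)][1] for i_dict in range(len(CLASS_INDEX))]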
print("N of class={}".format(len(classlabel)))
N of class=1000

Let's read in an image that contains both a dog and a cat. Such an image is likely to confuse a model trained on ImageNet, where an image often contains a single object.

The goal of this exercise is to understand why the VGG16 model makes its classification decision.

In [4]:
from keras.preprocessing.image import load_img, img_to_array
_img = load_img("dog_and_cat.jpg",target_size=(224,224))
plt.imshow(_img)
plt.show()

Let's predict the object class of this image, and show the top 5 predicted classes.

Unfortunately, the second predicted class does not make sense: why bath_towel? However, the other top predicted classes are dogs, which makes more sense:

Top 1 predicted class:     Pr(Class=redbone            [index=168])=0.360
Top 3 predicted class:     Pr(Class=bloodhound         [index=163])=0.076
Top 4 predicted class:     Pr(Class=basenji            [index=253])=0.042
Top 5 predicted class:     Pr(Class=golden_retriever   [index=207])=0.041
In [5]:
img               = img_to_array(_img)
img               = preprocess_input(img)
y_pred            = model.predict(img[np.newaxis,...])
class_idxs_sorted = np.argsort(y_pred.flatten())[::-1]
topNclass         = 5
for i, idx in enumerate(class_idxs_sorted[:topNclass]):
    print("Top {} predicted class:     Pr(Class={:18} [index={}])={:5.3f}".format(
          i + 1,classlabel[idx],idx,y_pred[0,idx]))
Top 1 predicted class:     Pr(Class=redbone            [index=168])=0.360
Top 2 predicted class:     Pr(Class=bath_towel         [index=434])=0.076
Top 3 predicted class:     Pr(Class=bloodhound         [index=163])=0.076
Top 4 predicted class:     Pr(Class=basenji            [index=253])=0.042
Top 5 predicted class:     Pr(Class=golden_retriever   [index=207])=0.041

Image Specific Class Saliency

The saliency map for a given image array $\texttt{image}_0 \in R^{\texttt{height} \textrm{ x } \texttt{width} \textrm{ x } 3}$ and class $c$ is defined as the pixel-wise maximum, across the three color channels, of:

$$ \bigg| \frac{ d \texttt{loss}_c }{ d \textrm{image} } \big\vert_{\textrm{image} = \texttt{image}_0} \bigg| $$

Here $\texttt{loss}_c$ is the unnormalized score of class $c$, so the resulting map has one value per pixel.

Interpretation

  • Suppose we can simplify the CNN to a linear model of the form: $$ \texttt{loss}_c = \boldsymbol{w}_c * \textrm{image} + b_c $$ where $\boldsymbol{w}_c,\textrm{image} \in R^{\texttt{height} \textrm{ x } \texttt{width} \textrm{ x } 3}$, $*$ denotes the sum of the element-wise products, and $b_c$ is a scalar bias. In this case, it is easy to see that the magnitude of each element of $\boldsymbol{w}_c$ defines the importance of the corresponding pixel of $\textrm{image}$ for class $c$. Indeed, the derivative that makes up the saliency map is $d \texttt{loss}_c/ d \textrm{image} \big\vert_{\textrm{image} = \texttt{image}_0} = \boldsymbol{w}_c$.
  • The magnitude of the class score derivative indicates which pixels need to be changed the least to affect the class score the most. One can expect such pixels to correspond to the object location in the image.
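
The linear-model interpretation can be checked numerically. The snippet below is a minimal sketch, not part of keras-vis: it assumes a toy 4 x 5 image and a made-up linear score function (`w_c`, `b_c` and `score` are hypothetical names), estimates the derivative with finite differences, and then takes the maximum absolute value across channels.

```python
import numpy as np

rng = np.random.RandomState(0)
height, width = 4, 5  # toy image size (assumption; the post uses 224 x 224)

# Hypothetical linear "model": score_c = sum(w_c * image) + b_c
w_c = rng.normal(size=(height, width, 3))
b_c = 0.1

def score(image):
    """Class score of the toy linear model for one image."""
    return float(np.sum(w_c * image) + b_c)

image_0 = rng.normal(size=(height, width, 3))

# Estimate d score / d image at image_0 with central finite differences,
# perturbing one pixel value at a time.
eps = 1e-4
grad = np.zeros_like(image_0)
for idx in np.ndindex(*image_0.shape):
    step = np.zeros_like(image_0)
    step[idx] = eps
    grad[idx] = (score(image_0 + step) - score(image_0 - step)) / (2 * eps)

# For a linear model the gradient equals the weights, whatever image_0 is.
grad_matches_weights = np.allclose(grad, w_c, atol=1e-6)

# Saliency map: absolute value, then maximum across the channel axis.
saliency = np.abs(grad).max(axis=-1)
print(grad_matches_weights, saliency.shape)
```

For a real CNN the gradient depends on `image_0`, which is why the saliency map is image specific.
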
In [6]:
from vis.utils import utils
# Utility to search for layer index by name. 
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'predictions')

# The code above is essentially doing:
#for i, layer in enumerate(model.layers):
#    print("{:2.0f} {:10}".format(i, layer.name))
#    if "predictions" in layer.name:
#        layer_idx = i

Here, we need one trick: make the final class scores unnormalized, i.e., replace the softmax activation of the last layer with a linear activation.

The reason is that, with softmax, an output node can be maximized by minimizing the other outputs. Unlike element-wise activations such as ReLU, softmax makes each output depend on the outputs of the other nodes in the layer.
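
A small numerical illustration of this point, as a sketch with made-up scores (the `softmax` helper and the `logits` values are assumptions, not taken from VGG16): the softmax probability of a class goes up when the other classes' scores go down, even though its own score is unchanged.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D array of unnormalized scores."""
    z = z - np.max(z)  # shift for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])  # hypothetical unnormalized class scores
c = 0                               # class of interest

# Lower the scores of the *other* classes; the score of class c is untouched.
logits_others_down = logits.copy()
logits_others_down[1:] -= 3.0

p_before = softmax(logits)[c]
p_after = softmax(logits_others_down)[c]
print("Pr(class c) before: {:5.3f}, after: {:5.3f}".format(p_before, p_after))
```

With a linear activation, the score of class c would be identical in both cases, so its gradient reflects only the evidence for class c itself.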

In [7]:
# Swap softmax with linear
model.layers[layer_idx].activation = keras.activations.linear
model = utils.apply_modifications(model)
/Users/yumikondo/anaconda3/envs/explainableAI/lib/python3.5/site-packages/keras/engine/saving.py:269: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
In [8]:
help(utils.apply_modifications)
Help on function apply_modifications in module vis.utils.utils:

apply_modifications(model, custom_objects=None)
    Applies modifications to the model layers to create a new Graph. For example, simply changing
    `model.layers[idx].activation = new activation` does not change the graph. The entire graph needs to be updated
    with modified inbound and outbound tensors because of change in layer building function.
    
    Args:
        model: The `keras.models.Model` instance.
    
    Returns:
        The modified model with changes applied. Does not mutate the original `model`.

Let's visualize the saliency map using keras-vis.

In [9]:
from vis.visualization import visualize_saliency
class_idx = class_idxs_sorted[0]
grad_top1 = visualize_saliency(model,
                               layer_idx,
                               filter_indices = class_idx,
                               seed_input     = img[np.newaxis,...])

Visualization

In [10]:
def plot_map(grads):
    fig, axes = plt.subplots(1,2,figsize=(14,5))
    axes[0].imshow(_img)
    axes[1].imshow(_img)
    i = axes[1].imshow(grads,cmap="jet",alpha=0.8)
    fig.colorbar(i)
    plt.suptitle("Pr(class={}) = {:5.2f}".format(
                      classlabel[class_idx],
                      y_pred[0,class_idx]))
plot_map(grad_top1)

Notice that for the bath_towel prediction, the saliency is spread over the whole image. For the dog classes, however, the saliency concentrates around the dog, especially for golden_retriever and bloodhound.

In [11]:
for class_idx in class_idxs_sorted[:topNclass]:
    grads  = visualize_saliency(model,
                               layer_idx,
                               filter_indices = class_idx,
                               seed_input     = img[np.newaxis,...])
    plot_map(grads)

Saliency Map by hand

To understand the saliency map better, I am going to implement it by hand. This is actually pretty easy.

You can follow the source code of keras-vis (keras-vis/vis/visualization/activation_maximization.py) to understand the saliency map. The code looks very complex at first glance because keras-vis is a very general API. However, for this simple example, you can generate the same saliency map in a few lines:

In [12]:
import keras.backend as K
## select class of interest
class_idx         = class_idxs_sorted[0]
## define derivative d loss / d layer_input
layer_input       = model.input
## This model must already use linear activation for the final layer
loss              = model.layers[layer_idx].output[...,class_idx]
grad_tensor       = K.gradients(loss,layer_input)[0]

## create a function that evaluates the gradient for a given input
# This function accepts a numpy array
derivative_fn     = K.function([layer_input],[grad_tensor])

## evaluate the derivative_fn
grad_eval_by_hand = derivative_fn([img[np.newaxis,...]])[0]
print(grad_eval_by_hand.shape)


grad_eval_by_hand = np.abs(grad_eval_by_hand).max(axis=(0,3))

## normalize to range between 0 and 1
arr_min, arr_max  = np.min(grad_eval_by_hand), np.max(grad_eval_by_hand)
grad_eval_by_hand = (grad_eval_by_hand - arr_min) / (arr_max - arr_min + K.epsilon())
(1, 224, 224, 3)

Check that the saliency map calculated by hand is the same as the saliency map from keras-vis.

In [13]:
assert np.all(np.abs(grad_eval_by_hand - grad_top1)<0.00001)
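
The same kind of tolerance check can also be written with NumPy's comparison helpers, which report the largest mismatch when the check fails. This is a minimal sketch on two small made-up arrays; `a` and `b` are placeholders standing in for `grad_eval_by_hand` and `grad_top1`.

```python
import numpy as np

a = np.array([[0.0, 0.5], [1.0, 0.25]])
b = a + 5e-6  # differs from `a` by less than the 1e-5 tolerance

# Check in the style used above: every absolute difference is below 1e-5.
same_by_abs_diff = bool(np.all(np.abs(a - b) < 0.00001))

# rtol=0 turns the check into a purely absolute tolerance.
same_by_allclose = bool(np.allclose(a, b, rtol=0, atol=1e-5))

# Raises an AssertionError with a mismatch report if the arrays differ too much.
np.testing.assert_allclose(a, b, rtol=0, atol=1e-5)
print(same_by_abs_diff, same_by_allclose)
```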

Visualization

In [14]:
plot_map(grad_eval_by_hand)
