# Saliency Map with keras-vis

Image-specific class saliency visualization helps explain why a model makes a particular classification decision. The goal of this blog post is to understand the concept and learn how to interpret a saliency map.

# Reference

## Reference in this blog

To set up the same conda environment as mine, follow:

Visualization of deep learning classification model using keras-vis

# Setup

In [1]:
import keras
import tensorflow as tf
import vis ## keras-vis
import matplotlib.pyplot as plt
import numpy as np
print("keras      {}".format(keras.__version__))
print("tensorflow {}".format(tf.__version__))

Using TensorFlow backend.

keras      2.2.2
tensorflow 1.10.0


For this exercise, I will use VGG16.

In [2]:
from keras.applications.vgg16 import VGG16, preprocess_input
model = VGG16(weights='imagenet')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________


wget "https://raw.githubusercontent.com/raghakot/keras-vis/master/resources/imagenet_class_index.json"

Read in the class index JSON file:

In [3]:
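import json
# Load the class index file downloaded with wget above
with open("imagenet_class_index.json") as f:
    CLASS_INDEX = json.load(f)
classlabel = []
for i_dict in range(len(CLASS_INDEX)):
    classlabel.append(CLASS_INDEX[str(i_dict)][1])
print("N of class={}".format(len(classlabel)))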

N of class=1000
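
The lookup above relies on the layout of the class index file: each key is a class index stored as a string, and each value is a pair whose second element is the human-readable label. A minimal sketch on a two-entry excerpt (the names `sample_json`, `sample_index` and `labels` are made up for illustration):

```python
import json

# Two-entry excerpt in the same layout as imagenet_class_index.json:
# {"<class index as a string>": ["<WordNet ID>", "<label>"], ...}
sample_json = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'
sample_index = json.loads(sample_json)

# Keys are strings, so the integer position is converted with str()
labels = [sample_index[str(i)][1] for i in range(len(sample_index))]
print(labels)
```

Indexing by `str(i)` rather than iterating over the dictionary keeps the labels in class-index order, which is what lets a position in the model output be mapped straight to a label.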


Let's read in an image that contains both a dog and a cat. Such an image is likely to be confusing for a model trained on ImageNet, where most images contain a single object.

The goal of this exercise is to understand why the VGG16 model makes its classification decision.

In [4]:
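from keras.preprocessing.image import load_img, img_to_array
# image_path is a placeholder: set it to the path of the dog-and-cat image.
# VGG16 expects 224 x 224 inputs, hence target_size.
_img = load_img(image_path, target_size=(224, 224))
plt.imshow(_img)
plt.show()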


Let's predict the object class of this image, and show the top 5 predicted classes.

Unfortunately, the 2nd top predicted class does not make sense. Why bath_towel? However, the other top predicted classes are all dog breeds, which does make sense:

Top 1 predicted class:     Pr(Class=redbone            [index=168])=0.360
Top 3 predicted class:     Pr(Class=bloodhound         [index=163])=0.076
Top 4 predicted class:     Pr(Class=basenji            [index=253])=0.042
Top 5 predicted class:     Pr(Class=golden_retriever   [index=207])=0.041
In [5]:
img               = img_to_array(_img)
img               = preprocess_input(img)
y_pred            = model.predict(img[np.newaxis,...])
class_idxs_sorted = np.argsort(y_pred.flatten())[::-1]
topNclass         = 5
for i, idx in enumerate(class_idxs_sorted[:topNclass]):
    print("Top {} predicted class:     Pr(Class={:18} [index={}])={:5.3f}".format(
          i + 1,classlabel[idx],idx,y_pred[0,idx]))

Top 1 predicted class:     Pr(Class=redbone            [index=168])=0.360
Top 2 predicted class:     Pr(Class=bath_towel         [index=434])=0.076
Top 3 predicted class:     Pr(Class=bloodhound         [index=163])=0.076
Top 4 predicted class:     Pr(Class=basenji            [index=253])=0.042
Top 5 predicted class:     Pr(Class=golden_retriever   [index=207])=0.041
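
The ranking step in the cell above can be seen in isolation on a toy probability vector. This is only a sketch: `toy_pred` and `toy_labels` are made-up values, not model output.

```python
import numpy as np

# Hypothetical class probabilities for a 4-class model, shape (1, n_classes)
toy_pred = np.array([[0.10, 0.60, 0.05, 0.25]])
toy_labels = ["cat", "dog", "towel", "fox"]

# argsort returns indices in ascending order; [::-1] reverses them to descending
order = np.argsort(toy_pred.flatten())[::-1]

top_n = 2
for rank, idx in enumerate(order[:top_n]):
    print("Top {} class: {:6} p={:.2f}".format(rank + 1, toy_labels[idx], toy_pred[0, idx]))
```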


## Image Specific Class Saliency

The saliency map for a given image array $\texttt{image}_0 \in R^{\texttt{height} \times \texttt{width} \times 3}$ is defined as the pixel-wise maximum of:

$$\bigg| \frac{ d \texttt{loss}_c }{ d \textrm{image} } \big\vert_{\textrm{image} = \texttt{image}_0} \bigg|$$

across channels.
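
As a rough sketch of this definition, assume the gradient of the class score with respect to the image is already available as an array with the same shape as the image. The function name `saliency_from_gradient` and the random stand-in gradient are illustrative and not part of keras-vis.

```python
import numpy as np

def saliency_from_gradient(grad):
    """Collapse a (height, width, 3) gradient into a (height, width) saliency map."""
    # Absolute value first, then the maximum over the channel axis
    return np.max(np.abs(grad), axis=-1)

# Stand-in gradient; in practice this would come from backpropagation
rng = np.random.RandomState(0)
grad = rng.randn(4, 5, 3)

saliency = saliency_from_gradient(grad)
print(saliency.shape)  # (4, 5)
```

Taking the maximum over channels yields one non-negative value per pixel, so the result can be displayed as a single-channel heat map.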

### Interpretation

• Suppose we can simplify the CNN to a linear model of the form: $$\texttt{loss}_c = \boldsymbol{w}_c * \textrm{image} + b_c$$ where $\boldsymbol{w}_c, \textrm{image} \in R^{\texttt{height} \times \texttt{width} \times 3}$, $b_c$ is a scalar, and $*$ denotes the sum of element-wise products. In this case, it is easy to see that the magnitude of each element of $\boldsymbol{w}_c$ defines the importance of the corresponding pixel of $\textrm{image}$ for class $c$. Indeed, the derivative that composes the saliency map is $d \texttt{loss}_c/ d \textrm{image} \big\vert_{\textrm{image} = \texttt{image}_0} = \boldsymbol{w}_c$.
• The magnitude of the class score derivative indicates which pixels need to be changed the least to affect the class score the most. One can expect such pixels to correspond to the object's location in the image.
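
The linear-model argument in the first bullet can be checked numerically. The sketch below builds a toy linear score on a tiny image (all names, sizes and values are made up for illustration, and the bias is taken to be a scalar) and uses finite differences to confirm that the derivative with respect to each pixel value equals the corresponding weight.

```python
import numpy as np

rng = np.random.RandomState(1)
height, width = 2, 3
w_c = rng.randn(height, width, 3)     # toy weights for class c
b_c = 0.5                             # toy scalar bias
image_0 = rng.rand(height, width, 3)  # toy image

def loss_c(image):
    # Linear class score: sum of element-wise products plus bias
    return np.sum(w_c * image) + b_c

# Central finite differences, perturbing one pixel value at a time
eps = 1e-6
num_grad = np.zeros_like(image_0)
for pos in np.ndindex(*image_0.shape):
    step = np.zeros_like(image_0)
    step[pos] = eps
    num_grad[pos] = (loss_c(image_0 + step) - loss_c(image_0 - step)) / (2 * eps)

# The numerical derivative should match the weights up to rounding error
print(np.allclose(num_grad, w_c, atol=1e-6))
```

For a real CNN the derivative depends on the image at which it is evaluated, which is why the saliency map is specific to each image.
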
In [6]:
from vis.utils import utils
# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'predictions')

# The code above is essentially doing:
#for i, layer in enumerate(model.layers):
#    print("{:2.0f} {:10}".format(i, layer.name))
#    if "predictions" in layer.name:
#        layer_idx = i


Here we need one trick: make the final class scores unnormalized, i.e., replace the softmax activation of the last layer with a linear one.

The reason is that, with softmax, an output node can be maximized simply by minimizing the other outputs. Softmax is weird that way: it is the only activation that depends on the outputs of the other nodes in the layer.
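
A small numerical illustration of this point, using a hand-written softmax on made-up logits: the probability of class 0 goes up even though its own logit is left untouched, simply because the other logits were lowered.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])             # made-up pre-softmax scores
lowered = logits - np.array([0.0, 3.0, 3.0])   # only the other classes change

p_before = softmax(logits)[0]
p_after = softmax(lowered)[0]
print("before: {:.3f}, after: {:.3f}".format(p_before, p_after))
```

With a linear activation the score of class 0 would be identical in both cases, so its gradient reflects only evidence for that class.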

In [7]:
# Swap softmax with linear
model.layers[layer_idx].activation = keras.activations.linear
model = utils.apply_modifications(model)

/Users/yumikondo/anaconda3/envs/explainableAI/lib/python3.5/site-packages/keras/engine/saving.py:269: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '

In [8]:
help(utils.apply_modifications)

Help on function apply_modifications in module vis.utils.utils:

apply_modifications(model, custom_objects=None)
Applies modifications to the model layers to create a new Graph. For example, simply changing
model.layers[idx].activation = new activation does not change the graph. The entire graph needs to be updated
with modified inbound and outbound tensors because of change in layer building function.

Args:
model: The keras.models.Model instance.

Returns:
The modified model with changes applied. Does not mutate the original model.



Let's visualize the saliency map using keras-vis.

In [9]:
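from vis.visualization import visualize_saliency
class_idx = class_idxs_sorted[0]
# Saliency map of the top-1 predicted class with respect to the input image
grads = visualize_saliency(model,
                           layer_idx,
                           filter_indices = class_idx,
                           seed_input     = img[np.newaxis,...])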


Visualization

In [10]:
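def plot_map(grads):
    fig, axes = plt.subplots(1,2,figsize=(14,5))
    axes[0].imshow(_img)
    axes[1].imshow(_img)
    # Overlay the saliency map on the image; colormap and alpha are illustrative choices
    i = axes[1].imshow(grads, cmap="jet", alpha=0.8)
    fig.colorbar(i)
    plt.suptitle("Pr(class={}) = {:5.2f}".format(
        classlabel[class_idx],
        y_pred[0,class_idx]))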

for class_idx in class_idxs_sorted[:topNclass]: