Yumi's Blog

Color gray scale images and manga using deep learning

In this blog post, I will try to create a deep learning model that can color a grayscale image. I follow the great blog post Colorizing B&W Photos with Neural Networks.

I will consider two example data to train a model:

  • Flickr8K data
  • Hunter x Hunter anime data

Flickr8K is a famous public dataset in the computer vision community, and it was also previously analyzed in my blog. The downloading process is described in Develop an image captioning deep learning model using Flickr 8K data. I will first use this standard dataset to validate the method with a small data size (about 8,000 images).

Hunter x Hunter is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in the magazine Weekly Shōnen Jump since March 3, 1998. Hunter x Hunter was adapted into an anime television series twice, in 1999 and in 2011. While the anime television series has ended, the manga is ongoing in Weekly Shōnen Jump, and a new chapter is published every week (except when the author "takes a break"). While the anime is colored, the manga is not. So my motivation is this: if we can train a deep learning model with the colored anime images, we may be able to color the manga and enjoy colored manga for free!! With this motivation, I will try to create a deep learning model using Hunter x Hunter anime data.


In [1]:
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
import keras
import sys, time, os, warnings 
import numpy as np
import pandas as pd 
from collections import Counter 
warnings.filterwarnings("ignore")
print("python {}".format(sys.version))
print("keras version {}".format(keras.__version__)); del keras
print("tensorflow version {}".format(tf.__version__))
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
config.gpu_options.visible_device_list = "0"
set_session(tf.Session(config=config))

def set_seed(sd=123):
    from numpy.random import seed
    from tensorflow import set_random_seed
    import random as rn
    ## numpy random seed
    seed(sd)
    ## core python's random number 
    rn.seed(sd)
    ## TensorFlow's random number
    set_random_seed(sd)
Using TensorFlow backend.
python 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
keras version 2.1.3
tensorflow version 1.5.0

Flickr8k images

Load the images. Here, I load the images in LAB format, and all of my analysis is based on the LAB representation. The only time I will transform the images back into RGB is when I want to plot them using the pyplot.imshow() function.

To learn about the LAB format, please refer to my previous blog post Color space defenitions in python, RGB and LAB.
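
To make this concrete, here is a minimal sketch of the RGB-to-LAB round trip (the file name example.jpg is hypothetical): the image is converted to LAB for the analysis, and converted back to RGB only for plotting.

from keras.preprocessing.image import img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb
import matplotlib.pyplot as plt

## load one RGB image, scale to [0, 1], and convert to LAB
imgrgb = img_to_array(load_img("example.jpg", target_size=(256, 256)))/255.0
imglab = rgb2lab(imgrgb)      ## analysis happens in LAB space
backrgb = lab2rgb(imglab)     ## convert back to RGB only for display
plt.imshow(backrgb)
plt.axis("off")
plt.show()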

In [2]:
from keras.preprocessing.image import img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb

target_size = (256,256,3)
X = []
dir_data = "../Flickr8k/Flicker8k_Dataset/"


for filenm in os.listdir(dir_data): 
    imgrgb = img_to_array(load_img(dir_data+filenm,target_size=target_size))/255.0
    imglab = rgb2lab(imgrgb)
    X.append(imglab)
In [3]:
X = np.array(X)
X.shape
Out[3]:
(8091, 256, 256, 3)
In [4]:
for i in range(X.shape[-1]):
    vec = X[:,:,:,i]
    print("MIN={:5.3f} MAX={:5.3f}".format(np.min(vec),np.max(vec)))
MIN=0.000 MAX=100.000
MIN=-86.183 MAX=98.233
MIN=-107.752 MAX=94.478
In [5]:
def standardizeLAB(X):
    ## Standardize the LAB
    standX = np.zeros(X.shape)
    ## standardized one takes values between 0 and 1
    standX[:,:,:,0] =  X[:,:,:,0]/100.0
    ## standardized one takes values between -1 and 1
    standX[:,:,:,1:] = X[:,:,:,1:]/128.0
    return(standX)

standX = standardizeLAB(X)
del X
print(standX.shape)
(8091, 256, 256, 3)
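
For reference, the inverse of this scaling is just a multiplication by the same constants. The plotting code later in this post does it inline, but a helper would look like this minimal sketch (my own, not used below):

import numpy as np

def unstandardizeLAB(standX):
    ## undo the scaling applied in standardizeLAB
    X = np.zeros(standX.shape)
    X[:,:,:,0]  = standX[:,:,:,0]*100.0   ## L channel back to [0, 100]
    X[:,:,:,1:] = standX[:,:,:,1:]*128.0  ## a, b channels back to roughly [-128, 128]
    return(X)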

Divide the standardized images into training and testing sets

In [6]:
split = int(0.95*len(standX))
Xtrain = standX[:split]
Xtest = standX[split:]

Define model

In [7]:
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.layers import Conv2D, UpSampling2D, InputLayer, Conv2DTranspose
from keras.models import Sequential

def define_model():
    #Design the neural network
    model = Sequential()
    model.add(InputLayer(input_shape=(256, 256, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same', strides=2))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same', strides=2))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same', strides=2))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(UpSampling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(UpSampling2D((2, 2)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
    #model.add(Conv2D(2, (3, 3), padding='same'))
    model.add(UpSampling2D((2, 2)))
    model.compile(optimizer='rmsprop', loss='mse')
    return(model)
# Finish model


# Image transformer
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)
model = define_model()
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 256, 256, 1)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      640       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 128, 128, 128)     73856     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 64, 64, 128)       147584    
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 64, 64, 256)       295168    
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 32, 32, 256)       590080    
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 32, 32, 512)       1180160   
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 32, 32, 256)       1179904   
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 32, 128)       295040    
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 64, 64, 64)        73792     
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 128, 128, 64)      0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 128, 128, 32)      18464     
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 128, 128, 2)       578       
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 256, 256, 2)       0         
=================================================================
Total params: 3,892,194
Trainable params: 3,892,194
Non-trainable params: 0
_________________________________________________________________
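
As a quick sanity check (not part of the original notebook), the network maps a single-channel grayscale input to a two-channel a/b output at the same 256 x 256 resolution:

import numpy as np

## one dummy grayscale image in, one a/b prediction out
dummy_L = np.zeros((1, 256, 256, 1))
print(model.predict(dummy_L).shape)  ## expected: (1, 256, 256, 2)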

Training starts

Notice that I removed the rgb2lab line from image_a_b_gen that existed in Colorizing B&W Photos with Neural Networks. The conversion is not necessary because our Xtrain is already in LAB format: X_batch below is just the (standardized) L channel of each augmented batch, and Y_batch is the corresponding a/b channels. I also found that removing this line from image_a_b_gen made the training about 10 times faster.

In [8]:
# Generate training data
batch_size = 128
def image_a_b_gen(Xtrain, batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        X_batch = batch[:,:,:,[0]]
        Y_batch = batch[:,:,:,1:] 
        yield (X_batch, Y_batch)

## create a validation data    
Ntrain = int(Xtrain.shape[0]*0.8)    
X_tr = Xtrain[:Ntrain]
X_val = Xtrain[Ntrain:]

hist = model.fit_generator(image_a_b_gen(X_tr, batch_size), 
                           verbose=2,
                           validation_data = (X_val[:,:,:,[0]],
                                              X_val[:,:,:,1:]),
                           steps_per_epoch=100, epochs=5)
Epoch 1/5
 - 143s - loss: 0.0299 - val_loss: 0.0141
Epoch 2/5
 - 120s - loss: 0.0146 - val_loss: 0.0139
Epoch 3/5
 - 124s - loss: 0.0140 - val_loss: 0.0139
Epoch 4/5
 - 122s - loss: 0.0140 - val_loss: 0.0135
Epoch 5/5
 - 123s - loss: 0.0137 - val_loss: 0.0131
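
Note that the anime analysis below calls define_model() again, so the weights learned here are not reused as written. If you want to keep them, for example to warm-start the Hunter x Hunter model, Keras can save and reload them; a minimal sketch with a hypothetical file name:

## save the trained weights to disk
model.save_weights("colorize_flickr8k.h5")

## ... later, after re-creating the same architecture:
## model = define_model()
## model.load_weights("colorize_flickr8k.h5")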

Plot the loss

In [9]:
for key in hist.history.keys():
    plt.plot(hist.history[key],label=key)
plt.legend()
plt.show()

Evaluate the model performance using the test set

In [10]:
Ypred = model.predict(Xtest[:,:,:,[0]])
print("testing MSE={:4.3f}".format(np.mean((Ypred - Xtest[:,:,:,2:])**2)))
testing MSE=0.020
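
Note that this MSE is computed on the standardized a/b channels, which were divided by 128; a quick, rough conversion back to raw LAB units (my own addition, just to aid interpretation):

import numpy as np

## take the square root to get a typical per-pixel error,
## then undo the /128 scaling of the a/b channels
mse_std = np.mean((Ypred - Xtest[:,:,:,1:])**2)
rmse_lab = np.sqrt(mse_std)*128.0
print("approximate RMSE in LAB a/b units = {:4.1f}".format(rmse_lab))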

Plot the predicted testing images and the true colored testing images

Unfortunately, almost all the images are brown-ish. We need to increase the training data size, investigate different models, or consider a narrower space of images. Nevertheless, the model seems to learn that:

  • grass is green
  • the sky is blue
  • lakes are blue
In [11]:
from copy import copy


def plot_gray_predicted_true_images(Xtest,Ypred,index,target_size):
    Npic = len(index)
    fig = plt.figure(figsize=(10,Npic*3))
    count = 1
    for i in index:
        img = copy(Xtest[i])
        cur_pred = np.zeros(target_size)
        cur_gray = np.zeros(target_size)

        cur_pred[:,:,0] = img[:,:,0]*100
        cur_gray[:,:,0] = img[:,:,0]*100
        cur_pred[:,:,1:] = Ypred[i]* 128

        rgb_cur_pred = lab2rgb(cur_pred)
        rgb_cur_gray = lab2rgb(cur_gray)

        ax = fig.add_subplot(Npic,3,count)
        ax.imshow(rgb_cur_gray)
        ax.set_title("ID={} original (gray scale)".format(i))
        ax.axis("off")
        count += 1

        ax = fig.add_subplot(Npic,3,count)
        ax.imshow(rgb_cur_pred)
        ax.axis("off")
        ax.set_title("predicted")
        count += 1

        ## create original image
        img_o = np.zeros(target_size)
        img_o[:,:,0] = img[:,:,0]*100
        img_o[:,:,1:] = img[:,:,1:]*128

        ax = fig.add_subplot(Npic,3,count)
        rgb = lab2rgb(img_o)
        ax.imshow(rgb)
        ax.set_title("original (color)")
        ax.axis("off")
        count += 1
    plt.show()

Npic = 20

exampleIDs = [132,288,47,169,85,191,37,147]
randomIDs = list(np.random.choice(range(Xtest.shape[0]),Npic-len(exampleIDs)))
index = exampleIDs + randomIDs    
plot_gray_predicted_true_images(Xtest,Ypred,index,target_size)
In [12]:
del standX

Hunter x Hunter

The Hunter x Hunter images were extracted from Google Image search; the procedure is described in the previous post Download all images from Google image search query using python.

I used several queries for downloading the data:

  • Hunter x Hunter
  • Hunter x Hunter anime
  • Hunter x Hunter color
  • Hunter x Hunter gon
  • Hunter x Hunter killua
  • Hunter x Hunter aruka
  • Hunter x Hunter kurapika

gon, killua, aruka, and kurapika are characters' names. The text files containing the URL links of each image are available on my GitHub.

In [13]:
from keras.preprocessing.image import img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb

target_size = (256,256,3)

## try/except is included because some of the downloaded files cannot be read as valid images
dir_data = "../HunterHunter/image/anime/"
X = []
count = 0
for folder in os.listdir(dir_data):
    for image in os.listdir(dir_data + folder):
        try:
            imgrgb = img_to_array(load_img(dir_data+folder + "/" + image,target_size=target_size))/255.0
            imglab = rgb2lab(imgrgb)
            X.append(imglab)
            count += 1
        except Exception as e:
            pass
X = np.array(X)
print("The total number of images {}".format(X.shape[0]))        
The total number of images 1470

Standardize the LAB data

In [14]:
standX = standardizeLAB(X)
del X
print(standX.shape)
(1470, 256, 256, 3)

Training the model

The process is the same as the previous analysis with the Flickr8K data.

In [15]:
#Split between training and testing data
split = int(0.95*len(standX))
Xtrain = standX[:split]
Xtest = standX[split:]


## create a validation data    
Ntrain = int(Xtrain.shape[0]*0.8)    
X_tr = Xtrain[:Ntrain]
X_val = Xtrain[Ntrain:]

model = define_model()
batch_size = 128
## note: define_model() builds a fresh model, so the weights from the Flickr8K analysis are not reused here
hist = model.fit_generator(image_a_b_gen(X_tr, batch_size), 
                           verbose=2,
                           validation_data = (X_val[:,:,:,[0]],
                                              X_val[:,:,:,1:]),
                           steps_per_epoch=100, epochs=5)
Epoch 1/5
 - 126s - loss: 0.0722 - val_loss: 0.0345
Epoch 2/5
 - 124s - loss: 0.0342 - val_loss: 0.0335
Epoch 3/5
 - 122s - loss: 0.0335 - val_loss: 0.0327
Epoch 4/5
 - 123s - loss: 0.0330 - val_loss: 0.0332
Epoch 5/5
 - 123s - loss: 0.0327 - val_loss: 0.0319

Plot the validation loss

In [16]:
for key in hist.history.keys():
    plt.plot(hist.history[key],label=key)
plt.legend()
plt.show()

Model validation using test set

In [17]:
Ypred = model.predict(Xtest[:,:,:,[0]])
print("testing MSE={:4.3f}".format(np.mean((Ypred - Xtest[:,:,:,2:])**2)))
testing MSE=0.048

Plot the example images from test set

  • Again, the model is not doing a very good job of coloring the images. This time, all the images are purple-ish.
  • The plots also show that there are some non-anime-related images, indicating that some manual cleaning of the data would help.

Nevertheless, there are some good things too:

  • Killua's hair is correctly colored purple-ish white.
  • Gon's face is colored with a skin tone in ID=46.
  • The super-power-looking background colors seem appropriate.
In [18]:
Nsample = 30
index = [46,69,45, 18,22, 25, 0, 56, 10, 50] 
plot_gray_predicted_true_images(Xtest,Ypred,index,target_size)

Can I color manga?

Well, my model was not doing the best job of coloring anime, but what about coloring manga? Let's give it a try.

Load 5 Hunter x Hunter manga images.

In [19]:
target_size = (256,256,3)

## try/except is included because some of the downloaded files cannot be read as valid images
dir_data = "../HunterHunter/manga/Hunter x Hunter manga/"
X = []
count = 0
for image in ['image00004.jpg',
              'image00005.jpg',
              'image00006.jpg',
              'image00007.jpg',
              'image00008.jpg']:
        try:
            imgrgb = img_to_array(load_img(dir_data+ "/" + image,target_size=target_size))/255.0
            imglab = rgb2lab(imgrgb)
            X.append(imglab)
            count += 1
        except Exception as e:
            pass
X = np.array(X)
print("The total number of images {}".format(X.shape[0]))        
## standardization
standX = standardizeLAB(X)
#del X
print(standX.shape)
The total number of images 5
(5, 256, 256, 3)

Use the previously built model with anime to predict the color of manga.

In [20]:
Ypred = model.predict(standX[:,:,:,[0]])
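
If you want to keep a colorized page rather than just display it, the predicted a/b channels can be recombined with the manga page's L channel and written to disk; a minimal sketch (the output file name is hypothetical):

import numpy as np
import matplotlib.pyplot as plt
from skimage.color import lab2rgb

## recombine the original L channel with the predicted a/b channels
colored = np.zeros((256, 256, 3))
colored[:,:,0]  = standX[0,:,:,0]*100.0  ## L channel of the first manga page
colored[:,:,1:] = Ypred[0]*128.0         ## predicted a/b channels
plt.imsave("colored_manga_page0.png", lab2rgb(colored))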

Plot how the results look

Ah, we need to train the model with more relevant images! We need colored manga rather than anime images to train a model for this purpose.

In [21]:
index = range(standX.shape[0])
plot_gray_predicted_true_images(standX,Ypred,index,target_size)

Next step:

Consider incorporating a pre-trained network.
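
As an illustration of what this could look like (a minimal sketch of my own, not part of the analysis above), a frozen, ImageNet-pretrained VGG16 could serve as the encoder; the grayscale L channel is simply repeated to three channels to match VGG16's expected input:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Lambda, Conv2D, UpSampling2D
from keras.models import Model
from keras import backend as K

## grayscale L channel in, repeated to 3 channels for VGG16
inp = Input(shape=(256, 256, 1))
rgbish = Lambda(lambda x: K.repeat_elements(x, 3, axis=-1))(inp)

## frozen pre-trained encoder
encoder = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
for layer in encoder.layers:
    layer.trainable = False
features = encoder(rgbish)                                     ## (8, 8, 512)

## small decoder that upsamples back to 256 x 256 a/b channels
x = Conv2D(256, (3, 3), activation='relu', padding='same')(features)
x = UpSampling2D((2, 2))(x)                                    ## 16 x 16
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                    ## 32 x 32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                    ## 64 x 64
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                    ## 128 x 128
ab = Conv2D(2, (3, 3), activation='tanh', padding='same')(x)
ab = UpSampling2D((2, 2))(ab)                                  ## 256 x 256

pretrained_model = Model(inp, ab)
pretrained_model.compile(optimizer='rmsprop', loss='mse')
pretrained_model.summary()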
