Everyone has been talking lately about Generative Adversarial Networks, better known as GANs. My colleagues, interview candidates, my manager, conference speakers, literally EVERYONE. It seems that GANs have become required knowledge for data scientists in the Bay Area. It also seems that GANs are cool: GANs can generate new celebrity face images, generate creative art, or generate the next frame of a video. It can feel as if AI is thinking by itself with the power of GANs.
In this blog post I will explore what is so great about GANs. The gif above shows the output images from my first GAN. The generated images are blurry and seem to need more training; I only trained my GAN for about one hour. Nevertheless, this is a good start.
My focus in this post will be on a simple implementation rather than the theoretical details. I will use the famous and popular celebA data to train a GAN and generate celebrity face images.
I got lots of good ideas about a simple implementation from GAN: A Beginner's Guide to Generative Adversarial Networks, so I highly recommend you read that great blog first.
Reference¶
- GAN: A Beginner's Guide to Generative Adversarial Networks
- Photo Editing Generative Adversarial Networks Part 1
## load modules
import matplotlib.pyplot as plt
import os, time
import numpy as np
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
## GPU configuration: use only GPU "1" and cap its memory usage at 95%
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
config.gpu_options.visible_device_list = "1"
set_session(tf.Session(config=config))
Load celebA data¶
As my previous post shows, celebA contains 202,599 images. I will use 200,000 of them to train the GAN. Each original image has the shape (218, 178, 3); all images are resized to a smaller shape for easier computation.
I am shrinking the images quite a lot here because otherwise the GAN requires a lot of computation time.
dir_data = "data/img_align_celeba/"
Ntrain = 200000
Ntest = 100
nm_imgs = np.sort(os.listdir(dir_data))
## name of the jpg files for training set
nm_imgs_train = nm_imgs[:Ntrain]
## name of the jpg files for the testing data
nm_imgs_test = nm_imgs[Ntrain:Ntrain + Ntest]
img_shape = (32, 32, 3)
def get_npdata(nm_imgs_train):
X_train = []
for i, myid in enumerate(nm_imgs_train):
image = load_img(dir_data + "/" + myid,
target_size=img_shape[:2])
image = img_to_array(image)/255.0
X_train.append(image)
X_train = np.array(X_train)
return(X_train)
X_train = get_npdata(nm_imgs_train)
print("X_train.shape = {}".format(X_train.shape))
X_test = get_npdata(nm_imgs_test)
print("X_test.shape = {}".format(X_test.shape))
Plot the resized input images¶
I hope that our generator can generate images similar to these!
fig = plt.figure(figsize=(30,10))
nplot = 7
for count in range(nplot):
    ax = fig.add_subplot(1,nplot,count + 1)
    ax.imshow(X_train[count])
plt.show()
Define GAN¶
A GAN consists of two networks with competing objectives:
Generator: the generator creates new data instances that are "similar" to the training data, in our case celebA images. It takes a random latent vector as input and outputs a "fake" image of the same size as our resized celebA images.
Discriminator: the discriminator evaluates the authenticity of the provided images; it tries to distinguish the images from the generator from the original images. It takes a real or fake image as input and outputs a probability estimate between 0 and 1.
\begin{array}{rcl} \textrm{Generator(latent)} &\rightarrow& \textrm{image}\\ \textrm{Discriminator(image)} &\rightarrow& \textrm{0 (fake) / 1 (real)} \end{array}
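For reference, these two competing objectives are often written as the minimax game from the original GAN paper (Goodfellow et al., 2014), where the discriminator $D$ tries to maximize and the generator $G$ tries to minimize the same value:
$$ \min_G \max_D \; \mathbb{E}_{x \sim p_{\textrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big] $$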
Define generator¶
import numpy as np
from keras import layers, models
from keras.optimizers import Adam
## optimizer
#optimizer = Adam(0.0002, 0.5)
optimizer = Adam(0.00007, 0.5)
def build_generator(img_shape, noise_shape = (100,)):
'''
noise_shape : the dimension of the input vector for the generator
img_shape : the dimension of the output
'''
## latent variable as input
input_noise = layers.Input(shape=noise_shape)
    d = layers.Dense(1024, activation="relu")(input_noise)
    d = layers.Dense(1024, activation="relu")(d)
d = layers.Dense(128*8*8, activation="relu")(d)
d = layers.Reshape((8,8,128))(d)
d = layers.Conv2DTranspose(128, kernel_size=(2,2) , strides=(2,2) , use_bias=False)(d)
d = layers.Conv2D( 64 , ( 1 , 1 ) , activation='relu' , padding='same', name="block_4")(d) ## 16,16
d = layers.Conv2DTranspose(32, kernel_size=(2,2) , strides=(2,2) , use_bias=False)(d)
d = layers.Conv2D( 64 , ( 1 , 1 ) , activation='relu' , padding='same', name="block_5")(d) ## 32,32
if img_shape[0] == 64:
d = layers.Conv2DTranspose(32, kernel_size=(2,2) , strides=(2,2) , use_bias=False)(d)
d = layers.Conv2D( 64 , ( 1 , 1 ) , activation='relu' , padding='same', name="block_6")(d) ## 64,64
img = layers.Conv2D( 3 , ( 1 , 1 ) , activation='sigmoid' , padding='same', name="final_block")(d) ## 32, 32
model = models.Model(input_noise, img)
model.summary()
return(model)
## Set the dimension of latent variables to be 100
noise_shape = (100,)
generator = build_generator(img_shape, noise_shape = noise_shape)
generator.compile(loss='binary_crossentropy', optimizer=optimizer)
Take a look at the generated images BEFORE any training¶
As expected, the images look nothing like celebA. Our generator knows nothing about the data yet and outputs random noise in a weak attempt to trick the discriminator. Let's see how much the generator can learn from the celebA training data to generate "fake celebA images"!
def get_noise(nsample=1, nlatent_dim=100):
noise = np.random.normal(0, 1, (nsample,nlatent_dim))
return(noise)
def plot_generated_images(noise,path_save=None,titleadd=""):
imgs = generator.predict(noise)
fig = plt.figure(figsize=(40,10))
for i, img in enumerate(imgs):
        ax = fig.add_subplot(1, len(imgs), i+1)
ax.imshow(img)
fig.suptitle("Generated images "+titleadd,fontsize=30)
if path_save is not None:
plt.savefig(path_save,
bbox_inches='tight',
pad_inches=0)
plt.close()
else:
plt.show()
nsample = 4
noise = get_noise(nsample=nsample, nlatent_dim=noise_shape[0])
plot_generated_images(noise)
Define discriminator¶
def build_discriminator(img_shape,noutput=1):
input_img = layers.Input(shape=img_shape)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv1')(input_img)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(1, 1), name='block4_pool')(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation="relu")(x)
out = layers.Dense(noutput, activation='sigmoid')(x)
model = models.Model(input_img, out)
return model
discriminator = build_discriminator(img_shape)
discriminator.compile(loss = 'binary_crossentropy',
optimizer = optimizer,
metrics = ['accuracy'])
discriminator.summary()
Combined model¶
A 32×32 8-bit RGB image is described by $3 \times 8 \times 32 \times 32 = 24576$ bits, so there are $2^{24576}$ possible arrangements of the pixel values in such an image. The number of parameters of the generator and the discriminator together is far smaller than this number.
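To see the gap concretely, here is a minimal check of the two quantities (count_params() is a standard Keras model method; the exact totals depend on the architecture defined above):
## bits needed to describe one 32x32 8-bit RGB image
n_bits = 3 * 8 * 32 * 32   ## = 24576
## total number of parameters in the generator and the discriminator
n_params = generator.count_params() + discriminator.count_params()
print("possible images    : 2^{}".format(n_bits))
print("parameters in G + D: {}".format(n_params))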
- noise -> generator -> discriminator
This combined model shares its weights with the standalone generator and discriminator models. Note that in Keras the trainable flag is taken into account when a model is compiled: the discriminator was compiled above with trainable weights, so it can still be trained directly on its own batches, while the copy inside the combined model (compiled after setting discriminator.trainable = False below) is frozen, so training the combined model updates only the generator.
z = layers.Input(shape=noise_shape)
img = generator(z)
# For the combined model we will only train the generator
discriminator.trainable = False
# `valid` is the discriminator's output (probability that the generated images are real)
valid = discriminator(img)
# The combined model (stacked generator and discriminator) takes
# noise as input => generates images => determines validity
combined = models.Model(z, valid)
combined.compile(loss='binary_crossentropy', optimizer=optimizer)
combined.summary()
Training¶
When you train the discriminator, hold the generator's weights constant; and when you train the generator, hold the discriminator's weights constant.
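In the code below, the generator step is implemented by training the combined model on noise with all-ones labels, so the generator effectively minimizes the non-saturating loss
$$ -\,\mathbb{E}_{z \sim p(z)}\big[\log D(G(z))\big], $$
a common variant of the minimax objective above that gives stronger gradients early in training, when the discriminator can easily reject the generated images.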
def train(models, X_train, noise_plot, dir_result="/result/", epochs=10000, batch_size=128):
'''
    models : tuple containing three models, (combined, discriminator, generator)
X_train : np.array containing images (Nsample, height, width, Nchannels)
noise_plot : np.array of size (Nrandom_sample_to_plot, hidden unit length)
dir_result : the location where the generated plots for noise_plot are saved
'''
combined, discriminator, generator = models
nlatent_dim = noise_plot.shape[1]
half_batch = int(batch_size / 2)
history = []
for epoch in range(epochs):
# ---------------------
# Train Discriminator
# ---------------------
# Select a random half batch of images
idx = np.random.randint(0, X_train.shape[0], half_batch)
imgs = X_train[idx]
noise = get_noise(half_batch, nlatent_dim)
# Generate a half batch of new images
gen_imgs = generator.predict(noise)
        # Train the discriminator (the real and fake half-batches are passed separately here; mixing them into a single batch is another common option)
d_loss_real = discriminator.train_on_batch(imgs, np.ones((half_batch, 1)))
d_loss_fake = discriminator.train_on_batch(gen_imgs, np.zeros((half_batch, 1)))
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
# ---------------------
# Train Generator
# ---------------------
noise = get_noise(batch_size, nlatent_dim)
# The generator wants the discriminator to label the generated samples
# as valid (ones)
valid_y = (np.array([1] * batch_size)).reshape(batch_size,1)
# Train the generator
g_loss = combined.train_on_batch(noise, valid_y)
history.append({"D":d_loss[0],"G":g_loss})
if epoch % 100 == 0:
# Plot the progress
print ("Epoch {:05.0f} [D loss: {:4.3f}, acc.: {:05.1f}%] [G loss: {:4.3f}]".format(
epoch, d_loss[0], 100*d_loss[1], g_loss))
if epoch % int(epochs/100) == 0:
plot_generated_images(noise_plot,
path_save=dir_result+"/image_{:05.0f}.png".format(epoch),
titleadd="Epoch {}".format(epoch))
if epoch % 1000 == 0:
plot_generated_images(noise_plot,
titleadd="Epoch {}".format(epoch))
return(history)
dir_result="./result_GAN/"
try:
    os.mkdir(dir_result)
except OSError:
    pass
start_time = time.time()
_models = combined, discriminator, generator
history = train(_models, X_train, noise, dir_result=dir_result,epochs=20000, batch_size=128*8)
end_time = time.time()
print("-"*10)
print("Time took: {:4.2f} min".format((end_time - start_time)/60))
Loss over epochs¶
Notice that the losses from the discriminator are not decreasing over epochs, which makes sense because the discriminator's classification performance is constantly challenged by the improving "fake images" from the generator.
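For reference, if the two networks reached the idealized equilibrium of the minimax game, the discriminator would output $D(x) = 1/2$ for every image, real or fake. With the loss as computed in the training loop above (the average of the real-batch and fake-batch cross-entropies), that corresponds to a value of about $\log 2 \approx 0.69$ and an accuracy of roughly 50%.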
import pandas as pd
hist = pd.DataFrame(history)
plt.figure(figsize=(20,5))
for colnm in hist.columns:
plt.plot(hist[colnm],label=colnm)
plt.legend()
plt.ylabel("loss")
plt.xlabel("epochs")
plt.show()
Finally, create a gif from the generated images saved every few epochs¶
def makegif(dir_images):
import imageio
filenames = np.sort(os.listdir(dir_images))
filenames = [ fnm for fnm in filenames if ".png" in fnm]
with imageio.get_writer(dir_images + '/image.gif', mode='I') as writer:
for filename in filenames:
image = imageio.imread(dir_images + filename)
writer.append_data(image)
os.remove(dir_images + filename)
makegif(dir_result)
GAN Auto-Encoder¶
A GAN can transform a latent variable into an image. What about the other way around? Can I map an image to a latent variable? If I can do such a mapping, I can do interesting things: for example, the average of the latent vectors of female images might be decoded into an "average" female face (see the sketch at the end of this post).
Photo Editing Generative Adversarial Networks Part 1 creates a GAN-based encoder-decoder network: an encoder is trained separately while the trained generator is used as a fixed decoder. For the encoder network, I will reuse the discriminator architecture with the number of output neurons in the last layer set to 100. This way, I reverse the order of things in my GAN and create a GAN auto-encoder.
In this encoder-decoder network, only the encoder's parameters are trainable; the generator (decoder) weights are frozen.
img_in = layers.Input(shape=img_shape)
# discriminator architecture with a 100-unit output layer, used as the encoder
discriminator_encoder = build_discriminator(img_shape,100)
# encode the image into a 100-dimensional latent vector
encoder = discriminator_encoder(img_in)
# generator as decoder
generator.trainable = False
img_out = generator(encoder)
encoder_decoder = models.Model(img_in,img_out)
encoder_decoder.compile(loss='mse', optimizer=optimizer)
encoder_decoder.summary()
Train the encoder¶
start_time = time.time()
history_ed = encoder_decoder.fit(X_train,X_train,
validation_data=(X_test,X_test),
epochs=10,verbose=2)
end_time = time.time()
print("-"*10)
print("Time took: {:4.2f} min".format((end_time - start_time)/60))
Loss over epochs¶
plt.figure(figsize=(10,5))
for colnm in history_ed.history.keys():
plt.plot(history_ed.history[colnm],label=colnm)
plt.legend()
plt.show()
Check the performance of the encoder-decoder network on the testing data¶
# discriminator_encoder.compile(loss='mse', optimizer=optimizer)
X_pred = encoder_decoder.predict(X_test)
## z_pred = discriminator_encoder.predict(X_test)
Plot the original images and the images reproduced by the encoder-decoder¶
Some reproduced images are somewhat similar to the original images... but I clearly need more training.
Ntest = 10
for irow in range(Ntest):
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(1,2,1)
ax.imshow(X_test[irow])
ax.set_title("original image")
ax = fig.add_subplot(1,2,2)
ax.imshow(X_pred[irow])
ax.set_title("encoded image")
plt.show()
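Finally, coming back to the latent-averaging idea from the beginning of this section, here is a minimal sketch of what such an experiment could look like: encode two test images, average their latent vectors, and decode the mean with the generator. The choice of images is arbitrary and this is only an illustration, not part of the training above.
## encode two test images into 100-dimensional latent vectors,
## average them, and decode the mean latent vector with the generator
z_test = discriminator_encoder.predict(X_test[:2])    ## shape (2, 100)
z_mean = z_test.mean(axis=0, keepdims=True)           ## shape (1, 100)
img_mean = generator.predict(z_mean)[0]               ## decoded "in-between" face
fig = plt.figure(figsize=(15,5))
for i, (img, title) in enumerate(zip([X_test[0], img_mean, X_test[1]],
                                     ["image A", "decoded mean latent", "image B"])):
    ax = fig.add_subplot(1, 3, i + 1)
    ax.imshow(img)
    ax.set_title(title)
plt.show()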