Generating images with Deep Learning
Learn how to use AI to generate new images such as faces or art
We are in the era of artificial intelligence, which is progressing at an incredible pace. It is fair to say that artificial intelligence is now being used to automate artificial intelligence itself. This article explores that claim in detail.
This pace of progress, which according to Elon Musk is close to exponential, has also brought some mind-blowing breakthroughs in computer vision. By introducing Generative Adversarial Networks (GANs), the renowned deep learning researcher Ian Goodfellow transformed traditional computer vision into what I call the New Computer Vision. The names “Traditional” and “New” are my own coinage, not the CV community's, prompted by the drastic shift in how data is produced for many computer vision problems.
Traditional and New computer vision
Traditional computer vision involved building large datasets, either by downloading, cleaning, rendering, and storing images in databases, or by manually capturing pictures and videos and preparing them for use by the network. This posed an overwhelming challenge to researchers and developers, given the constraints on time and resources and the substandard quality of the data produced.
The New Computer Vision has been a relief, greatly reducing the burden of collecting suitable datasets. Generative Adversarial Networks (GANs) are now used to generate data with the same characteristics as the limited data available, relieving researchers of the hectic routine of collecting and managing data.
How do GANs work?
The idea behind GANs is two neural networks competing against each other. Through this competition, the networks learn the probability distribution of the available data. Realistic-looking data with the same characteristics as the training data is then generated from that learned distribution.
Two Competing Neural Networks
These competing networks are known as the Generator and the Discriminator. The Generator learns the patterns in the training data and generates images, while the Discriminator checks the authenticity of the generated images, i.e. it decides whether a generated image belongs to the training set or not. In simple words, it checks whether the image is ‘Real’ or ‘Fake’.
Generator
Given a random set of values (noise), the Generator performs a series of non-linear computations to produce a new fake image and passes it to the Discriminator, hoping that the Discriminator will declare it authentic. Put simply, the goal of the Generator is to lie without being caught, i.e. to fool the Discriminator.
Discriminator
The goal of the Discriminator is to process the images coming from the Generator and identify them as fake. Its role is that of a binary classifier. It receives two kinds of input: real images (from the training data) and generated images (from the Generator). It tells whether an image comes from the same distribution as the training data or not: ‘Real’ means it does, whereas ‘Fake’ means it comes from a different distribution.
Training and Convergence of GAN
Training alternates between the two networks. While the Discriminator learns to classify images correctly as ‘Real’ or ‘Fake’ (that is, to catch the Generator's flaws), the Generator is held constant. While the Generator learns to fool the Discriminator (to get its generated images classified as real), the Discriminator is held constant. This back-and-forth keeps the problem tractable and helps the GAN converge. It also helps avoid the no-win situation of an excellent Generator paired with a poorly trained Discriminator, or vice versa.
As the Generator improves, the Discriminator's performance gets worse because it can no longer easily tell real from fake. If the Generator works perfectly, the Discriminator's accuracy drops to 50%; in effect, it is flipping a coin to make its prediction.
This is a problem for convergence: the Discriminator's feedback becomes less and less meaningful to the Generator. If training continues past the point where the Discriminator gives completely random feedback, the Generator begins to train on garbage (misleading feedback), and its own quality may collapse.
So, alternating training is crucial for the convergence of GANs.
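The coin-flip claim can be checked numerically. The sketch below assumes one-dimensional Gaussian densities for the real data and for the Generator's output, and uses the optimal-discriminator formula D*(x) = p_data(x) / (p_data(x) + p_g(x)) from the original GAN paper. The function names are illustrative and not part of this article's code.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, real_mean, fake_mean, std=1.0):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for two Gaussians of equal spread."""
    p_data = gaussian_pdf(x, real_mean, std)
    p_g = gaussian_pdf(x, fake_mean, std)
    return p_data / (p_data + p_g)

# Early in training: the Generator's samples sit far from the real data,
# so the best possible Discriminator is confident about a real sample.
early = optimal_discriminator(x=0.0, real_mean=0.0, fake_mean=3.0)

# Perfect Generator: both distributions coincide, so the best possible
# Discriminator can only output 0.5 everywhere (a coin flip).
perfect = [optimal_discriminator(x, real_mean=0.0, fake_mean=0.0)
           for x in (-2.0, 0.0, 1.5)]

print(f"early D*(0) = {early:.3f}")
print("perfect Generator D*(x) =", perfect)
```

Once the two distributions coincide, no Discriminator can do better than 0.5, which is why its feedback stops being informative.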
Mathematics behind GANs
Now we dig deeper into the mathematical foundation of the GANs.
A GAN comprises two neural networks, a generator G and a discriminator D, that compete against each other as, together, they learn the unknown distribution of the training dataset. Our goal is for the Generator to produce images that are indistinguishable from those of the training data.
Thus, the Generator's weights should be such that the Discriminator cannot identify the generated images as fake. This makes the optimization a min-max problem: we want the Generator weights that minimize the rate (the cost function J(D)) at which the Discriminator classifies real and fake samples correctly, and we want the Discriminator weights that maximize this rate. Since this is a binary classification, we use the binary cross-entropy function as our cost function.
The first term in the cost function J(D) covers the real data fed into the Discriminator, which wants to maximize the log probability of predicting one, meaning the data is real. The second term covers the fake images generated by the Generator G. Here, the Discriminator wants to maximize the log probability of predicting zero, meaning the data is fake. The Generator, on the other hand, seeks to minimize the log probability of the Discriminator being correct. The equilibrium point of this trade-off is the solution to the optimization problem, i.e. the saddle point of the discriminator loss.
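For reference, the standard min-max objective from the original GAN formulation, whose two terms match the description above, can be written as follows. The symbols p_data (the real-data distribution) and p_z (the noise distribution) are notation assumed here, and the article's J(D) may differ from this form by sign or scaling.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```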
D() is the probability that the given image belongs to the training data X. For the Generator, we want to minimize log(1-D(G(z))): when D(G(z)) is high, D thinks G(z) is X, which makes 1-D(G(z)) very low, and minimizing the objective pushes it even lower. For the Discriminator, we want to maximize D(X) and (1-D(G(z))). At the optimum, D outputs D(x)=0.5. The Generator G, in turn, should train so that the results it hands to the Discriminator D leave D unable to differentiate between G(z) and X.
The reason why we call it a min-max optimization lies in the fact that the Discriminator tries to maximize the objective while the Generator tries to minimize it. Due to this minimizing and maximizing, we say it is min-max. They both learn together by alternating gradient descent.
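A few hand-picked numbers make the tug-of-war concrete. The sketch below evaluates log D(x) + log(1 - D(G(z))) for a single real/fake pair; the probabilities are made up for illustration and the function name is hypothetical.

```python
import math

def value_function(d_real, d_fake):
    """V = log D(x) + log(1 - D(G(z))) for one real sample and one fake sample."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# Confident, correct Discriminator: D(x) near 1 and D(G(z)) near 0,
# so V is close to 0, the largest value it can take.
strong_d = value_function(d_real=0.99, d_fake=0.01)

# Fooled Discriminator: D(G(z)) near 1 makes the second term very negative,
# which is exactly what the Generator is after.
fooled_d = value_function(d_real=0.99, d_fake=0.99)

# Equilibrium: D outputs 0.5 everywhere, so V = 2 * log(0.5) = -log(4).
equilibrium = value_function(d_real=0.5, d_fake=0.5)

print(f"strong D:    {strong_d:.3f}")
print(f"fooled D:    {fooled_d:.3f}")
print(f"equilibrium: {equilibrium:.3f}")
```

The Discriminator pushes the value up toward 0, the Generator pulls it down, and the two meet at -log(4).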
Implementation of GANs
This section implements a GAN in code to see how the theory above works in practice. We will use the TensorFlow library; the versions used are TensorFlow v2.3.0 and Keras v2.4.3.
You can write the code yourself, or you can access the full code on Google Colab.
Importing Packages
A few libraries are required. Let's import them all:
import os
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import argparse
from IPython import display
import matplotlib.pyplot as plt
# %matplotlib inline
from tensorflow import keras
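The snippets that follow read their hyperparameters from an args object (args.batch_size, args.latent_dim, args.lr, args.b1, args.b2) whose definition is not shown in this article. A minimal sketch of how it might be built with argparse is given here. The learning rate and latent size come from the text, while the batch size, beta values, epoch count, and the parse_args helper itself are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical hyperparameter parser; field names mirror the args.* used in the article."""
    parser = argparse.ArgumentParser(description="Hyperparameters for the GAN example")
    parser.add_argument("--batch_size", type=int, default=64)    # assumed value
    parser.add_argument("--latent_dim", type=int, default=100)   # 100-D noise vector, as in the text
    parser.add_argument("--lr", type=float, default=2e-4)        # learning rate stated in the text
    parser.add_argument("--b1", type=float, default=0.5)         # assumed Adam beta_1
    parser.add_argument("--b2", type=float, default=0.999)       # assumed Adam beta_2
    parser.add_argument("--n_epochs", type=int, default=50)      # assumed value
    return parser.parse_args(argv)

# In a notebook, pass an empty list so argparse ignores the kernel's own command-line flags.
args = parse_args([])
print(args)
```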
Data Loading and Preprocessing
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_train = (x_train - 127.5) / 127.5 # Normalize the images to [-1, 1]
# Batch and shuffle the data
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).\
    shuffle(60000).batch(args.batch_size)
We use the tf.keras.datasets module to load the Fashion MNIST dataset. This module loads the data off the shelf. Since labels are not needed for this problem, we only use the training images x_train. We reshape the images and cast them to float32 (the data is in uint8 format by default).
Then, we normalize the data from [0, 255] to [-1, 1]. Finally, we build the TensorFlow input pipeline. In short, tf.data.Dataset.from_tensor_slices is fed the training data, which is sliced into individual tensors, shuffled, and batched, so that we can access batches of the specified size during training. The buffer size passed to shuffle affects the randomness of the shuffle; a buffer as large as the dataset (60000 here) gives a full uniform shuffle.
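The normalization step is easy to verify by hand. The plain-Python sketch below applies the same formula to a few pixel values and inverts it again; the helper names are illustrative. The [-1, 1] range is convenient because it matches the output range of the tanh activation used in the Generator's last layer.

```python
def normalize(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5

def denormalize(value):
    """Inverse mapping, from [-1, 1] back to [0, 255] (useful when plotting generated images)."""
    return value * 127.5 + 127.5

for pixel in (0, 64, 127.5, 255):
    scaled = normalize(pixel)
    print(f"{pixel:>6} -> {scaled:+.3f} -> {denormalize(scaled):.1f}")
```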
Creating Generator Network
def generator(image_dim):
    inputs = keras.Input(shape=(100,), name='input_layer')
    x = layers.Dense(128, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_1')(inputs)
    #print(x.dtype)
    x = layers.LeakyReLU(0.2, name='leaky_relu_1')(x)
    x = layers.Dense(256, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_2')(x)
    x = layers.BatchNormalization(momentum=0.1, epsilon=0.8, name='bn_1')(x)
    x = layers.LeakyReLU(0.2, name='leaky_relu_2')(x)
    x = layers.Dense(512, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_3')(x)
    x = layers.BatchNormalization(momentum=0.1, epsilon=0.8, name='bn_2')(x)
    x = layers.LeakyReLU(0.2, name='leaky_relu_3')(x)
    x = layers.Dense(1024, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_4')(x)
    x = layers.BatchNormalization(momentum=0.1, epsilon=0.8, name='bn_3')(x)
    x = layers.LeakyReLU(0.2, name='leaky_relu_4')(x)
    x = layers.Dense(image_dim, kernel_initializer=tf.keras.initializers.he_uniform, activation='tanh', name='dense_5')(x)
    outputs = tf.reshape(x, [-1, 28, 28, 1], name='Reshape_Layer')
    model = tf.keras.Model(inputs, outputs, name="Generator")
    return model
We feed the Generator a 100-D noise vector sampled from a normal distribution. First, we define the input layer with shape (100,). Keras Dense layers default to the glorot_uniform weight initializer, so he_uniform is set explicitly on every Dense layer here.
The momentum value of the batch norm layers is changed to 0.1 (the default is 0.99).
Finally, we reshape the 784-D tensor to (Batch Size, 28, 28, 1) using tf.reshape, whose first parameter is the input tensor and whose second parameter is the new shape. We then create the Model by passing it the generator function's input and output layers.
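To see what the momentum change does, recall that Keras batch normalization updates its moving statistics as moving = moving * momentum + batch_value * (1 - momentum). The sketch below applies that rule in plain Python to a made-up sequence of batch means; the helper name and the numbers are illustrative only.

```python
def update_moving_mean(moving_mean, batch_mean, momentum):
    """Keras-style update: keep `momentum` of the old value, take the rest from the batch."""
    return moving_mean * momentum + batch_mean * (1.0 - momentum)

batch_means = [1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical per-batch means

for momentum in (0.99, 0.1):
    moving = 0.0  # the moving mean starts at zero
    for batch_mean in batch_means:
        moving = update_moving_mean(moving, batch_mean, momentum)
    print(f"momentum={momentum}: moving mean after 5 batches = {moving:.4f}")
```

With momentum 0.1 the moving mean follows the most recent batches almost immediately, whereas the default 0.99 averages over a much longer history.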
Creating Discriminator Network
def discriminator():
    inputs = keras.Input(shape=(28,28,1), name='input_layer')
    # Flatten each image to a 784-D vector (named `flat` to avoid shadowing the built-in `input`)
    flat = tf.reshape(inputs, [-1, 784], name='reshape_layer')
    x = layers.Dense(512, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_1')(flat)
    x = layers.LeakyReLU(0.2, name='leaky_relu_1')(x)
    x = layers.Dense(256, kernel_initializer=tf.keras.initializers.he_uniform, name='dense_2')(x)
    x = layers.LeakyReLU(0.2, name='leaky_relu_2')(x)
    outputs = layers.Dense(1, kernel_initializer=tf.keras.initializers.he_uniform, activation='sigmoid', name='dense_3')(x)
    model = tf.keras.Model(inputs, outputs, name="Discriminator")
    return model
The Discriminator is a binary classifier consisting only of fully connected layers. It expects a tensor of shape (Batch Size, 28, 28, 1), but dense layers operate on vectors, so we first reshape the input to shape (Batch Size, 784). The final layer has a sigmoid activation, which squashes the output to a value between 0 (fake) and 1 (real).
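How the sigmoid turns the last layer's raw score into a probability can be sketched in a few lines of plain Python. The 0.5 decision threshold and the helper names are assumptions for illustration; the article's code only ever uses the probability itself.

```python
import math

def sigmoid(logit):
    """Numerically stable logistic function; the output lies strictly between 0 and 1."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

def label(probability, threshold=0.5):
    """Turn the Discriminator's probability into a 'Real'/'Fake' decision (assumed threshold)."""
    return "Real" if probability >= threshold else "Fake"

for logit in (-4.0, 0.0, 4.0):
    p = sigmoid(logit)
    print(f"logit {logit:+.1f} -> D = {p:.3f} -> {label(p)}")
```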
Loss Function
binary_cross_entropy = tf.keras.losses.BinaryCrossentropy()
This is the binary-cross-entropy loss.
Below are the Generator’s and Discriminator’s individual losses.
Generator Loss
def generator_loss(fake_output):
    gen_loss = binary_cross_entropy(tf.ones_like(fake_output), fake_output)
    return gen_loss
Discriminator Loss
def discriminator_loss(real_output, fake_output):
    real_loss = binary_cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = binary_cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
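To make the two losses tangible without TensorFlow, here is a plain-Python re-implementation evaluated on made-up Discriminator outputs. It is a sketch, not the Keras implementation: the clipping constant EPS is an assumption. Note that this generator loss minimizes -log D(G(z)), the commonly used non-saturating variant, rather than log(1 - D(G(z))) from the mathematics section.

```python
import math

EPS = 1e-7  # clipping constant that keeps log() away from zero; an assumption of this sketch

def binary_cross_entropy(targets, predictions):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over the batch."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(predictions)

def generator_loss(fake_output):
    # The Generator wants its fakes to be labelled 1 (real).
    return binary_cross_entropy([1.0] * len(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    # The Discriminator wants real images labelled 1 and fakes labelled 0.
    real_loss = binary_cross_entropy([1.0] * len(real_output), real_output)
    fake_loss = binary_cross_entropy([0.0] * len(fake_output), fake_output)
    return real_loss + fake_loss

real_output = [0.9, 0.8]  # hypothetical D(x) values
fake_output = [0.2, 0.1]  # hypothetical D(G(z)) values
print(f"generator loss:     {generator_loss(fake_output):.3f}")
print(f"discriminator loss: {discriminator_loss(real_output, fake_output):.3f}")
```

When the Discriminator outputs 0.5 for everything, the generator loss equals log(2) and the discriminator loss equals 2·log(2), the equilibrium values implied by the theory above.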
Optimizer
generator_optimizer = tf.keras.optimizers.Adam(learning_rate = args.lr, beta_1 = args.b1, beta_2 = args.b2 )
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate = args.lr, beta_1 = args.b1, beta_2 = args.b2 )
We use the Adam optimizer for both the Generator and the Discriminator. It takes two kinds of arguments:
- The learning rate, 2e-4.
- The beta coefficients, b1 and b2. These set the decay rates of the running averages of the gradient and its square that Adam maintains during training.
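The role of the two beta coefficients is easiest to see in the textbook Adam update for a single scalar parameter, sketched below. The learning rate matches the text; the b1, b2, and eps defaults and the gradient values are assumptions, and the Keras optimizer may differ in small implementation details.

```python
import math

def adam_step(param, grad, m, v, t, lr=2e-4, b1=0.5, b2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar parameter (b1, b2, eps defaults are assumed)."""
    m = b1 * m + (1.0 - b1) * grad          # running average of the gradient
    v = b2 * v + (1.0 - b2) * grad * grad   # running average of the squared gradient
    m_hat = m / (1.0 - b1 ** t)             # bias correction for the zero initialisation
    v_hat = v / (1.0 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
for t, grad in enumerate([0.3, 0.1, -0.2], start=1):  # made-up gradients
    param, m, v = adam_step(param, grad, m, v, t)
    print(f"step {t}: param={param:.6f} m={m:.4f} v={v:.6f}")
```

A smaller b1 makes the gradient average react faster to new gradients, while b2 close to 1 keeps the squared-gradient average smooth.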
Training Loop (all the functions combined for training GAN)
@tf.function
def train_step(images):
    noise = tf.random.normal([args.batch_size, args.latent_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
    gradients_of_gen = gen_tape.gradient(gen_loss, generator.trainable_variables) # computing the gradients
    gradients_of_disc = disc_tape.gradient(disc_loss, discriminator.trainable_variables) # computing the gradients
    generator_optimizer.apply_gradients(zip(gradients_of_gen, generator.trainable_variables)) # updating generator parameter
    discriminator_optimizer.apply_gradients(zip(gradients_of_disc, discriminator.trainable_variables)) # updating discriminator parameter
The train_step function is the core of the whole GAN training, because it combines all the training functions defined above.
The @tf.function decorator compiles train_step into a callable TensorFlow graph, which also reduces training time. The following steps are involved in the training process:
- First, we sample the noise from a normal distribution and input it to the Generator.
- The Generator produces fake images, which are fed into the Discriminator. The Discriminator is also given the real images.
- The Discriminator classifies each image as real (drawn from the training set) or fake (produced by the Generator).
- For each of these models, the loss is calculated: gen_loss and disc_loss.
- After computing the gradients, the generator and discriminator parameters are updated using the Adam optimizer.
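The steps above can be traced end to end on a toy problem that needs no TensorFlow. The sketch below is a deliberately tiny one-dimensional GAN with hand-derived gradients. The Generator only learns a shift (G(z) = theta + z), the Discriminator is a single logistic unit, plain gradient descent stands in for Adam, and the "real" data is an assumed Gaussian centred at 4. As in train_step, both gradients are computed from the same forward pass. All names and numbers are illustrative, and no convergence guarantee is implied.

```python
import math
import random

EPS = 1e-7  # keeps log() away from zero

def sigmoid(s):
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_step(real_batch, params, rng, lr=0.05):
    """One Generator/Discriminator update on a toy 1-D GAN.

    Generator:      G(z) = theta + z          (learns only a shift)
    Discriminator:  D(x) = sigmoid(w * x + c)
    """
    theta, w, c = params["theta"], params["w"], params["c"]
    n = len(real_batch)

    # 1. Sample noise and generate fake samples.
    fake_batch = [theta + rng.gauss(0.0, 1.0) for _ in range(n)]

    # 2. Discriminator scores for real and fake samples.
    d_real = [sigmoid(w * x + c) for x in real_batch]
    d_fake = [sigmoid(w * x + c) for x in fake_batch]

    # 3. Binary cross-entropy losses, mirroring generator_loss / discriminator_loss.
    gen_loss = -sum(math.log(max(d, EPS)) for d in d_fake) / n
    disc_loss = (-sum(math.log(max(d, EPS)) for d in d_real) / n
                 - sum(math.log(max(1.0 - d, EPS)) for d in d_fake) / n)

    # 4. Hand-derived gradients of the two losses.
    grad_w = (sum(-(1.0 - d) * x for d, x in zip(d_real, real_batch))
              + sum(d * x for d, x in zip(d_fake, fake_batch))) / n
    grad_c = (sum(-(1.0 - d) for d in d_real) + sum(d_fake)) / n
    grad_theta = sum(-(1.0 - d) * w for d in d_fake) / n

    # 5. Plain gradient-descent updates (the article uses Adam instead).
    params["w"] = w - lr * grad_w
    params["c"] = c - lr * grad_c
    params["theta"] = theta - lr * grad_theta
    return gen_loss, disc_loss

rng = random.Random(0)
params = {"theta": 0.0, "w": 0.1, "c": 0.0}
for step in range(200):
    real_batch = [rng.gauss(4.0, 1.0) for _ in range(32)]  # hypothetical "real" data
    gen_loss, disc_loss = train_step(real_batch, params, rng)
print(f"theta={params['theta']:.3f} gen_loss={gen_loss:.3f} disc_loss={disc_loss:.3f}")
```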
Training
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()
        i = 0
        D_loss_list, G_loss_list = [], []
        for image_batch in dataset:
            i += 1
            train_step(image_batch)
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 seed)
        # Save the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix = checkpoint_prefix)
        print ('Time for epoch {} is {} sec'.format(epoch + 1, time.time()-start))
    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator,
                             epochs,
                             seed)
Finally, it is time to sit back and watch the magic. But just a second: you have to pass two arguments (the training data and the number of epochs) to the function above. Give it those, run the program, relax, and see what GANs can do for you.
Results
Three image grids shown below, each containing 16 images, were produced by the Generator at three different stages of the training. You can see that initially, the Generator produces noisy images. But as training progresses, the Generator improves and starts producing more realistic-looking fashion images.
Summary
In a nutshell, we started by introducing GANs: why we need them, their advantages, and the intuition behind them. Then we dug deeper into the components of a GAN, i.e. the Generator and the Discriminator. We then discussed in detail the two most important aspects: the training strategy and the objective function of a GAN.
Finally, we implemented a GAN in the TensorFlow framework with the Fashion-MNIST dataset and achieved amazing results.
That is all about GANs and generating images with them. The field is advancing at an amazing pace, but I hope this article has given you enough theoretical and practical knowledge about GANs to keep up with the ongoing advancements and improvements.
Remember, a working example of all this code is available on Google Colab.
Thanks for reading
Source: livecodestream