Link to the paper
Contribution
- This paper introduces a class of GANs called Deep Convolutional Generative Adversarial Networks (DCGANs), whose architectural constraints make them stable to train in most settings.
- It also shows that the generator's latent space has vector arithmetic properties, which allow for easy semantic image manipulation.
Background
- Generative Adversarial Nets (GAN): Refer here
- Convolutional Neural Network: A convolutional neural network (CNN) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.
- Batch Normalization: Batch normalization helps stabilize learning by normalizing the inputs to each unit to have zero mean and unit variance.
- ReLU Activation: ReLU (Rectified Linear Unit) is an activation function where f(x)=0 for x<0 and f(x)=x for x>=0.
- LeakyReLU Activation: LeakyReLU is an activation function where f(x)=αx for x<0 and f(x)=x for x>=0. Here α is called leak and it helps increase the range of ReLU function.
- Dropout: Dropout refers to randomly dropping units from layers of a neural network during training, which reduces overfitting. (A minimal sketch of these building blocks follows this list.)
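A minimal sketch of these building blocks, assuming PyTorch as the framework (the tensor sizes and the composition below are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(128, 64)   # a dummy minibatch of 128 examples with 64 units each

relu = nn.ReLU()           # f(x) = max(0, x)
leaky = nn.LeakyReLU(0.2)  # f(x) = x for x >= 0, 0.2 * x otherwise (leak = 0.2)
bn = nn.BatchNorm1d(64)    # normalizes each unit to zero mean, unit variance over the batch
drop = nn.Dropout(p=0.5)   # randomly zeroes units during training to reduce overfitting

h = drop(leaky(bn(x)))     # example composition of the building blocks
```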
Description
- Previous attempts to scale up GANs using CNNs to model images had been unsuccessful.
- This paper describes the layers and activations the authors used in their networks to successfully model image distributions.
- The settings and hyperparameters used by the authors, and their effects on the results, are also reported.
Methodology
- A noise vector z, sampled from a uniform distribution, is fed into the first layer of the generator, which can be a fully connected layer.
- This layer amounts to a matrix multiplication, and its result is reshaped into a 4-dimensional tensor that starts the convolutional stack.
- For the discriminator, the last layer is flattened and then fed into a single sigmoid output.
- Pooling layers are replaced with strided convolutions in the discriminator and with fractionally-strided (transposed) convolutions in the generator.
- Batch Normalization is used to prevent the generator from collapsing all samples to a single point (mode collapse).
- Batch Normalization is not applied to the generator's output layer or the discriminator's input layer, as doing so leads to sample oscillation and model instability.
- The generator uses ReLU activations for all layers except the output, which uses Tanh.
- The discriminator uses LeakyReLU activations for all layers.
- Dropout was used to decrease the likelihood of memorization. (A sketch of the resulting generator/discriminator pair follows this list.)
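Putting the bullets above together, a hedged PyTorch sketch of a DCGAN-style generator/discriminator pair for 64×64 RGB images is shown below. The layer widths and the 100-dimensional z are illustrative; the initial fully connected projection is written here as a 4×4 transposed convolution over a 1×1 input (a common equivalent), and the module names are not from the paper.

```python
import torch
import torch.nn as nn

nz = 100   # length of the uniform noise vector z (illustrative)

# Generator: project-and-reshape, then fractionally-strided (transposed) convolutions.
# ReLU everywhere except the Tanh output; BatchNorm on all but the output layer.
generator = nn.Sequential(
    nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh(),           # output in [-1, 1]
)

# Discriminator: strided convolutions instead of pooling, LeakyReLU throughout,
# BatchNorm on all but the input layer, single sigmoid output.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Flatten(), nn.Sigmoid(),  # probability "real"
)

z = torch.rand(16, nz, 1, 1) * 2 - 1   # uniform noise in [-1, 1]
fake = generator(z)                    # -> (16, 3, 64, 64)
scores = discriminator(fake)           # -> (16, 1)
```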
Experiments
- DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1k, and a newly assembled Faces dataset.
- All images were scaled to the range of tanh activation function [-1, 1].
- Mini-batch Stochastic Gradient Descent (SGD) was used for training, with a minibatch size of 128.
- Weights were initialized from a zero-centered Normal distribution with a standard deviation of 0.02.
- The slope of leak was set to 0.2 in LeakyReLU.
- The Adam optimizer was used to accelerate training.
- The learning rate was set to 0.0002 and the momentum term β₁ to 0.5 to stabilize training (see the training-setup sketch after this list).
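A hedged sketch of that training setup, assuming PyTorch and reusing the `generator` and `discriminator` modules from the Methodology sketch (the BatchNorm initialization convention is an assumption, not something stated in the summary above):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Zero-centered Normal with standard deviation 0.02 for conv weights.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    # Common convention for BatchNorm (gain around 1, zero bias); assumed, not from the paper.
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam with learning rate 0.0002 and beta1 = 0.5, minibatch size 128.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
batch_size = 128

# Images scaled to the tanh range [-1, 1], e.g. from uint8 pixels:
# images = images.float() / 127.5 - 1.0
```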
Areas of Application
- Generation of higher resolution images
- Vector arithmetic can be performed on the Z vectors of generated images, giving results like man with glasses − man without glasses + woman without glasses = woman with glasses (a sketch of this arithmetic follows this list).
- The use of vector arithmetic could decrease the amount of data needed for modelling complex image distributions.
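A rough sketch of that vector arithmetic, under assumptions: `generator` is a trained DCGAN generator as sketched above, and the three latent collections below are random stand-ins for z vectors that would, in practice, be hand-picked because their generated samples show the named attribute (the paper averages several z vectors per concept before combining them):

```python
import torch

nz = 100

# Hypothetical stand-ins: in practice these are latent codes collected by
# inspecting generated samples for each visual concept.
z_man_glasses = torch.rand(3, nz, 1, 1) * 2 - 1
z_man_plain   = torch.rand(3, nz, 1, 1) * 2 - 1
z_woman_plain = torch.rand(3, nz, 1, 1) * 2 - 1

# Average each concept, then combine: man with glasses - man + woman.
z_new = (z_man_glasses.mean(0) - z_man_plain.mean(0) + z_woman_plain.mean(0)).unsqueeze(0)

image = generator(z_new)   # ideally resembles a woman with glasses
```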
Related Papers