Link to the paper
Contribution
- This paper introduces a class of GANs called Deep Convolutional Generative Adversarial Networks (DCGANs), whose architectural constraints make them stable to train in most settings.
- It also shows that the generator's latent space has vector arithmetic properties, which allow for easy semantic image manipulation.
Background
- Generative Adversarial Nets (GAN): Refer here
- Convolutional Neural Network: A convolutional neural network (CNN) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.
- Batch Normalization: Batch normalization helps stabilize learning by normalizing the inputs to each unit to have zero mean and unit variance.
- ReLU Activation: ReLU (Rectified Linear Unit) is an activation function where f(x)=0 for x<0 and f(x)=x for x>=0.
- LeakyReLU Activation: LeakyReLU is an activation function where f(x)=αx for x<0 and f(x)=x for x>=0. Here α is called leak and it helps increase the range of ReLU function.
- Dropout: Dropout refers to randomly dropping units from layers of a neural network during training, which reduces overfitting. (A minimal sketch of these building blocks follows this list.)
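A minimal sketch of these building blocks, assuming PyTorch as the framework (the tensor sizes and the composition below are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(128, 64)   # a dummy minibatch of 128 examples with 64 units each

relu = nn.ReLU()           # f(x) = max(0, x)
leaky = nn.LeakyReLU(0.2)  # f(x) = x for x >= 0, 0.2 * x otherwise (leak = 0.2)
bn = nn.BatchNorm1d(64)    # normalizes each unit to zero mean, unit variance over the batch
drop = nn.Dropout(p=0.5)   # randomly zeroes units during training to reduce overfitting

h = drop(leaky(bn(x)))     # example composition of the building blocks
```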
Description
- Previous attempts to scale up GANs using CNNs to model images had been unsuccessful.
- This paper describes the layers and activations the authors used in their networks to successfully model image distributions.
- The settings and hyperparameters used by the authors, and their effects on the results, are also reported.
Methodology
- A noise vector z, sampled from a uniform distribution, is fed into the first layer of the generator, which can be a fully connected layer.
- This layer amounts to a matrix multiplication, and its result is reshaped into a 4-dimensional tensor that starts the convolutional stack.
- For the discriminator, the last layer is flattened and then fed into a single sigmoid output.
- Pooling layers are replaced with strided convolutions in the discriminator and with fractionally-strided (transposed) convolutions in the generator.
- Batch Normalization is used to prevent the generator from collapsing all samples to a single point (mode collapse).
- Batch Normalization is not applied to the generator's output layer or the discriminator's input layer, as doing so leads to sample oscillation and model instability.
- The generator uses ReLU activations for all layers except the output, which uses Tanh.
- The discriminator uses LeakyReLU activations for all layers.
- Dropout was used to decrease the likelihood of memorization. (A sketch of the resulting generator/discriminator pair follows this list.)
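Putting the bullets above together, a hedged PyTorch sketch of a DCGAN-style generator/discriminator pair for 64×64 RGB images is shown below. The layer widths and the 100-dimensional z are illustrative; the initial fully connected projection is written here as a 4×4 transposed convolution over a 1×1 input (a common equivalent), and the module names are not from the paper.

```python
import torch
import torch.nn as nn

nz = 100   # length of the uniform noise vector z (illustrative)

# Generator: project-and-reshape, then fractionally-strided (transposed) convolutions.
# ReLU everywhere except the Tanh output; BatchNorm on all but the output layer.
generator = nn.Sequential(
    nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh(),           # output in [-1, 1]
)

# Discriminator: strided convolutions instead of pooling, LeakyReLU throughout,
# BatchNorm on all but the input layer, single sigmoid output.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Flatten(), nn.Sigmoid(),  # probability "real"
)

z = torch.rand(16, nz, 1, 1) * 2 - 1   # uniform noise in [-1, 1]
fake = generator(z)                    # -> (16, 3, 64, 64)
scores = discriminator(fake)           # -> (16, 1)
```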
Experiments
- DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1k, and a newly assembled Faces dataset.
- All images were scaled to the range of tanh activation function [-1, 1].
- Mini-batch Stochastic Gradient Descent (SGD) was used for training, with a minibatch size of 128.
- Weights were initialized from a zero-centered Normal distribution with a standard deviation of 0.02.
- The slope of leak was set to 0.2 in LeakyReLU.
- The Adam optimizer was used to accelerate training.
- The learning rate was set to 0.0002 and the momentum term β₁ to 0.5 to stabilize training (see the training-setup sketch after this list).
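A hedged sketch of that training setup, assuming PyTorch and reusing the `generator` and `discriminator` modules from the Methodology sketch (the BatchNorm initialization convention is an assumption, not something stated in the summary above):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Zero-centered Normal with standard deviation 0.02 for conv weights.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    # Common convention for BatchNorm (gain around 1, zero bias); assumed, not from the paper.
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam with learning rate 0.0002 and beta1 = 0.5, minibatch size 128.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
batch_size = 128

# Images scaled to the tanh range [-1, 1], e.g. from uint8 pixels:
# images = images.float() / 127.5 - 1.0
```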
Areas of Application
- Generation of higher resolution images
- Vector arithmetic can be performed on the Z vectors of generated images, giving results like man with glasses − man without glasses + woman without glasses = woman with glasses (a sketch of this arithmetic follows this list).
- The use of vector arithmetic could decrease the amount of data needed for modelling complex image distributions.
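A rough sketch of that vector arithmetic, under assumptions: `generator` is a trained DCGAN generator as sketched above, and the three latent collections below are random stand-ins for z vectors that would, in practice, be hand-picked because their generated samples show the named attribute (the paper averages several z vectors per concept before combining them):

```python
import torch

nz = 100

# Hypothetical stand-ins: in practice these are latent codes collected by
# inspecting generated samples for each visual concept.
z_man_glasses = torch.rand(3, nz, 1, 1) * 2 - 1
z_man_plain   = torch.rand(3, nz, 1, 1) * 2 - 1
z_woman_plain = torch.rand(3, nz, 1, 1) * 2 - 1

# Average each concept, then combine: man with glasses - man + woman.
z_new = (z_man_glasses.mean(0) - z_man_plain.mean(0) + z_woman_plain.mean(0)).unsqueeze(0)

image = generator(z_new)   # ideally resembles a woman with glasses
```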
Related Papers