Link to the paper

Contribution

  • This paper introduces a class of GANs called Deep Convolutional Generative Adversarial Networks (DCGANs), which are stable to train in most settings.
  • This paper also shows that the trained generators have vector arithmetic properties in their latent space, which allow for easy image manipulation.

Background

  • Generative Adversarial Nets (GAN): Refer here
  • Convolutional Neural Network: A convolutional neural network (CNN) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.
  • Batch Normalization: Batch normalization helps stabilize learning by normalizing the inputs to each unit to have zero mean and unit variance.
  • ReLU Activation: ReLU (Rectified Linear Unit) is an activation function where f(x)=0 for x<0 and f(x)=x for x>=0.
  • LeakyReLU Activation: LeakyReLU is an activation function where f(x)=αx for x<0 and f(x)=x for x>=0. Here α is called the leak, and it helps increase the range of the ReLU function.
  • Dropout: Dropout refers to randomly dropping out units from layers of a neural network during training, which reduces overfitting. (A minimal sketch of these building blocks follows this list.)
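As a quick illustration of how these building blocks are typically combined, here is a minimal PyTorch sketch; the tensor shapes and the dropout rate are illustrative assumptions for the example, not values taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative only: how the building blocks above are typically combined.
x = torch.randn(128, 64, 16, 16)           # a batch of 64-channel feature maps (assumed shape)

bn    = nn.BatchNorm2d(64)                 # normalize each channel to zero mean, unit variance
relu  = nn.ReLU()                          # f(x) = 0 for x < 0, x otherwise
leaky = nn.LeakyReLU(negative_slope=0.2)   # f(x) = 0.2*x for x < 0, x otherwise
drop  = nn.Dropout(p=0.5)                  # randomly zeroes units during training

h = drop(leaky(bn(x)))                     # a discriminator-style sub-block
print(h.shape)                             # torch.Size([128, 64, 16, 16])
```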

Description

  • Previous attempts to scale up GANs using CNNs to model images had been unsuccessful.
  • This paper describes the layers and activations the authors used in their networks to successfully model image distributions.
  • The paper also reports the settings and hyperparameters the authors used, and their effects on the results.

Methodology

  • A noise vector z, sampled from a uniform distribution, is fed into the first layer of the generator, which can be called fully connected.
  • This layer is just a matrix multiplication, and its result is reshaped into a 4-dimensional tensor that starts the convolution stack.
  • For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output.
  • Pooling layers are replaced with strided convolutions in the discriminator and fractionally-strided (transposed) convolutions in the generator (see the architecture sketch after this list).
  • Batch Normalization is used in both networks; it helps prevent the generator from collapsing all samples to a single point (mode collapse).
  • Batch Normalization is not applied to the output layer of the generator or the input layer of the discriminator, as doing so led to sample oscillation and model instability.
  • The generator uses ReLU activations for all layers except the output, which uses Tanh.
  • The discriminator uses LeakyReLU activations for all layers.
  • Dropout was used to decrease the likelihood of memorization.
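The following is a minimal PyTorch sketch of the architecture guidelines above. It assumes 64x64 RGB outputs and a 100-dimensional noise vector; the channel widths (ngf, ndf) follow a common DCGAN convention and are illustrative, not necessarily the paper's exact values.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            # project z and reshape it into a 4x4 spatial feature map
            nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            # fractionally-strided convolutions double the spatial size at each step
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            # output layer: no batch norm, Tanh maps pixels to [-1, 1]
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):                      # z: (N, z_dim, 1, 1)
        return self.net(z)                     # -> (N, 3, 64, 64)

class Discriminator(nn.Module):
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            # input layer: no batch norm; strided convolutions replace pooling
            nn.Conv2d(3, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),
            # last layer is flattened into a single sigmoid output
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N, 3, 64, 64)
        return self.net(x).view(-1)            # -> (N,) real/fake probabilities
```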

Experiments

  • DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1K, and a newly assembled Faces dataset.
  • All images were scaled to the range of the tanh activation function, [-1, 1].
  • Mini-batch Stochastic Gradient Descent (SGD) was used for training, with minibatch size of 128.
  • Weights were initialized with zero-centered Normal distribution with standard deviation of 0.02.
  • The slope of leak was set to 0.2 in LeakyReLU.
  • The Adam optimizer with tuned hyperparameters was used to accelerate training.
  • The learning rate was set to 0.0002, and the momentum term β₁ was reduced from the suggested 0.9 to 0.5 to stabilize training (a sketch of these settings follows this list).
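Below is a minimal PyTorch sketch of these training settings, assuming the Generator and Discriminator classes from the Methodology sketch. The BatchNorm initialization values follow a common DCGAN convention and are an assumption, not stated explicitly in the paper.

```python
import torch
import torch.nn as nn

def weights_init(m):
    # zero-centered Normal initialization with standard deviation 0.02
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm2d):
        # assumption: common convention of scale ~ N(1, 0.02), bias = 0
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

netG, netD = Generator(), Discriminator()
netG.apply(weights_init)
netD.apply(weights_init)

# Adam with learning rate 0.0002 and beta1 = 0.5 (instead of the default 0.9)
optG = torch.optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
optD = torch.optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))

# mini-batch size of 128; images are scaled to the tanh range [-1, 1] beforehand,
# e.g. with torchvision's Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
batch_size = 128
```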

Areas of Application

  • Generation of higher resolution images
  • Vector arithmetic can be performed on images in Z space to get results like man with glasses - normal man + normal woman = woman with glasses (see the sketch after this list).
  • The use of vector arithmetic could decrease the amount of data needed for modelling complex image distributions.
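Here is an illustrative sketch of the Z-space arithmetic, reusing the trained Generator (netG) from the earlier sketch. The z vectors below are random stand-ins; in the paper, the z vectors of several exemplar samples per visual concept are averaged first, which makes the arithmetic much more stable.

```python
import torch

def avg_z(n=3):
    # average of n uniform noise vectors in [-1, 1], shaped (1, 100, 1, 1)
    return (torch.rand(n, 100, 1, 1) * 2 - 1).mean(0, keepdim=True)

# stand-ins for averaged z vectors of curated samples showing each concept
z_man_glasses, z_man, z_woman = avg_z(), avg_z(), avg_z()

# man with glasses - normal man + normal woman -> woman with glasses
z_result = z_man_glasses - z_man + z_woman

netG.eval()
with torch.no_grad():
    image = netG(z_result)                 # (1, 3, 64, 64), pixel values in [-1, 1]
```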

Related Papers