Link to the paper

Contribution

  • This paper introduces 3D-GAN, a framework that generates 3D objects from a probabilistic latent space by combining volumetric convolutions with generative adversarial networks (GANs).

Background

  • Generative Adversarial Nets (GAN): Refer here
  • Variational Autoencoder (VAE): An autoencoder is a pair of connected networks, an encoder and a decoder. The encoder takes an input and compresses it into a smaller, dense latent representation, which the decoder uses to reconstruct the original input. A VAE additionally constrains this latent representation to follow a prior distribution (typically a standard Gaussian), which is what makes sampling new objects from the latent space possible.
  • Volumetric Convolutions: Convolution layers that operate on 3D voxel grids rather than 2D images (a minimal sketch follows this list).
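
Below is a minimal sketch of a volumetric convolution, written in PyTorch (the framework choice is an assumption; the paper does not prescribe one). `Conv3d` slides a 3D kernel over a voxel grid exactly as `Conv2d` slides a 2D kernel over an image.

```python
import torch
import torch.nn as nn

# One voxel grid: batch of 1, 1 channel, 64 x 64 x 64 occupancy values.
voxels = torch.randn(1, 1, 64, 64, 64)

# Volumetric convolution: 1 input channel -> 8 feature channels,
# 4x4x4 kernel; stride 2 halves every spatial dimension (64 -> 32).
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=4, stride=2, padding=1)

features = conv(voxels)
print(features.shape)  # torch.Size([1, 8, 32, 32, 32])
```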

Description

  • 3D object understanding and generation is an important problem in the graphics and vision community.
  • With the help of adversarial training, the generator captures the object structure implicitly and synthesizes high-quality 3D objects.
  • The generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so there is no need for reference models or CAD models when generating 3D objects.
  • This network, when combined with a variational autoencoder, can reconstruct a 3D object directly from a 2D image.
  • The discriminator, learned without supervision, provides a powerful 3D shape descriptor with wide applications in 3D object recognition.

Methodology

  • 3D-GAN
    • In 3D-GAN, a 200-dimensional latent vector z, randomly sampled from a probabilistic latent space, is mapped by the generator G to a 64 x 64 x 64 cube representing an object G(z) in 3D voxel space (see the network sketch after this list).
    • The discriminator D takes a 3D object x as input and outputs a confidence value D(x) of whether the input is real or synthetic.
    • Binary cross-entropy is used as the classification loss: L3D-GAN = log D(x) + log(1 - D(G(z))).
    • The discriminator usually learns faster than the generator, which makes it hard for the generator to improve: every sample it generates is correctly identified as synthetic with high confidence.
    • Therefore, to keep the two networks training at a comparable pace, the discriminator is updated for a batch only if its accuracy on the previous batch was below 80% (the training-step sketch after this list illustrates this rule).
  • 3D-VAE-GAN
    • The 3D-VAE-GAN consists of three components: an image encoder E, a generator G, and a discriminator D.
    • The image encoder E takes 2D image as input and outputs the latent representation vector z.
    • Further operations are similar to that of 3D-GAN.
    • The loss function consists of three parts: an object reconstruction loss Lrecon, a cross-entropy loss L3D-GAN, and a KL divergence loss LKL, combined as L = L3D-GAN + a1*LKL + a2*Lrecon with weights a1, a2 (see the loss sketch after this list).
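
The two 3D-GAN networks can be sketched as below in PyTorch (the framework and several details are assumptions; the channel progression 512 -> 256 -> 128 -> 64 -> 1 with 4x4x4 kernels and stride 2 follows the paper's description of the generator).

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 200-d latent vector z to a 64x64x64 occupancy grid G(z)."""
    def __init__(self, z_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            # z is treated as a 1x1x1 volume with z_dim channels.
            nn.ConvTranspose3d(z_dim, 512, kernel_size=4, stride=1, padding=0),  # -> 4^3
            nn.BatchNorm3d(512), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(512, 256, 4, 2, 1),  # -> 8^3
            nn.BatchNorm3d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(256, 128, 4, 2, 1),  # -> 16^3
            nn.BatchNorm3d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(128, 64, 4, 2, 1),   # -> 32^3
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 1, 4, 2, 1),     # -> 64^3
            nn.Sigmoid(),  # per-voxel occupancy probability
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

class Discriminator(nn.Module):
    """Mirrors the generator: 64x64x64 voxel grid -> real/synthetic confidence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),                     # -> 32^3
            nn.Conv3d(64, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2, inplace=True),  # -> 16^3
            nn.Conv3d(128, 256, 4, 2, 1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2, inplace=True), # -> 8^3
            nn.Conv3d(256, 512, 4, 2, 1), nn.BatchNorm3d(512), nn.LeakyReLU(0.2, inplace=True), # -> 4^3
            nn.Conv3d(512, 1, 4, 1, 0),  # -> 1^3 logit
        )

    def forward(self, x):
        return self.net(x).view(-1)  # raw logit; sigmoid is applied inside the loss
```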
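
The adaptive update rule can then be expressed as one training step, sketched below under the same assumptions (binary cross-entropy on the discriminator logits; the gate skips the discriminator update whenever its accuracy on the previous batch was at least 80%).

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, real, prev_d_accuracy, z_dim=200):
    batch = real.size(0)
    z = torch.randn(batch, z_dim)   # sample from the probabilistic latent space
    fake = G(z)

    # --- Discriminator update (gated) ---
    d_real = D(real)
    d_fake = D(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones(batch))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros(batch)))
    if prev_d_accuracy < 0.8:       # keep G and D training at a comparable pace
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

    # Accuracy on this batch, used to gate the *next* discriminator update.
    with torch.no_grad():
        correct = (d_real > 0).float().sum() + (d_fake <= 0).float().sum()
        d_accuracy = correct / (2 * batch)

    # --- Generator update (always) ---
    g_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(batch))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_accuracy.item()
```

A caller would initialize prev_d_accuracy to 0 so the discriminator is trained on the first batch.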
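
The extra 3D-VAE-GAN loss terms can be sketched as below. The image encoder interface E(image) -> (mu, logvar) is a hypothetical one introduced for illustration, and MSE is used for Lrecon as a simple choice; L3D-GAN is computed exactly as in the training-step sketch above.

```python
import torch
import torch.nn.functional as F

def vae_gan_losses(E, G, image, target_voxels):
    mu, logvar = E(image)                                 # hypothetical encoder: Gaussian posterior over z
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    recon = G(z)                                          # reconstructed 64^3 voxel grid

    # Object reconstruction loss Lrecon between G(E(image)) and the true shape.
    l_recon = F.mse_loss(recon, target_voxels)

    # KL divergence LKL between the posterior N(mu, sigma^2) and the prior N(0, I).
    l_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    return l_recon, l_kl  # combined with the adversarial term as L3D-GAN + a1*LKL + a2*Lrecon
```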

Areas of Application

  • 3D Object Generation
  • 3D Object Classification
  • Single Image 3D Reconstruction
