Link to the paper
Contribution
- This paper introduces a new framework called 3D-GAN, which generates 3D objects from a probabilistic latent space using volumetric convolutions and generative adversarial networks (GANs).
Background
- Generative Adversarial Nets (GAN): A generator network learns to produce samples that a discriminator network cannot distinguish from real data; the two are trained adversarially.
- Variational Autoencoder: An autoencoder is a pair of connected networks, an encoder and a decoder. The encoder compresses an input into a smaller, dense latent representation, from which the decoder reconstructs the original input. A variational autoencoder additionally makes the encoder output a distribution (typically Gaussian) over the latent space, regularized toward a prior, so that new samples can be generated by sampling latent vectors.
- Volumetric Convolutions: Convolution layers that slide a 3D kernel over 3D (voxel) input.
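To make the volumetric-convolution idea concrete, here is a minimal NumPy sketch of a single-channel "valid" 3D convolution over a voxel grid (a naive loop, not the optimized layers an actual deep-learning framework would use):

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive 'valid' volumetric convolution: slide a 3D kernel
    over a voxel grid and sum elementwise products.
    volume: (D, H, W) array; kernel: (kd, kh, kw) array."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = volume[i:i + kd, j:j + kh, k:k + kw]
                out[i, j, k] = np.sum(patch * kernel)
    return out

# A 4x4x4 volume convolved with a 2x2x2 kernel yields a 3x3x3 output.
voxels = np.ones((4, 4, 4))
result = conv3d(voxels, np.ones((2, 2, 2)))
```

Real volumetric layers add input/output channels, strides, and padding on top of this same sliding-window operation.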
Description
- 3D object understanding and generation is an important problem in the graphics and vision community.
- With the help of adversarial training, the generator captures the object structure implicitly and synthesizes high-quality 3D objects.
- The generator maps a low-dimensional probabilistic latent space to the space of 3D objects, so 3D objects can be sampled without reference images or CAD models.
- This network, when combined with a variational autoencoder, can directly reconstruct a 3D object from a 2D image.
- The discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition.
Methodology
- 3D-GAN
- In 3D-GAN, a 200-dimensional latent vector z, randomly sampled from a probabilistic latent space, is mapped by the generator G to a 64 × 64 × 64 cube representing an object G(z) in 3D voxel space.
- The discriminator D takes a 3D object x as input and outputs a confidence value D(x) indicating whether the input is real or synthetic.
- Binary cross entropy is used as the loss function.
- The discriminator usually learns faster, and this makes it hard for the generator to improve, as all samples it generates are correctly identified as synthetic with high confidence.
- Therefore, to keep the training of both networks in pace, for each batch, the discriminator is updated only if its accuracy in the previous batch is less than 80%.
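The two pieces above, the binary cross-entropy loss and the 80% accuracy gate on discriminator updates, can be sketched as follows (a minimal NumPy illustration of the training rule, not the paper's full training loop):

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross entropy over discriminator confidences in (0, 1)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def should_update_discriminator(prev_batch_accuracy, threshold=0.8):
    """3D-GAN's adaptive training rule: skip the discriminator update
    when it classified 80% or more of the previous batch correctly,
    giving the slower-learning generator room to catch up."""
    return prev_batch_accuracy < threshold

# D was 90% accurate on the previous batch -> hold its weights fixed.
update_d = should_update_discriminator(0.9)
# Loss for a real sample (target 1) predicted with confidence 0.9.
loss = bce_loss(np.array([0.9]), np.array([1.0]))
```

In a full training loop, `should_update_discriminator` would simply gate the discriminator's optimizer step each batch while the generator is always updated.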
- 3D-VAE-GAN
- The 3D-VAE-GAN consists of three components: an image encoder E, a generator G, and a discriminator D.
- The image encoder E takes 2D image as input and outputs the latent representation vector z.
- Further operations are similar to that of 3D-GAN.
- The loss function consists of three parts: an object reconstruction loss Lrecon, a cross entropy loss L3D-GAN, and a KL divergence loss LKL.
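The three-part loss can be sketched as a weighted sum. The closed-form KL term below assumes a Gaussian encoder against a standard normal prior, and the weights `a1` and `a2` are illustrative placeholders, not values taken from this summary:

```python
import numpy as np

def kl_divergence(mu, log_var):
    """Closed-form KL divergence between the encoder's Gaussian
    q(z|x) = N(mu, sigma^2) and the standard normal prior N(0, I)."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def vae_gan_loss(l_3dgan, l_kl, l_recon, a1=1.0, a2=1.0):
    """Total 3D-VAE-GAN objective: L = L3D-GAN + a1*LKL + a2*Lrecon.
    The weights a1, a2 are hypothetical; the paper tunes such
    coefficients to balance the three terms."""
    return l_3dgan + a1 * l_kl + a2 * l_recon

# Encoder output matching the prior exactly contributes zero KL loss.
kl = kl_divergence(np.zeros(3), np.zeros(3))
total = vae_gan_loss(1.0, kl, 0.5, a1=1.0, a2=2.0)
```

At inference time only the encoder and generator are kept: a 2D image is encoded to z and decoded to a voxel grid, which is why the reconstruction and KL terms act on E while the cross-entropy term trains G and D.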
Areas of Application
- 3D Object Generation
- 3D Object Classification
- Single Image 3D Reconstruction
Related Papers
Reference