Generative Adversarial Networks (GANs) are all the rage, both for their practical applications and their creative output. In this blog post, I'll go over what GANs are, how they work, and a few flavors of GANs that have led to the current state-of-the-art research in generative networks.

The goal of a Generative Adversarial Network is to produce synthetic images that look like images from the training set. That is, if you feed images of dogs into a GAN, it should produce an unlimited variety of dog-like images. GANs accomplish this by attempting to learn the probability distribution of the training set.

The Network

DCGAN Network

The generator learns to match the distribution of its generated data to that of the input data. This is done by setting up two distinct neural networks that learn in tandem in a minimax game. The generator network (G) tries to generate images that look like they are from the training set, while the discriminator network (D) attempts to determine whether a given image was produced by the generator or drawn from the training set. The discriminator acts as a traditional binary classifier: for each image it outputs a sigmoid probability that the image is real, with values near 1 meaning real and values near 0 meaning fake. This game plays out over a series of epochs. At first, neither the generator nor the discriminator performs well. As learning progresses and the gradients are passed back to each network, both improve their performance.
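
To make the game a bit more concrete, here is a minimal sketch of the two losses in plain Python. The discriminator outputs are made-up numbers, not results from a trained model, and for the generator I've used the common "non-saturating" loss (maximize log D(G(z))) rather than the literal minimax term, so treat this as one reasonable formulation rather than the exact loss behind the results in this post.

```python
import math

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for a single predicted probability."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Mean BCE with real images labelled 1 and generated images labelled 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when D scores the fakes as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs: probability that each image is real.
early_real, early_fake = [0.5, 0.5], [0.5, 0.5]  # D is just guessing
later_real, later_fake = [0.9, 0.8], [0.1, 0.2]  # D separates real from fake

print("D loss, guessing:  ", discriminator_loss(early_real, early_fake))
print("D loss, confident: ", discriminator_loss(later_real, later_fake))
print("G loss, D guessing:", generator_loss(early_fake))
print("G loss, D winning: ", generator_loss(later_fake))
```

Notice that the two losses pull in opposite directions: when the discriminator gets better at spotting fakes, its loss falls and the generator's loss rises, which is what pushes the generator to improve.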

The DCGAN Generator

The generator network's job is to produce images that look like they could be from the training set. The generator samples a random noise vector z and, through a series of convolutional layers, upsamples that noise into a 64x64x3 image, where 64 is the height and width in pixels and 3 is the number of RGB color channels per pixel. This upsampling is done through a series of fractionally strided (transposed) convolution layers.

Fractional Strided Convolutions (Generator)
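
As a rough illustration of how a noise vector grows into a 64x64 image, the sketch below applies the standard output-size formula for a transposed convolution layer by layer. The kernel, stride, and padding values are my assumption (a commonly used DCGAN-style configuration), not settings taken from this post.

```python
def transposed_conv_output_size(size, kernel, stride, padding):
    """Output height/width of a transposed convolution
    (assuming no output padding and no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed (kernel, stride, padding) per layer. Hypothetical values chosen
# so that a 1x1 input ends up as a 64x64 image.
GENERATOR_LAYERS = [
    (4, 1, 0),  # 1x1   -> 4x4
    (4, 2, 1),  # 4x4   -> 8x8
    (4, 2, 1),  # 8x8   -> 16x16
    (4, 2, 1),  # 16x16 -> 32x32
    (4, 2, 1),  # 32x32 -> 64x64
]

def generator_sizes(layers, start=1):
    """Spatial size after each layer, starting from a 1x1 z vector."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(transposed_conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes

print(generator_sizes(GENERATOR_LAYERS))  # [1, 4, 8, 16, 32, 64]
```

With a stride of 2, each layer roughly doubles the height and width, which is why a handful of layers is enough to reach 64x64.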

The DCGAN Discriminator

The discriminator network's job is to determine whether an image is a generated 'fake' image or a real image. The discriminator receives an image and, through a series of convolutional layers, downsamples it into a single probability output, where 0 is a fake image and 1 is a real image. This downsampling is done through a series of strided convolution layers. The discriminator network works like a typical CNN in any binary classification situation. Here, a 5x5 input is fed into a convolutional layer with a stride of 2, a 3x3 kernel, and no padding. This results in a 2x2 output.

Strided Convolutions (Discriminator)
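
The 5x5 example above can be checked with a few lines of plain Python. This is only a sketch: the input values and the all-ones kernel are made up for illustration, and a real discriminator would learn its kernel weights and use a deep learning library instead of nested loops.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output height/width of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def strided_conv2d(image, kernel, stride):
    """Unpadded 2D convolution (cross-correlation, as in CNN layers)
    over a square image and square kernel given as nested lists."""
    k = len(kernel)
    out = conv_output_size(len(image), k, stride)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            top, left = i * stride, j * stride  # window's top-left corner
            row.append(sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k) for b in range(k)
            ))
        result.append(row)
    return result

image = [[r * 5 + c for c in range(5)] for r in range(5)]  # values 0..24
kernel = [[1] * 3 for _ in range(3)]                       # 3x3 filter of ones

feature_map = strided_conv2d(image, kernel, stride=2)
print(conv_output_size(5, kernel=3, stride=2))  # 2
print(feature_map)                              # [[54, 72], [144, 162]]
```

Because the window jumps two pixels at a time, only four window positions fit inside the 5x5 input, which gives the 2x2 output.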

Over a series of epochs, the probability distribution of the fake images starts to take shape in relation to the distribution of the real images. The loss is calculated between the distributions and the gradients are passed back to the generator and discriminator. How the difference between the distributions is measured depends on the variant: the original GAN objective, which DCGAN keeps, corresponds to minimizing the JS divergence (a symmetrized form of KL divergence), while Wasserstein GANs (WGAN) use the Wasserstein distance, also known as Earth Mover's Distance (EMD). Relative to vanilla GANs, DCGAN makes a few additional changes to the network architecture. Namely, it 1) uses batchnorm in both the generator and the discriminator, 2) removes fully connected hidden layers for deeper architectures, 3) uses ReLU activation in the generator for all layers except the output, which uses Tanh, and 4) uses LeakyReLU activation in the discriminator for all layers.
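
To get a feel for how these distance measures differ, here is a small sketch that computes KL divergence, JS divergence, and a 1D Earth Mover's Distance on toy histograms. The histograms are hypothetical and one-dimensional, whereas real image distributions are far higher dimensional, so this only illustrates the definitions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when q is 0 somewhere p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def wasserstein_1d(p, q):
    """Earth Mover's Distance between two histograms on unit-spaced bins,
    computed as the summed gap between their cumulative distributions."""
    distance, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap)
    return distance

# Toy histograms: all probability mass sits in a single bin.
real = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]  # one bin away from real
far = [0.0, 0.0, 0.0, 1.0]   # three bins away from real

for name, fake in (("near", near), ("far", far)):
    print(name, kl_divergence(real, fake), js_divergence(real, fake),
          wasserstein_1d(real, fake))
```

In this toy case, KL is infinite and JS is stuck at log 2 for both fakes because neither overlaps the real distribution, while the Earth Mover's Distance shrinks from 3 to 1 as the fake moves closer. That kind of smoother signal is the usual motivation given for WGAN.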

Generative Output

As we can see from the gif below, the generated images typically start out as random noise, as neither the generator nor the discriminator knows anything about the training set. As both networks improve, the generated images start to take the shape of their intended targets. Here we see the output of a DCGAN over a series of epochs using the MNIST dataset of handwritten digits.

DCGAN output of MNIST digits over the first few epochs 

The Proliferation of GANs

In 2019, the MNIST results aren't remotely impressive. Over the past 4 years, GANs have developed, adopting strategies that produce highly realistic images that are difficult to tell apart from real ones. While DCGAN was a breakthrough in terms of image quality, more advanced networks like NVIDIA's Progressive Growing of GANs and Google's BigGAN have produced hyper-realistic images.

Results from NVIDIA's Progressive Growing of GANs

Applications

There are plenty of practical applications for GANs, including data augmentation, simulations for reinforcement learning, super resolution, privacy preservation, anomaly detection, discriminative modeling, domain adaptation, and adversarial training.