Selecting Between GAN and Adversarial Autoencoder Models


Introduction

Over the past few years, generative models have attracted a lot of attention in the deep learning community. Among them, Generative Adversarial Networks (GANs) and Adversarial Autoencoders (AAEs) are two of the most popular models for producing realistic images. GANs tend to produce high-quality images that closely resemble the training data, while AAEs are better suited to producing a diverse set of images that capture the essence of the training data. In this article, we discuss how to choose between GAN and AAE models for image generation problems.

Generative Adversarial Network (GAN)

Ian Goodfellow introduced Generative Adversarial Networks (GANs) in 2014. A GAN is made up of two neural networks: a generator and a discriminator. The generator takes a random noise vector as input and outputs an image. The discriminator takes either a generated image or a genuine image as input and attempts to tell them apart. The generator aims to produce images that are indistinguishable from real ones, while the discriminator aims to correctly distinguish real images from generated ones.
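This minimax game can be sketched numerically. The toy code below (an illustration only; the one-layer `generator` and `discriminator` and their weights are hypothetical stand-ins for real networks) evaluates the standard GAN value function E[log D(x)] + E[log(1 − D(G(z)))] on random data. The discriminator tries to maximize this quantity, while the generator tries to minimize it.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w):
    # Toy linear "generator": maps a noise vector z to a fake sample.
    return z @ w

def discriminator(x, v):
    # Toy logistic "discriminator": probability that x is a real sample.
    return 1.0 / (1.0 + np.exp(-(x @ v)))

w = rng.normal(size=(4, 4))   # hypothetical generator weights
v = rng.normal(size=4)        # hypothetical discriminator weights

real = rng.normal(loc=2.0, size=(64, 4))   # "real" samples
z = rng.normal(size=(64, 4))               # noise vectors
fake = generator(z, w)

# GAN value function: E[log D(x_real)] + E[log(1 - D(G(z)))]
eps = 1e-7  # clip to avoid log(0)
d_real = np.clip(discriminator(real, v), eps, 1 - eps)
d_fake = np.clip(discriminator(fake, v), eps, 1 - eps)
value = np.log(d_real).mean() + np.log(1.0 - d_fake).mean()
print(value)
```

In a real GAN both networks are deep, and their parameters are updated alternately by gradient descent on this objective rather than evaluated once as here.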

Alireza Makhzani et al. presented Adversarial Autoencoders (AAEs) in 2015. An AAE combines a GAN with an autoencoder. Like a GAN, an AAE has a generator network and a discriminator network; however, the generator in an AAE is an autoencoder, which consists of an encoder network and a decoder network. The encoder maps an input image to a low-dimensional latent space, and the decoder converts a latent vector back into an image. In an AAE, the discriminator tries to distinguish latent vectors produced by the encoder from vectors sampled from a prior distribution.

The GAN is currently one of the most well-known deep generative modelling techniques. The main difference between a GAN and a VAE lies in how the model distribution is matched to the real distribution: a GAN matches the data distribution implicitly through an adversarial game, rather than through a pixel-level reconstruction loss.

Training a GAN network

Given that a GAN shapes its output to follow the target distribution, how can the network be optimised to learn that distribution? Both "direct" and "indirect" approaches exist. The direct approach compares the generated distribution with the real distribution, measures the discrepancy, and updates the network accordingly; this is the approach taken by the Generative Moment Matching Network (GMMN). The difficulty is that the real distribution is usually complex: unlike a Gaussian, which is fully described by its mean and variance, neither the real nor the generated distribution can be written down explicitly. Instead, the two distributions are compared through samples: by drawing samples of real and generated data, we can estimate the discrepancy between the distributions.
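The "direct" route can be illustrated with the maximum mean discrepancy (MMD), the sample-based distance that moment-matching networks minimize. The sketch below is illustrative, not a trained model: the kernel bandwidth and the toy Gaussian "real" and "generated" samples are assumptions for the demonstration.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise RBF kernel between the rows of a and b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy:
    # a sample-based measure of how far apart two distributions are.
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))
close = rng.normal(0.1, 1.0, size=(200, 2))  # nearly the real distribution
far = rng.normal(3.0, 1.0, size=(200, 2))    # clearly different

print(mmd2(real, close), mmd2(real, far))
```

A generator trained the "direct" way would minimize `mmd2(real, generated)` by gradient descent, with no discriminator involved; the adversarial "indirect" route instead lets a discriminator learn the discrepancy measure.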

Adversarial Autoencoder (AAE)

The Adversarial Autoencoder (AAE) is an elegant combination of the autoencoder architecture and the adversarial loss idea from the GAN. Like the Variational Autoencoder (VAE), it regularizes the latent code, but it does so with an adversarial loss rather than KL-divergence.

A VAE uses KL-divergence (a measure of the difference between two distributions) to fit the encoded latent code to a normal distribution (or any other chosen prior). An AAE replaces this term with an adversarial loss: it adds a discriminator and turns the encoder into the generator. Unlike a GAN, where the generator's output is a constructed image and the discriminator sees both real and fake images, the AAE's generator produces a latent code and tries to convince the discriminator that this code is a sample from the predefined prior. The discriminator, in turn, decides whether a given latent code was produced by the autoencoder (fake) or drawn from the prior distribution (real).
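For contrast, the KL term that the AAE replaces has a closed form when the VAE's posterior is a diagonal Gaussian and the prior is a standard normal: KL(N(μ, σ²) ‖ N(0, 1)) = ½ Σ (σ² + μ² − 1 − log σ²). A small sketch of that analytic penalty:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions:
    # 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2) per dimension.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# A posterior that already matches the prior incurs no penalty...
kl_match = kl_to_standard_normal(np.zeros(8), np.zeros(8))
# ...while one far from the prior is penalized heavily.
kl_far = kl_to_standard_normal(np.full(8, 3.0), np.zeros(8))
print(kl_match, kl_far)
```

The AAE discards this analytic term: its discriminator learns an equivalent "distance to the prior" from samples alone, which is why the AAE's prior does not need to be Gaussian or even have a tractable density.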

There are three types of encoder to choose from −

  • The deterministic encoder, identical to a standard autoencoder's encoder, compresses the input into the desired features, expressed as a vector z.

  • The Gaussian posterior encoder models each feature as a Gaussian distribution described by two values, a mean and a variance, rather than encoding it as a single value.

  • The universal approximator posterior also encodes the features as a distribution, but without assuming that the distribution is Gaussian. Here the encoder is a function f(x, n), where x is the input and n is random noise drawn from any distribution.
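The three variants can be sketched side by side on a toy input. Everything here is a hypothetical stand-in (a single weight matrix `W` plays the role of the encoder network, and `f(x, n)` is an arbitrary illustrative choice); only the sampling structure of each variant matches the descriptions above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # a toy flattened input "image"
W = rng.normal(size=(4, 2))   # hypothetical encoder weights (4 -> 2 dims)

# 1. Deterministic encoder: z is a fixed function of x alone.
z_det = x @ W

# 2. Gaussian posterior: the encoder outputs a mean and a log-variance,
#    and z is sampled via the reparameterization z = mu + sigma * eps.
mu, log_var = x @ W, np.zeros(2)
z_gauss = mu + np.exp(0.5 * log_var) * rng.normal(size=2)

# 3. Universal approximator posterior: z = f(x, n), where the noise n can
#    come from any distribution and is fed through alongside the input.
n = rng.uniform(-1, 1, size=4)   # non-Gaussian noise
z_univ = (x + n) @ W             # hypothetical f(x, n)

print(z_det, z_gauss, z_univ)
```

Note how only the first variant is repeatable: encoding the same x twice yields the same z, whereas the other two produce a fresh sample from the (implicit) posterior each time.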

The AAE architecture is therefore made up of the following components −

  • The encoder transforms the input into a lower-dimensional representation (the latent code z).

  • The decoder converts the latent code z back into the final image.

  • The discriminator receives both the autoencoder's encoded latent code z (fake) and a vector z randomly drawn from the predetermined prior (real), and judges whether its input is genuine.
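The three components above can be wired together in a single toy evaluation step. This is a sketch, not a working AAE: the one-layer `encode`, `decode`, and `discriminate` functions and their weights are hypothetical placeholders for real networks, and no gradient updates are performed. It shows only how the two losses are formed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer components (illustration only).
We = rng.normal(size=(8, 2)) * 0.1   # encoder weights:  image -> latent
Wd = rng.normal(size=(2, 8)) * 0.1   # decoder weights:  latent -> image
v = rng.normal(size=2)               # discriminator weights

def encode(x):
    return x @ We

def decode(z):
    return z @ Wd

def discriminate(z):
    # Probability that z came from the prior ("real").
    return 1.0 / (1.0 + np.exp(-(z @ v)))

x = rng.normal(size=(32, 8))          # a batch of toy "images"
z_fake = encode(x)                    # latent codes from the encoder
z_real = rng.normal(size=(32, 2))     # samples from the chosen prior

# Reconstruction loss: how well decode(encode(x)) matches x.
recon_loss = np.mean((decode(z_fake) - x) ** 2)

# Adversarial loss: the discriminator should call prior samples "real"
# and encoder outputs "fake".
eps = 1e-7
p_real = np.clip(discriminate(z_real), eps, 1 - eps)
p_fake = np.clip(discriminate(z_fake), eps, 1 - eps)
d_loss = -np.mean(np.log(p_real) + np.log(1.0 - p_fake))

print(recon_loss, d_loss)
```

Training alternates between minimizing the reconstruction loss (encoder + decoder), minimizing the discriminator loss, and updating the encoder to fool the discriminator, which pushes the encoded latent distribution toward the prior.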

Let's now discuss some factors that can guide the decision between GANs and AAEs for an image generation task −

  • Dataset − The type of dataset influences the choice of model. When the dataset is large and diverse, GANs may be the better choice, since they can produce high-quality images that capture the dataset's diversity. AAEs, on the other hand, may be preferable when the dataset is small and less varied: from a small dataset, AAEs can still generate a reasonably diverse collection of images.

  • Quantity vs. Quality − The choice also depends on whether the quantity or the quality of the generated images matters more. If producing high-quality images is the main objective, GANs may be the better option; they can create sharp, visually appealing results. If the objective is instead to produce a large and varied set of images, an AAE may be the better fit.

  • Competence − The user's expertise also affects which model to select. GANs are notoriously difficult to train, and finding the right hyperparameters can be tricky. AAEs, in contrast, are simpler to train and more stable than GANs. AAEs may therefore be a better option for users without deep learning expertise.

Conclusion

In conclusion, the decision between GANs and AAEs for image generation tasks depends on factors such as the dataset, quality versus quantity, the application, computational resources, and user expertise. AAEs excel at capturing the essence of the training data and producing diverse images, while GANs are better at generating high-quality images that closely resemble the training data. AAEs are suitable for resource-constrained environments because they are computationally lighter. If image quality is of the utmost importance, however, GANs are the preferred option despite their higher computational requirements. Ultimately, the user's skill level and the requirements of the task determine the choice.

Updated on: 13-Jul-2023
