A comparison of the Discriminative Model and the Generative Model:
The Discriminative Model tries to draw a boundary in the data space to distinguish the classes of data.
In contrast, the Generative Model tries to produce data that fall close to their real counterparts in the data space.
Generative Adversarial Networks (GANs) consist of two parts: a Generator and a Discriminator.
The generator tries to create samples that are intended to come from the same distribution as the training data, while the discriminator tries to examine samples to determine whether they are real (drawn from the training dataset) or fake (created by the generator), as shown above.
Since those two networks pursue their goals without knowing each other's actions (the generator deceiving the discriminator, the discriminator detecting the output created by the generator), this framework can be interpreted through game theory: a two-player minimax game whose solution is a Nash equilibrium.
The minimax game is a decision rule for minimizing the possible loss in a worst-case scenario.
The solution of the game is to maximize the minimum gain, which is referred to as the maximin value.
The maximin value is the highest value that the player can be sure to get without knowing the actions of the other players, defined as below.

$$\underline{v_i} = \max_{a_i} \min_{a_{-i}} v_i(a_i, a_{-i})$$
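As a toy illustration, the maximin rule can be computed directly on a small zero-sum payoff matrix (the matrix below is hypothetical, chosen purely for illustration):

```python
import numpy as np

# Hypothetical 2x2 zero-sum game: payoffs[i, j] is the row player's gain
# when the row player picks action i and the column player picks action j.
payoffs = np.array([[3.0, -1.0],
                    [0.0,  2.0]])

# Maximin for the row player: for each action, assume the opponent
# responds with the worst case (the row minimum), then pick the action
# whose worst case is best.
worst_case = payoffs.min(axis=1)      # [-1.0, 0.0]
maximin_value = worst_case.max()      # 0.0
maximin_action = worst_case.argmax()  # action 1 guarantees at least 0
```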
The solution to the game is a Nash equilibrium, which is a tuple $(\theta^{(D)}, \theta^{(G)})$ that is a local minimum of $J^{(D)}$ with respect to $\theta^{(D)}$ and a local minimum of $J^{(G)}$ with respect to $\theta^{(G)}$.
The overall GAN architecture consists of two networks, the generator $G$ and the discriminator $D$, each of which is differentiable both with respect to its inputs ($z$ for $G$ and $x$ for $D$) and with respect to its parameters ($\theta^{(G)}$ for $G$ and $\theta^{(D)}$ for $D$).
The generator is simply a differentiable function $G$ whose input $z$ is drawn from a prior probability distribution.
When $z$ is sampled from some simple prior distribution, $G(z)$ yields a sample of $x$ drawn from the model distribution $p_g$.
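A minimal sketch of this idea, assuming a toy one-layer generator (the architecture and sizes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W, b):
    # A toy differentiable generator: one affine layer followed by tanh,
    # mapping prior samples z into data space. Real GAN generators are
    # deep networks; this only illustrates the functional form x = G(z).
    return np.tanh(z @ W + b)

z_dim, x_dim = 4, 2
W = rng.normal(size=(z_dim, x_dim))  # generator parameters theta_g
b = np.zeros(x_dim)

z = rng.normal(size=(8, z_dim))  # z sampled from a simple prior N(0, I)
x_fake = generator(z, W, b)      # samples drawn from the model distribution p_g
```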
The inputs to the function $G$ do not need to correspond to inputs to the first layer of the deep neural net; inputs may be provided at any point throughout the network.
If we want $p_g$ to have full support on $x$ space, we need the dimension of $z$ to be at least as large as the dimension of $x$, and $G$ must be differentiable, but those are the only requirements.
The discriminator is simply a differentiable classifier function $D(x)$.
The purpose of the discriminator is to distinguish the real data from the data created by the generator ($x \sim p_{data}$ and $G(z) \sim p_g$).
The inputs to the function $D$ come from two different sources: examples of $x$ randomly sampled from the training set, and samples created by the generator, $G(z)$.
To build the cost functions for GANs, it is essential to understand maximum likelihood estimation and the minimax problem.
The basic idea of maximum likelihood is to define a model that provides an estimate of a probability distribution $p_{model}(x; \theta)$, parameterized by parameters $\theta$.
The likelihood is the probability that the model assigns to the training data:

$$\prod_{i=1}^{m} p_{model}\left(x^{(i)}; \theta\right),$$

for a dataset containing $m$ training examples $x^{(i)}$.
The principle of maximum likelihood simply says to choose the parameters for the model that maximize the likelihood of the training data.
Maximum likelihood estimation can be interpreted as minimizing the Kullback-Leibler divergence (KL divergence) between the data-generating distribution and the model:

$$\theta^{*} = \arg\min_{\theta} D_{KL}\left(p_{data}(x) \,\|\, p_{model}(x; \theta)\right)$$
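This equivalence can be checked numerically; in the sketch below (a hypothetical Bernoulli model, chosen only for illustration), the parameter that maximizes the expected log-likelihood is the same one that minimizes the KL divergence:

```python
import numpy as np

# Empirical data distribution over {0, 1}: P(x=1) = 0.7.
p_data = np.array([0.3, 0.7])

thetas = np.linspace(0.01, 0.99, 99)  # candidate Bernoulli parameters

# Expected log-likelihood of the model p_model(x; theta) under p_data.
log_lik = p_data[0] * np.log(1 - thetas) + p_data[1] * np.log(thetas)

# KL(p_data || p_model) over the same grid of parameters.
kl = (p_data[0] * np.log(p_data[0] / (1 - thetas))
      + p_data[1] * np.log(p_data[1] / thetas))

# The same theta maximizes the likelihood and minimizes the KL divergence,
# because KL(p_data || p_model) = -H(p_data) - E_{p_data}[log p_model].
best_theta = thetas[log_lik.argmax()]
assert best_theta == thetas[kl.argmin()]
```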
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution differs from a second, reference probability distribution.
Consider two probability distributions $P$ and $Q$ where, usually, $P$ represents the data, the observations, or a measured probability distribution, and $Q$ represents a theory, a model, a description, or an approximation of $P$.
The Kullback–Leibler divergence is then interpreted as the average difference in the number of bits required for encoding samples of $P$ using a code optimized for $Q$ rather than one optimized for $P$.
For probability distributions $P$ and $Q$ defined on the same probability space $\mathcal{X}$, the relative entropy or KL divergence from $Q$ to $P$ is defined as below.

$$D_{KL}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}, \quad \text{where } P, Q \text{ are discrete;}$$

$$D_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx, \quad \text{where } P, Q \text{ are continuous.}$$
In other words, it is the expectation of the logarithmic difference between the probabilities $P$ and $Q$, where the expectation is taken using the probabilities $P$.
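A direct NumPy implementation of the discrete definition (a minimal sketch; it assumes strictly positive probabilities so the logarithm is well defined):

```python
import numpy as np

def kl_divergence(p, q):
    # Discrete KL divergence D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)  # ~0.5108 nats
reverse = kl_divergence(q, p)  # ~0.3681 nats
# KL is asymmetric: D_KL(P || Q) != D_KL(Q || P) in general.
```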
In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions.
It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it always has a finite value.
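The JS divergence is the average KL divergence of each distribution to their mixture $M = (P + Q)/2$; a minimal sketch:

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence D_KL(P || Q).
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    # JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5]
q = [0.9, 0.1]

# Unlike KL, the JS divergence is symmetric and bounded: 0 <= JSD <= log 2.
assert abs(js_divergence(p, q) - js_divergence(q, p)) < 1e-12
assert 0.0 <= js_divergence(p, q) <= np.log(2)
```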
Since GANs consist of two players, the generator and the discriminator, which pursue their goals without knowing each other's actions, the objective function can be set up as a minimax game:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
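The two expectations in $V(D, G)$ can be estimated by Monte-Carlo sampling. The sketch below uses hypothetical stand-ins ($p_{data} = N(4, 1)$, $G(z) = z$ so that $p_g = N(0, 1)$, and a fixed discriminator $D(x) = \sigma(x - 2)$) purely to make the value function concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Hypothetical stand-ins for illustration only.
D = lambda x: sigmoid(x - 2.0)  # a fixed discriminator
G = lambda z: z                 # identity generator, so p_g = p_z = N(0, 1)

x = rng.normal(4.0, 1.0, 100_000)  # real samples, x ~ p_data = N(4, 1)
z = rng.normal(0.0, 1.0, 100_000)  # prior samples, z ~ p_z = N(0, 1)

# V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]
V = np.mean(np.log(D(x))) + np.mean(np.log(1.0 - D(G(z))))

# This D separates the two distributions better than chance (D = 1/2),
# so the estimate exceeds the chance-level value -log 4.
assert V > -np.log(4.0)
```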
First, consider the optimal discriminator $D^{*}_{G}$ for any given generator $G$:

$$D^{*}_{G}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$
Since the training objective for $D$ can be interpreted as maximizing the log-likelihood for estimating the conditional probability $P(Y = y \mid x)$, where $Y$ indicates whether $x$ comes from $p_{data}$ (with $y = 1$) or from $p_g$ (with $y = 0$), the cost function can be reformulated as:

$$C(G) = \max_{D} V(G, D) = \mathbb{E}_{x \sim p_{data}}\left[\log D^{*}_{G}(x)\right] + \mathbb{E}_{x \sim p_g}\left[\log\left(1 - D^{*}_{G}(x)\right)\right]$$

With this cost function, the global minimum is achieved when $p_g = p_{data}$.
Detailed information for the proof of Theorem 1
The value function with the optimal discriminator $D^{*}_{G}$ can be interpreted as the cost function for the generator, with the discriminator held fixed.
Thus, $C(G)$ achieves its global minimum at $p_g = p_{data}$, with the value $-\log 4$.
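Theorem 1's value can be sanity-checked numerically on a small discrete example (the three-point distribution below is hypothetical):

```python
import numpy as np

# When p_g = p_data, the optimal discriminator is
# D*(x) = p_data(x) / (p_data(x) + p_g(x)) = 1/2 everywhere,
# so C(G) = E_{p_data}[log 1/2] + E_{p_g}[log 1/2] = -log 4.
p_data = np.array([0.2, 0.3, 0.5])
p_g = p_data.copy()  # generator exactly matches the data distribution

d_star = p_data / (p_data + p_g)  # = 0.5 for every x
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

assert np.allclose(d_star, 0.5)
assert np.isclose(c_g, -np.log(4.0))
```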
An overview of Algorithm 1 is given below.
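Since Algorithm 1 alternates minibatch stochastic-gradient updates of the discriminator and the generator, it can be sketched in a toy, self-contained form; the 1-D Gaussian data, the one-parameter generator, the logistic discriminator, and all hyperparameters below are illustrative assumptions, not from the original paper:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy setup: real data x ~ N(4, 1); generator G(z) = z + theta_g with
# prior z ~ N(0, 1); discriminator D(x) = sigmoid(w * x + b).
theta_g, w, b = 0.0, 0.0, 0.0
lr, m = 0.02, 64  # learning rate and minibatch size

for step in range(5000):
    # Discriminator step (k = 1): ascend on log D(x) + log(1 - D(G(z))).
    x = rng.normal(4.0, 1.0, m)
    z = rng.normal(0.0, 1.0, m)
    gx = z + theta_g
    dx, dgx = sigmoid(w * x + b), sigmoid(w * gx + b)
    w += lr * (np.mean((1 - dx) * x) - np.mean(dgx * gx))
    b += lr * (np.mean(1 - dx) - np.mean(dgx))

    # Generator step: descend on log(1 - D(G(z))).
    z = rng.normal(0.0, 1.0, m)
    dgx = sigmoid(w * (z + theta_g) + b)
    theta_g += lr * np.mean(dgx) * w  # -d/dtheta_g of E[log(1 - D(G(z)))]

# theta_g should drift toward 4, the mean of the real data, as the
# generator's samples become indistinguishable from the real ones.
```

As in Algorithm 1, the discriminator is pushed toward classifying real samples as 1 and generated samples as 0, while the generator is pushed toward samples the discriminator accepts; with matched distributions the game settles near $D \approx 1/2$.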