当前位置：网站首页>Gan: generative advantageous nets -- paper analysis and the mathematical concepts behind it

Gan: generative advantageous nets -- paper analysis and the mathematical concepts behind it

2022-07-28 04:56:00 【gongyuandaye】

One 、 Purpose

Input ： Random noise
Output ： Sample the image of self training sample distribution

Two 、 Model

Generate models G： Capture data distribution , Expect to produce as real a picture as possible , Cheated D.
Discriminant model D： Evaluate a sample from a training set rather than G The possibility of , Expect to accurately distinguish true and false pictures .
Insert picture description here

3、 ... and 、 Training

Two person system minimax game ：
Insert picture description here
Algorithm is as follows ：

among ：
1、 Complete in the inner loop D The optimization of is complex , And it will be fitted on a limited data set , So finish first k round D The optimization of the （ The gradient rises ）, Finish another round G The optimization of the （ gradient descent , There are changes in actual use ）. The result shows that as long as G Change is slow enough ,D It can be maintained near the optimal solution .
2、min The process （ gradient descent ） in , When the generated sample is very bad , The output value of the discriminator will be very small , The gradient of generator loss function here is very small , It makes the generator learn very slowly . contrary , When the generated sample is good , The output value of the discriminator will be large , The generator loss function has a large gradient here , Generator update is large .
Insert picture description here
So for min Make the following transformation ：

3、 When the distribution of training samples and data does not coincide , It is impossible to update the gradient , So there is LSGAN、WGAN etc. .

Four 、 Mathematical concepts

4.1 Maximum likelihood estimation

$P$ _{$d a t a$} $(x)$ ：G The data distribution of the image you want to find .
$P$ _$G$ $(x; θ)$ ： from θ Determined distribution .
The goal is ： determine θ Make the above two distributions similar , hypothesis $P$ _$G$ $(x; θ)$ Gaussian mixture model ,θ The matrix and covariance matrix of Gaussian distribution .

step ：
from $P$ _{$d a t a$} $(x)$ Sample {𝑥1, 𝑥2, … , 𝑥𝑚}, Calculation $P$ _$G$ $(x; θ)$ The likelihood of producing these samples ：
Insert picture description here
By maximum likelihood estimation , We can find θ*：

（ The subtraction of the penultimate step and θ irrelevant , from KL The last step of divergence definition is min）
Maximize likelihood ＝ To minimize the KL The divergence （ A measure of the similarity between two probability distributions , Asymmetric ）

4.2 Definition of generator

utilize neural network To build the generator G, The network output defines a density distribution 𝑃_𝐺.
Insert picture description here
$P$ _{$d a t a$} Unknown , Only its sampling data can be obtained . Sample from a prior probability distribution z As input . To make the two distributions close, we need to calculate the divergence of the two distributions , Need a discriminator .

4.3 Definition of discriminator

Insert picture description here
because G Fixed , Therefore, the training should be maximized V(G, D) Of The optimal D*.
The training process has the following characteristics ：

Maximize V(G, D) The formula of is derived as follows ：

So equation （4） Can have ：

Thus there are ：
Insert picture description here
among ：
1、 The sum of probability distributions is 1.
2、JSD Express JS The divergence （ symmetry ）, It is KL A distortion of divergence , It also indicates the difference between the two distributions ：

Go back to the original formula （1）, Now come back to beg The optimal G, After the above derivation, there are ：
Insert picture description here
JS by 0, Indicates that the two distributions are exactly the same , The upper bound is log2, That is, when the two distributions do not coincide , here GAN Obviously, I can't train .

4.4 The calculation process

To minimize JSD When , Think of it as a loss function L(G), Through gradient descent ：
Insert picture description here
To maximize V(G, D) when , Using the gradient rise method .
So how to calculate in a given G In the case of V(G, D) To maximize the D* Well ？

Maximize the following formula ：

namely ：

The complete calculation process is as follows ：