GAN Training Tips: An Alchemist's Training Plan - Training, Tuning, and Improving Generative Adversarial Networks
2022-07-27 02:08:00 【Medium coke with ice】
Contents

- I. Mode collapse: the generator produces only a limited set of patterns
- II. Slow training: vanishing gradients
- III. Non-convergence: unstable training and slow convergence
- IV. Overfitting
- V. Detect failure as early as possible
- VI. Assorted training tips
- Final words
Generative adversarial networks (GAN: Generative adversarial networks) are an important class of generative models in deep learning. Two networks (a generator and a discriminator) are trained at the same time and compete in a minimax game. This adversarial setup avoids several difficulties that traditional generative models run into in practice, cleverly approximating otherwise intractable loss functions through adversarial learning.

We introduced the principles of GANs earlier: an accessible explanation of the mathematics behind GANs. The heart of a GAN is finding the Nash equilibrium between D and G, but in practice GAN training is unstable, and a poor training setup easily leads to problems such as mode collapse. This article records some training tips. They will not necessarily suit your model, and there may be omissions and mistakes; treat them as a learning reference, and corrections and additions are welcome.
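For reference, the two networks compete over the standard minimax value function from the original GAN formulation, where D tries to maximize V and G tries to minimize it:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1 - D(G(z))\right)\right]$$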
I. Mode collapse: the generator produces only a limited set of patterns

In a narrow sense, mode collapse means the generator produces only a single pattern, or a limited set of patterns, to fool the discriminator. It does this only to drive the discriminator loss D_loss down, while ignoring the overall distribution of the dataset. For example, on an animal image dataset a GAN may discover during training that it generates cats and dogs very well but cattle, sheep, and monkeys very poorly, so G ends up producing only cats and dogs and never learns to generate images of the other animals, and the output becomes monotonous. Mode collapse is essentially an optimization problem of GAN training, and even the best GAN researchers still struggle with it.

There are many ways to tackle mode collapse, as follows:

1.1 Improve the training procedure
- Minibatch discriminator (mini-batch discrimination): because the discriminator normally processes one sample at a time, the gradient information the generator receives for each sample lacks coordination across samples, and all gradients end up pointing in the same direction. A minibatch discriminator therefore no longer considers a sample in isolation but looks at all samples in a minibatch at once; for a concrete implementation see: how the minibatch discriminator solves mode collapse.
- Experience replay: show the discriminator old fake samples at regular intervals; this minimizes jumping between modes. It keeps the discriminator from being exploited too easily, but only for the modes the generator has already explored in the past.
- Adjust the GAN's learning rate: this obstacle can sometimes be overcome by changing this one hyperparameter, using a smaller learning rate and retraining from scratch. The learning rate is one of the most important hyperparameters, if not the most important; even small changes to it can fundamentally change the course of training.
- Feature matching: feature matching changes the generator's cost function to minimize the statistical difference between the features of real and generated images, measured as the L2 distance between the means of their feature vectors.
- Pack several samples belonging to the same category together before passing them to the discriminator network D.
- Unrolled ('lookahead') updates: when updating the generator, consider not only the discriminator's current state but also its state after K further updates, and combine the two to choose the update. In other words, the discriminator parameters are first advanced by K consecutive gradient steps, which gives the generator some 'foresight' and avoids short-sighted behaviour. The K-step update is as follows (a code sketch appears after the equations):
$$\begin{aligned} \theta_{D}^{0} &= \theta_{D} \\ &\;\;\vdots \\ \theta_{D}^{K} &= \theta_{D}^{K-1} + \eta \, \frac{\partial f\left(\theta_{G}, \theta_{D}^{K-1}\right)}{\partial \theta_{D}^{K-1}} \end{aligned}$$

The generator's optimization target becomes $\theta_{G}=\arg \min_{\theta_{G}} f\left(\theta_{G}, \theta_{D}^{K}\left(\theta_{G}, \theta_{D}\right)\right)$, and its gradient becomes:

$$\frac{d f_{K}\left(\theta_{G}, \theta_{D}\right)}{d \theta_{G}} = \frac{\partial f\left(\theta_{G}, \theta_{D}^{K}\left(\theta_{G}, \theta_{D}\right)\right)}{\partial \theta_{G}} + \frac{\partial f\left(\theta_{G}, \theta_{D}^{K}\left(\theta_{G}, \theta_{D}\right)\right)}{\partial \theta_{D}^{K}\left(\theta_{G}, \theta_{D}\right)} \, \frac{\partial \theta_{D}^{K}\left(\theta_{G}, \theta_{D}\right)}{\partial \theta_{G}}$$
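As a rough illustration, here is a minimal PyTorch sketch of this lookahead update. It makes a throwaway copy of the discriminator, takes K plain gradient steps on it, and then updates the generator against the copy. It is a simplification: only the first term of the gradient above is used (we do not back-propagate through the copy's updates, which the full unrolled-GAN method does), the discriminator is assumed to output raw logits, and all names and hyperparameter values are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def unrolled_generator_step(G, D, g_opt, real, z, k=5, eta=1e-4):
    """Update G against a discriminator unrolled k extra steps (simplified sketch)."""
    D_k = copy.deepcopy(D)                       # theta_D^0 = theta_D
    fake = G(z)
    for _ in range(k):                           # theta_D^k = theta_D^{k-1} + eta * grad
        d_real, d_fake = D_k(real), D_k(fake.detach())
        # Standard discriminator loss; minimizing it is equivalent to ascending f.
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
                  F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        grads = torch.autograd.grad(d_loss, list(D_k.parameters()))
        with torch.no_grad():
            for p, g in zip(D_k.parameters(), grads):
                p -= eta * g
    # Generator loss measured against the unrolled discriminator copy.
    d_fake = D_k(fake)
    g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return g_loss.item()
```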
1.2 Improve the objective function

- Feature matching: change the generator's loss function (see 1.1);
- Use the Wasserstein distance instead of the JS divergence;
- Add a gradient penalty term: WGAN-GP, DRAGAN (a minimal sketch of the penalty appears after this list);
- Introduce pixel-level losses such as L1 or L2, especially in the early stages of training;
- Add a regularization term to the loss function to help the GAN find more diversity;
- Use a mean squared loss instead of a log loss.
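For the gradient-penalty item above, a minimal WGAN-GP-style penalty might look like the following sketch (the coefficient 10 and the image-shaped broadcasting of `alpha` are conventional assumptions, not from the original post):

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    """Penalize the discriminator's gradient norm at points interpolated
    between real and generated samples (WGAN-GP style sketch)."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)   # per-sample mixing weight
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = D(interp)
    grads = torch.autograd.grad(outputs=d_out, inputs=interp,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

# Typical use inside the discriminator step:
# d_loss = -(D(real).mean() - D(fake).mean()) + gradient_penalty(D, real, fake.detach())
```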
1.3 Improve the network architecture

- Use multiple generators: accept that a single GAN covers only a subset of the modes in the dataset, and train multiple generators for different modes instead of fighting mode collapse directly; together they generate the images, and the results are more diverse;
- Self-attention: global information (long-range dependencies) is used to generate better images (a minimal self-attention layer is sketched below).
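As an illustration, a SAGAN-style self-attention layer could look like the following sketch (the channel-reduction factor of 8 and the learnable `gamma` initialized to zero are conventional choices, assumed here rather than taken from the original post):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Minimal SAGAN-style self-attention block for image feature maps."""
    def __init__(self, in_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))   # starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # (b, hw, c/8)
        k = self.key(x).view(b, -1, h * w)                       # (b, c/8, hw)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)            # (b, hw, hw)
        v = self.value(x).view(b, -1, h * w)                     # (b, c, hw)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                              # residual connection
```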
II. Slow training: vanishing gradients

- Use residual structures in the network: they make the effective depth adaptive while avoiding vanishing gradients (a small residual block is sketched after this list);
- Softmax + cross-entropy loss: this loss counteracts the vanishing-gradient effect caused by differentiating the activation function;
- Use the Adam optimizer;
- Do not train the discriminator too well, to avoid training breaking down later from vanishing gradients. The discriminator's task is to help learn a distance between the dataset's underlying probability distribution and the implicit distribution defined by the generator; the generator's task is to minimize that distance;
- For models with many layers, try to avoid fully connected layers.
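A small illustration of the residual-structure item above: a plain residual block for a generator or discriminator might look like this (layer sizes and the LeakyReLU slope are illustrative assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block: the skip connection lets gradients bypass the conv path,
    which helps against vanishing gradients in deeper GANs."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = F.leaky_relu(self.norm1(self.conv1(x)), 0.2)
        h = self.norm2(self.conv2(h))
        return F.leaky_relu(x + h, 0.2)   # identity skip keeps the gradient path open
```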
III. Non-convergence: unstable training and slow convergence

- When the generator or discriminator loss suddenly rises or falls, do not stop training on a whim. The loss curves tend to rise and fall somewhat randomly, and there is nothing necessarily wrong with that. When you hit a sudden instability, train for longer and focus on the quality of the generated images; visual inspection is usually more meaningful than a few loss numbers;
- Add noise: injecting noise helps the overall diversity and stability of the system. Add noise to both the real data and the synthetic data (for example, the images produced by the generator). Mathematically this should help, because it lends some stability to the data distributions of the two competing networks;
- Soft or noisy labels: if real images are labelled 1, change the label to a lower value such as 0.9. This keeps the discriminator from becoming too confident in its classification, or put another way, from relying on a very limited set of features to decide whether an image is real or fake (both this and the noise trick above are sketched below).
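The last two items (instance noise and soft labels) are easy to combine in the discriminator loss; a minimal sketch, assuming the discriminator outputs raw logits (the values 0.05 and 0.9 are illustrative):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, real, fake, noise_std=0.05, real_label=0.9):
    """Soft real labels plus Gaussian instance noise on both inputs (sketch)."""
    real_in = real + noise_std * torch.randn_like(real)   # noise on real images
    fake_in = fake + noise_std * torch.randn_like(fake)   # noise on generated images
    d_real, d_fake = D(real_in), D(fake_in)
    loss_real = F.binary_cross_entropy_with_logits(d_real, torch.full_like(d_real, real_label))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return loss_real + loss_fake
```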
IV. Overfitting

In a GAN, if the discriminator relies on a small set of features to detect real images, the generator can exploit it by producing only those features. The optimization can become too greedy and produce no long-term benefit;

- Use regularization to avoid overfitting; L1 and L2 are the two common choices. If one is already in use, adjust its strength;
- Dropout: let some neurons stop working with a certain probability. Randomly select a subset of hidden-layer neurons and temporarily remove them, update only the parameters that were not removed in that training step, keep the removed neurons' parameters at their previous values, and repeat the process (see the sketch after this list);
- Soft or noisy labels (as in section III).
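A minimal sketch of the two regularization items above: dropout layers inside the discriminator and an L2 penalty via the optimizer's weight_decay (the architecture and the values 0.3 / 1e-4 are illustrative assumptions):

```python
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Dropout2d(0.3),                 # randomly silence feature maps during training
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Dropout2d(0.3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 1),                 # single real/fake logit
)
# weight_decay applies an L2 penalty to the discriminator's weights
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=2e-4,
                               betas=(0.5, 0.999), weight_decay=1e-4)
```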
V. Detect failure as early as possible

- D's loss stays close to 0: declare failure right away. The discriminator is too strong and the generator can no longer produce better fakes; you can also regard this as vanishing gradients. The situation is very common, because telling real samples from fake ones is usually easier than forging realistic samples;
- D's loss stays high and does not come down, and the generated images are very blurry: the run has most likely failed. The discriminator judges poorly, telling real from fake almost at random or even mistaking real for fake and fake for real, so the generator cannot learn anything from D;
- The generated images all look alike: mode collapse has occurred. Either the generator happens to be especially good at producing one particular kind of realistic sample, or the discriminator is relatively weak at judging that kind of sample, so the generator plays to its strength, avoids its weakness, and produces as many of those samples as possible;
- After a certain number of epochs the generated images are blurry or pure noise: the run has most likely failed, the gradient updates have become meaningless, and further training will not improve things; it is an ill-conditioned gradient update, so do not waste more time on it;
- In a GAN the loss reflects the discriminator's ability to tell real from fake. Overall it should fall, then rise, and finally stabilize: the fall comes from the discriminator improving, and the rise comes from the generator's ability improving.
VI. Assorted training tips

- Scale image pixel values to between -1 and 1 and use tanh as the generator's output layer;
- The Adam optimizer usually works better than the alternatives;
- Use PixelShuffle and transposed convolutions for upsampling;
- Use batch normalization; it improves generalization, and with BN you can worry less about tuning dropout and L2-regularization parameters against overfitting;
- Add noise to both the real and the generated images before feeding them to the discriminator;
- Draw the input noise from a normal distribution rather than a uniform one whenever possible;
- Gradient penalty;
- Use LeakyReLU as the activation function;
- Two Time-scale Update Rule (TTUR): use different learning rates, a slow update rule for the generator G and a fast one for the discriminator D; choosing 0.0004 for the discriminator and 0.0001 for the generator may give good results (a sketch appears after this list);
- Flip labels on purpose for some samples; this deliberate bit of slack may keep the GAN from running itself into a dead end;
- Shuffle the dataset when appropriate, otherwise the network's learning can become biased;
- Priority order: tune hyperparameters > change the loss function > change the network architecture;
- Do not use early stopping; believe in miracles, unless the discriminator loss is rapidly approaching 0;
- Do not give up; a few small changes may decide whether your GAN trains successfully.
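Finally, a minimal sketch of the TTUR item above, together with normally distributed latent noise (the Adam betas and the tensor sizes are illustrative assumptions; G and D stand for your own generator and discriminator modules):

```python
import torch
import torch.nn as nn

def make_ttur_optimizers(G: nn.Module, D: nn.Module):
    """Two time-scale update rule: a slow optimizer for G, a faster one for D."""
    g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))   # generator: 0.0001
    d_opt = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))   # discriminator: 0.0004
    return g_opt, d_opt

# Latent noise drawn from a normal distribution rather than a uniform one
z = torch.randn(64, 128)   # batch of 64, latent dimension 128 (illustrative sizes)
```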
Some of the material above is referenced from:
https://arxiv.org/pdf/1606.03498.pdf
https://towardsdatascience.com/gan-ways-to-improve-gan-performance-acf37f9f59b
https://www.zhihu.com/people/xiaomizhou94/posts
Final words

About the author: a graduate student in artificial intelligence, currently focusing on text generation and text-to-image (text to image) generation.
Personal homepage: Medium coke with more ice
Limited-time free subscription: the text-to-image (T2I) column
Support me: like + bookmark + comment
If this article helped you a lot, I hope you will click below and treat me to a coke, with extra ice!