
StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN

2022-06-23 16:16:00 Kun Li

References: "StyleGAN and StyleGAN2 in depth" on Zhihu (https://zhuanlan.zhihu.com/p/263554045), which covers the StyleGAN paper "A Style-Based Generator Architecture for Generative Adversarial Networks", its source code (https://github.com/NVlabs/stylegan), and sample results for faces, cars, and bedrooms plus a demo video; and a Chinese translation of the StyleGAN2 paper on CSDN (https://blog.csdn.net/a312863063/article/details/103550022), for reference only, with the original PDF at http://www.gwylab.com/pdf/stylegan2_chs.pdf.

The core of StyleGAN2 is fixing the artifact problem left over from StyleGAN1. StyleGAN1 contributed two key ideas, the intermediate latent w and the per-layer (hierarchical) style inputs; StyleGAN2 focuses on eliminating StyleGAN1's artifacts by re-examining and redesigning the generator architecture, and the result is genuinely more elegant.

abstract:

        The paper explores and analyzes the causes of several characteristic artifacts, redesigns the normalization used in the generator, revisits progressive growing, and regularizes the generator so that the mapping from latent codes to images is encouraged to be of better quality.

1.introduction

        StyleGAN's distinguishing feature is its unconventional generator architecture. The mapping network f no longer feeds the input latent code z only into the beginning of the network; instead it transforms it into an intermediate latent code w. Affine transforms then produce the styles that control the layers of the synthesis network g via adaptive instance normalization (AdaIN). In addition, stochastic variation is facilitated by feeding extra random noise maps into the synthesis network. That is an overview of StyleGAN, and it is basically on point.
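To make that flow concrete, here is a minimal PyTorch sketch of the mapping network f and of AdaIN applying a per-layer style to the feature maps. Module and variable names are mine, not the official NVlabs code; this is an illustration under assumed dimensions, not a faithful reimplementation.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """f: maps the input latent z to the intermediate latent w."""
    def __init__(self, z_dim=512, w_dim=512, n_layers=8):
        super().__init__()
        layers, dim = [], z_dim
        for _ in range(n_layers):
            layers += [nn.Linear(dim, w_dim), nn.LeakyReLU(0.2)]
            dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

def adain(x, style):
    """Adaptive instance normalization: normalize each feature map,
    then scale/shift it with a per-layer style from an affine transform A(w).
    x: (N, C, H, W), style: (N, 2*C) -> per-channel scale and bias."""
    scale, bias = style.chunk(2, dim=1)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8
    x = (x - mu) / sigma
    return x * scale[:, :, None, None] + bias[:, :, None, None]
```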

         Many observers have noticed characteristic artifacts in images generated by StyleGAN [3]. The paper identifies two causes of these artifacts and describes architectural changes and training methods that eliminate them. First, it investigates the origin of the common blob-like artifacts and finds that the generator creates them to work around a design flaw in its architecture. In Section 2, the normalization used in the generator is redesigned so that the artifacts disappear. Second, it analyzes artifacts associated with progressive growing [23], which has been very successful at stabilizing high-resolution GAN training. An alternative design is proposed that achieves the same goal, namely that training first focuses on low-resolution images and then gradually shifts attention to higher and higher resolutions, without changing the network topology during training. The new design also makes it possible to reason about the effective resolution of the generated images, which turns out to be lower than expected, motivating an increase in network capacity (Section 4). This is the core of the paper: solving the artifact problem left over from StyleGAN1.

        Quantitatively analyzing the image quality of a generative network remains a challenging problem. The Frechet Inception Distance (FID) measures the difference in density between two distributions in the high-dimensional feature space of an InceptionV3 classifier. However, it has recently been shown that such classifier networks focus on texture rather than shape, so these metrics do not accurately capture image quality. The perceptual path length (PPL) metric instead estimates quality via the smoothness of latent-space interpolation.

2.removing normalization artifacts

        We begin by observing that most images generated by StyleGAN exhibit characteristic blob-shaped artifacts that resemble water droplets. As shown in Figure 1, even when the droplet is not obvious in the final image, it is present in the generator's intermediate feature maps (see row 1). The anomaly starts to appear around 64×64 resolution, shows up in all feature maps, and becomes progressively stronger at higher resolutions. The existence of such a persistent artifact is puzzling, because the discriminator should be able to detect it.

        The problem is pinned down to AdaIN, which normalizes the mean and variance of each feature map separately and thereby destroys any information carried in the relative magnitudes of the feature maps. The droplet artifact is the generator deliberately sneaking signal-strength information past instance normalization: by creating a strong, localized spike that dominates the statistics, the generator can effectively scale the signal as it likes everywhere else. When the normalization step is removed from the generator, the droplet artifacts disappear completely. In short, normalization causes the droplet artifacts, presumably because per-layer normalization destroys the signal's magnitude information.
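A tiny illustrative snippet (shapes and names assumed, not from the paper) of the point about magnitude information: instance normalization maps feature maps of very different strengths to identical statistics, so the relative scale is no longer visible downstream.

```python
import torch

def instance_norm(x, eps=1e-8):
    # Normalize each feature map to zero mean, unit std over its spatial dims.
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    return (x - mu) / (sigma + eps)

x_weak = torch.randn(1, 1, 8, 8)             # a "weak" feature map
x_strong = 1000.0 * torch.randn(1, 1, 8, 8)  # a "strong" feature map (e.g. a local spike)

# Both print ~1: the 1000x difference in signal strength is erased.
print(instance_norm(x_weak).std(), instance_norm(x_strong).std())
```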

2.1 generator architecture revisited

Figure 2. Redesign of the StyleGAN synthesis network. (a) The original StyleGAN, where A is an affine transform learned from W that produces a style vector, and B denotes a noise broadcast operation. (b) The same diagram with full details. Here AdaIN is broken into an explicit normalization followed by a modulation, each operating on the mean and standard deviation of every feature map. The learned weights (w), biases (b), and constant input (c) are annotated, and the gray boxes are redrawn so that one style is active per box. The activation function (Leaky ReLU) is always applied right after adding the bias. (c) Several changes to the original architecture, justified in the text: some redundant operations at the beginning are removed, the addition of b and B is moved outside the active area of a style, and only the standard deviation of each feature map is adjusted. (d) The revised architecture allows instance normalization to be replaced by a "demodulation" operation, which is applied to the weights associated with each convolution layer.

The evolution of this set of diagrams matters. The first is the original StyleGAN diagram; (b) is a detailed view of the StyleGAN generator in which AdaIN is broken into two parts, normalization and modulation. The original StyleGAN applies the bias and noise (b and B in the figure) inside the style block, after the convolution and before the normalization. Moving these operations outside the style block, so that they operate on un-normalized data, still gives predictable results. Furthermore, after this change it is sufficient for the normalization and modulation to operate on the standard deviation alone; the mean does not need to be modulated. This is shown in (c) above.

2.2 Instance normalization revisited

        Returning to the figure: the droplet phenomenon is caused by normalization. The generator architecture has already been improved, with b and B moved out and the mean removed, but the normalization itself is still there, and instance normalization matters for controllability. How can instance normalization be relaxed while keeping the style effects scale-specific? Style mixing, feeding different latents w to different layers at inference time, is an important way StyleGAN controls a generated image. In practice, style modulation can amplify certain feature maps by an order of magnitude or more, and for style mixing to work this amplification has to be explicitly counteracted on a per-sample basis. One could simply delete the normalization, but then specific effects become hard to control and StyleGAN's controllability is effectively lost. Instead, an alternative is proposed that removes the artifacts while preserving control, shown as (d) in the figure: the normalization is folded into the convolution weights, i.e. weight demodulation.
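Below is a hedged sketch of that weight demodulation idea, a simplified modulated convolution with my own variable names rather than the official implementation (which batches this differently for efficiency, and here the kernel size is assumed odd). The style scales the weights per input channel, and demodulation rescales them so the expected output statistics stay at unit scale, instead of normalizing the activations themselves.

```python
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, style, demodulate=True, eps=1e-8):
    """
    x:      (N, C_in, H, W) input features
    weight: (C_out, C_in, k, k) learned convolution weights
    style:  (N, C_in) per-sample style scales s_i from the affine A(w)
    """
    N, C_in, H, W = x.shape
    C_out, _, k, _ = weight.shape

    # Modulate: w'_{ijk} = s_i * w_{ijk}  (scale each input channel)
    w = weight[None] * style[:, None, :, None, None]          # (N, C_out, C_in, k, k)

    if demodulate:
        # Demodulate: w''_{ijk} = w'_{ijk} / sqrt(sum_{i,k} w'^2 + eps)
        d = torch.rsqrt((w ** 2).sum(dim=(2, 3, 4), keepdim=True) + eps)
        w = w * d

    # Apply the per-sample weights with a grouped convolution.
    x = x.reshape(1, N * C_in, H, W)
    w = w.reshape(N * C_out, C_in, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=N)
    return out.reshape(N, C_out, H, W)
```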

At this point the core of StyleGAN2 is essentially covered. StyleGAN1 and StyleGAN2 are similar in spirit, and the paper contains many more discussions and experimental details, but overall: StyleGAN1 has two key ideas, the latent w and the per-layer style inputs, while StyleGAN2 is about eliminating StyleGAN1's artifacts by further understanding and redesigning the generator architecture. It really is more elegant.

3.Image quality and generator smoothness

         Although GAN metrics such as FID or Precision and Recall (P&R) successfully capture many aspects of the generator, they still have blind spots with respect to image quality. The key to this apparent inconsistency lies in the particular choice of feature space rather than in the foundations of FID or P&R themselves. It was recently found that classifiers trained on ImageNet [35] tend to base their decisions much more on texture than on shape [11], whereas humans strongly focus on shape [28]. This is relevant here because FID and P&R use high-level features from InceptionV3 [39] and VGG-16 [39], which were trained in this way and can therefore be expected to be biased toward texture detection. As a result, images with, say, strong cat textures may appear more similar to each other than a human observer judging the relevant details would agree, partially undermining the density-based metric (FID) and the manifold-coverage metric (P&R).
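For reference, the FID discussed here compares Gaussian fits of InceptionV3 features of real and generated images. A minimal NumPy sketch (array shapes assumed, not a drop-in replacement for any particular library) looks like this:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """feats_*: (N, D) arrays of Inception features for real / generated images.
    FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*(S1 S2)^{1/2})."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts from numerical error
    return float(((mu1 - mu2) ** 2).sum() + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```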

An interesting relationship is observed between perceived image quality and the perceptual path length (PPL) [24], a metric originally introduced to quantify the smoothness of the mapping from latent space to output image by measuring the average LPIPS distance [49] between images generated under small perturbations in latent space. Referring again to Figures 13 and 14, a smaller PPL (a smoother generator mapping) appears to correlate with higher overall image quality, whereas other metrics do not reflect this difference. Figure 4 examines this correlation more closely.
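A rough sketch of the PPL idea, not the paper's exact protocol (which perturbs the interpolation parameter between pairs of latents): perturb the latent slightly, measure the LPIPS distance between the two generated images, and scale by 1/ε². Here `generator` and `lpips_fn` are assumed to be provided by the reader.

```python
import torch

def perceptual_path_length(generator, lpips_fn, n_samples=1000, eps=1e-4, z_dim=512):
    dists = []
    with torch.no_grad():
        for _ in range(n_samples):
            z0 = torch.randn(1, z_dim)
            z1 = z0 + eps * torch.randn(1, z_dim)   # small perturbation in latent space
            img0, img1 = generator(z0), generator(z1)
            d = lpips_fn(img0, img1) / (eps ** 2)   # quadratic scaling from the PPL definition
            dists.append(d.item())
    return sum(dists) / len(dists)
```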

As far as I know, mmgeneration also uses these three evaluation metrics for the StyleGAN series: FID, P&R, and PPL.

4.progressive growing revisited

        Progressive growing as used in StyleGAN has drawbacks. As shown in the figure, when the face turns left or right, the teeth do not turn with it: details such as teeth and eyes stay relatively fixed instead of following the pose of the face. This phenomenon is caused by progressive-growing training, in which a low resolution is trained first, a higher-resolution stage is added once training has stabilized, and so on, with each resolution momentarily serving as the output resolution. This forces each stage to generate maximal high-frequency details (see the teeth in the figure below), which then tend to stay in place rather than follow the motion.

Progressive growing was used in the first place because the networks needed for high-resolution image generation are large and deep, and very deep networks are hard to train; skip connections, however, make deep networks trainable. The figure below therefore shows three network structures, all using skip connections, and the three are compared experimentally in the figure after that. A sketch of the output-skip idea is given below.
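As referenced above, here is an illustrative sketch (own names, heavily simplified, each block assumed to double the resolution) of the "output skip" generator: every resolution block emits an RGB image through a tRGB layer, and these contributions are upsampled and summed, so training can start by leaning on the low-resolution outputs and gradually shift weight to higher resolutions without ever changing the network topology.

```python
import torch
import torch.nn.functional as F

def skip_generator_forward(blocks, to_rgbs, x, w):
    """
    blocks:  list of synthesis blocks, each producing features at double the resolution
    to_rgbs: matching list of 'tRGB' layers (features -> 3-channel image)
    x, w:    starting feature map and intermediate latent(s)
    """
    rgb = None
    for block, to_rgb in zip(blocks, to_rgbs):
        x = block(x, w)                  # features at the next resolution
        y = to_rgb(x, w)                 # RGB contribution of this resolution
        if rgb is not None:
            # Upsample the accumulated lower-resolution image and add it in.
            rgb = F.interpolate(rgb, scale_factor=2, mode='bilinear',
                                align_corners=False)
            y = y + rgb
        rgb = y
    return rgb
```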

  Starting from the PPL metric, the rest of the paper discusses training strategy and practical details; it is worth reading closely when there is time.

Copyright notice: this article was written by [Kun Li]. Please include a link to the original when reposting: https://yzsam.com/2022/174/202206231508002846.html