StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN
2022-06-23 16:16:00 【Kun Li】
Deep understanding of StyleGAN and StyleGAN2 - Zhihu. StyleGAN paper: A Style-Based Generator Architecture for Generative Adversarial Networks. Source code: https://github.com/NVlabs/stylegan. Results: generated faces, generated fake cars, generated fake bedrooms, and a demo video (suggested …
https://zhuanlan.zhihu.com/p/263554045 【Paper translation】StyleGAN2, from bupt_gwy's CSDN blog. The translation is for reference only! The original is a PDF; it can be downloaded at http://www.gwylab.com/pdf/stylegan2_chs.pdf https://blog.csdn.net/a312863063/article/details/103550022. The core of StyleGAN2 is fixing the artifact problem left over from StyleGAN1: StyleGAN1 has two core ideas, the latent code w and the per-layer (hierarchical) style input, while StyleGAN2 resolves StyleGAN1's artifacts through a deeper understanding and redesign of the generator architecture, which is indeed more elegant.
Abstract:
The paper explores and analyzes the causes of several characteristic artifacts, redesigns the generator's normalization, revisits progressive growing, and regularizes the generator to encourage good quality in the mapping from latent codes to images.
1. Introduction
StyleGAN's distinctive feature is its unconventional generator architecture. The mapping network f no longer feeds the input latent code z only into the beginning of the network; instead, it transforms z into an intermediate latent code w. Learned affine transformations then produce styles that control the layers of the synthesis network g via adaptive instance normalization (AdaIN). In addition, stochastic variation is facilitated by providing extra random noise maps to the synthesis network. This is an accurate overview of StyleGAN.
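As a concrete reference, here is a minimal sketch of the mapping network idea in PyTorch. The 8 fully connected layers and 512-dimensional latents follow the paper's stated defaults, but the class and its details are illustrative, not NVIDIA's official implementation.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    # Minimal sketch: map a latent code z to an intermediate latent w.
    # The 8 layers and 512-d latents follow the StyleGAN defaults;
    # everything else here is illustrative.
    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        layers = []
        for i in range(num_layers):
            layers.append(nn.Linear(z_dim if i == 0 else w_dim, w_dim))
            layers.append(nn.LeakyReLU(0.2))
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Pixel-norm style normalization of z, as commonly done in
        # StyleGAN implementations.
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)  # w is later turned into styles by affine layers A
```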
Many observers have noticed characteristic artifacts in images generated by StyleGAN [3]. The paper identifies two causes of these artifacts and describes changes to the architecture and training method that eliminate them. First, it investigates the origin of the common blob-like artifacts and finds that the generator creates them to circumvent a design flaw in its architecture. Section 2 redesigns the normalization used in the generator, which removes the artifacts. Second, it analyzes artifacts related to progressive growing [23], a technique that has been very successful in stabilizing high-resolution GAN training. The paper proposes an alternative design that achieves the same goal, namely that training starts by focusing on low-resolution images and then progressively shifts focus to higher and higher resolutions, without changing the network topology during training. The new design also makes it possible to reason about the effective resolution of the generated images, which turns out to be lower than expected, motivating an increase in network capacity (Section 4). This is the core of the paper: solving the artifact problem left over from StyleGAN1.
Quantitative analysis of the image quality of generative networks remains a challenging direction. Fréchet Inception Distance (FID) measures the density difference between two distributions in the high-dimensional feature space of an InceptionV3 classifier. However, it has recently been shown that because FID is based on a classifier network whose decisions focus on texture rather than shape, such metrics do not accurately capture image quality. The perceptual path length (PPL) metric instead estimates the quality of latent-space interpolations.
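To make the FID definition concrete, below is a minimal sketch of the Fréchet distance between two Gaussians fitted to InceptionV3 features, assuming the means and covariances of the real and generated feature sets have already been computed (the feature extraction step is omitted).

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```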

2. Removing Normalization Artifacts
The first observation is that most images generated by StyleGAN exhibit characteristic blob-shaped artifacts that resemble water droplets. As shown in Figure 1, even when the droplet is not obvious in the final image, it is present in the generator's intermediate feature maps. The anomaly starts to appear around 64×64 resolution, is present in all feature maps, and becomes progressively stronger at higher resolutions. The existence of such a persistent artifact is puzzling, because the discriminator should be able to detect it.
The problem lies in AdaIN: the operation normalizes the mean and variance of each feature map separately, thereby destroying any information carried in the relative magnitudes of the feature maps. The droplet artifact is the result of the generator deliberately sneaking signal-strength information past instance normalization: by creating a strong, localized spike that dominates the statistics, the generator can effectively scale the signal as it likes everywhere else. When the normalization step is removed from the generator, the droplet artifacts disappear completely. In short, normalization causes the droplet artifacts, most likely because the per-layer normalization destroys the signal.
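To see exactly what gets destroyed, here is a minimal sketch of AdaIN as StyleGAN1 uses it: every feature map is normalized independently, so information encoded in the relative magnitudes of channels cannot survive this step. Names and shapes are illustrative.

```python
import torch

def adain(x, style_scale, style_bias, eps=1e-5):
    # x: (N, C, H, W) feature maps.
    # style_scale / style_bias: (N, C, 1, 1), produced by the learned
    # affine transform A from the intermediate latent w.
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    # Each feature map is forced to zero mean / unit variance, erasing any
    # information carried in per-channel magnitudes; a strong local spike
    # can dominate mu/sigma, which is exactly what the droplet exploits.
    x = (x - mu) / (sigma + eps)
    return style_scale * x + style_bias
```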
2.1 Generator Architecture Revisited

Figure 2. Redesign of the StyleGAN synthesis network. (a) The original StyleGAAN diagram, where A denotes a learned affine transformation from W that produces a style vector, and B is a noise broadcast operation. (b) The same diagram with full detail. Here AdaIN is decomposed into explicit normalization followed by modulation, both operating on the mean and standard deviation of each feature map. The learned weights (w), biases (b), and constant input (c) are also annotated, and the gray boxes are redrawn so that each box corresponds to one active style. The activation function (Leaky ReLU) is always applied right after adding the bias. (c) Several changes to the original architecture that are justified in the text: the redundant operations at the beginning are removed, the addition of b and B is moved outside the active area of a style, and only the standard deviation of each feature map is adjusted. (d) The revised architecture, which allows replacing instance normalization with a "demodulation" operation applied to the weights of each convolution layer.
The evolution of this set of diagrams matters. The first is the original StyleGAN diagram; 2b is a detailed illustration of the StyleGAN generator in which AdaIN is broken into two parts, normalization and modulation. The original StyleGAN applies the bias and noise within the style block (b and B in the figure above, placed after the conv and before normalization). Moving these operations outside the style block, so that they operate on normalized data, gives equally predictable results. Moreover, after this change it is sufficient to normalize and modulate the standard deviation alone; the mean no longer needs to be normalized or modulated. This is shown in (c) above.
2.2 Instance Normalization Revisited
Returning to the figure above: the droplet phenomenon is produced by normalization. The generator architecture has been improved, b and B have been moved out, and the mean has been removed, but normalization itself is still present, and instance normalization matters. How can it be relaxed while preserving the scale-specific effects of the styles? Style mixing, which feeds different latent codes w to different layers at inference time, is an important part of StyleGAN's ability to control a generated image. In practice, style modulation can amplify certain feature maps by an order of magnitude or more. For style mixing to work, this amplification must be explicitly counteracted on a per-sample basis. Normalization could simply be deleted, but then the scale-specific control, in effect the controllability of StyleGAN, would be lost. The paper therefore proposes an alternative that removes the artifacts while retaining full control: the design in (d) above, which is essentially weight demodulation.
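Concretely, the demodulation in (d) folds the effect of instance normalization into the convolution weights: first modulate, w'_ijk = s_i · w_ijk, then demodulate, w''_ijk = w'_ijk / sqrt(sum_{i,k} (w'_ijk)^2 + eps), so that each output feature map has unit expected standard deviation. A minimal PyTorch sketch follows; the grouped-convolution trick is one common way to apply per-sample weights, not necessarily the official code path.

```python
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, style, demodulate=True, eps=1e-8):
    # x:      (N, C_in, H, W) input feature maps
    # weight: (C_out, C_in, k, k) shared convolution weights
    # style:  (N, C_in) per-sample scales s_i from the affine layer A
    N, C_in, H, W = x.shape
    C_out, _, k, _ = weight.shape
    # Modulate: scale each input channel of the weights per sample.
    w = weight.unsqueeze(0) * style.view(N, 1, C_in, 1, 1)  # (N, C_out, C_in, k, k)
    if demodulate:
        # Demodulate: rescale so each output channel has unit expected std,
        # replacing instance normalization with a weight-space operation.
        w = w * torch.rsqrt(w.pow(2).sum(dim=(2, 3, 4), keepdim=True) + eps)
    # Grouped-conv trick: fold the batch into groups so every sample
    # is convolved with its own modulated weights.
    x = x.reshape(1, N * C_in, H, W)
    w = w.reshape(N * C_out, C_in, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=N)
    return out.reshape(N, C_out, H, W)
```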

At this point, the core of StyleGAN2 has essentially been covered. StyleGAN1 and StyleGAN2 are similar in overall style, and the paper contains many more discussions and experimental details, but in summary: StyleGAN1 contributed two core ideas, the latent code w and the per-layer style input, while StyleGAN2's core contribution is resolving StyleGAN1's artifacts through a deeper understanding and redesign of the generator architecture, which is indeed more elegant.
3. Image Quality and Generator Smoothness
Although GAN metrics such as FID or Precision and Recall (P&R) successfully capture many aspects of the generator, they still have blind spots with respect to image quality. The key to the apparent inconsistency lies in the particular choice of feature space, rather than in the foundations of FID or P&R. It was recently found that classifiers trained on ImageNet [35] tend to base their decisions much more on texture than on shape [11], whereas humans strongly focus on shape [28]. This is relevant here because FID and P&R use high-level features from InceptionV3 [39] and VGG-16 [39], which were trained in this setting and can therefore be expected to be biased toward texture detection. As a result, images with, say, strong cat textures may appear more similar to each other than a human observer would agree, partially compromising density-based metrics (FID) and manifold-coverage metrics (P&R).

There is an interesting correlation between perceived image quality and the perceptual path length (PPL) [24], a metric originally introduced to quantify the smoothness of the mapping from latent space to output image by measuring the average LPIPS distance [49] between images generated under small perturbations in latent space. Referring again to Figures 13 and 14, a smaller PPL (a smoother generator mapping) appears to correlate with higher overall image quality, whereas other metrics show no such pattern. Figure 4 examines this correlation more closely.
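A minimal sketch of the PPL idea follows, assuming a generator G that maps latents to images in [-1, 1] and exposes a z_dim attribute (both assumptions), and using the lpips package for the perceptual distance. The paper interpolates with slerp in Z and lerp in W; plain lerp is used here for brevity.

```python
import torch
import lpips  # pip install lpips; assumed available

@torch.no_grad()
def perceptual_path_length(G, num_samples=1000, eps=1e-4, device="cuda"):
    lpips_fn = lpips.LPIPS(net="vgg").to(device)
    dists = []
    for _ in range(num_samples):
        z0 = torch.randn(1, G.z_dim, device=device)  # G.z_dim is assumed
        z1 = torch.randn(1, G.z_dim, device=device)
        t = torch.rand(1, device=device)
        # Generate at two nearby points on the interpolation path,
        # then measure how far apart the images are perceptually.
        za = torch.lerp(z0, z1, t)
        zb = torch.lerp(z0, z1, t + eps)
        d = lpips_fn(G(za), G(zb)) / (eps ** 2)
        dists.append(d.item())
    return sum(dists) / len(dists)  # lower = smoother latent mapping
```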
As far as I know, mmgeneration also evaluates the StyleGAN series with these same three metrics: FID, P&R, and PPL.
4. Progressive Growing Revisited
The progressive growing used by StyleGAN has drawbacks. As shown in the figure below, when the face turns to the left or right, the teeth do not turn with it: details such as teeth and eyes stay in relatively fixed positions instead of moving with the face. This phenomenon is caused by progressive-growing training. Progressive growing first trains at low resolution, then, once training is stable, adds a higher-resolution level, and repeats until the target resolution is reached. Because each resolution momentarily serves as the output resolution, it is pushed to generate the highest-frequency details it can, which anchors high-frequency details (see the teeth in the figure below) and makes the network ignore the motion that should shift them.
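For reference, the fade-in blending that progressive growing [23] performs when a new resolution level is added can be sketched as follows; the function name and shapes are illustrative.

```python
import torch.nn.functional as F

def grow_blend(prev_rgb, new_rgb, alpha):
    # While a newly added resolution level stabilizes, its output is
    # blended with the upsampled output of the previous level;
    # alpha ramps from 0 to 1 over the course of the transition.
    up = F.interpolate(prev_rgb, scale_factor=2, mode="nearest")
    return (1 - alpha) * up + alpha * new_rgb
```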

The reason for using progressive growing in the first place is that the networks needed to generate high-resolution images are large and deep, and very deep networks are hard to train. Skip connections, however, make deep networks trainable, so the figure below shows three network structures, all of which use skip connections. The effectiveness of the three structures is also evaluated experimentally, as shown in the figure after that.
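Among these designs, the "output skips" variant sums per-resolution RGB outputs: each level produces an RGB image through its own tRGB layer, and the images are upsampled and added, so low-resolution levels dominate early in training and higher levels take over later, achieving the effect progressive growing targets without changing the topology. A minimal sketch with illustrative layer shapes follows; the real blocks are styled convolutions, not plain Conv2d.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGenerator(nn.Module):
    # Illustrative "output skips" synthesis network: every resolution
    # emits an RGB image, and the images are upsampled and summed.
    def __init__(self, channels=(512, 256, 128, 64)):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.to_rgb = nn.ModuleList()
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            self.blocks.append(nn.Conv2d(c_in, c_out, 3, padding=1))
            self.to_rgb.append(nn.Conv2d(c_out, 3, 1))

    def forward(self, x):
        # x: (N, channels[0], 4, 4) constant/feature input.
        rgb = None
        for block, to_rgb in zip(self.blocks, self.to_rgb):
            x = F.interpolate(x, scale_factor=2, mode="bilinear")
            x = F.leaky_relu(block(x), 0.2)
            img = to_rgb(x)
            # Accumulate: upsample the running RGB and add this level's image.
            rgb = img if rgb is None else F.interpolate(
                rgb, scale_factor=2, mode="bilinear") + img
        return rgb
```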

From the PPL metric onward, the rest of the paper is a discussion of training strategy and practice, worth reading when there is time.