
Understanding disentangling in β-VAE: paper reading notes

2022-07-06 18:46:00 zeronose


Preface

Article: Understanding disentangling in β-VAE
Link to the original text: link
Understanding disentangling in β-VAE is a follow-up article built on β-VAE.
First, β-VAE has several problems:
1. β-VAE simply adds a hyperparameter β to the KL term and observes that the model learns disentangled representations, but it offers no good explanation of why adding this hyperparameter β should produce disentanglement.
2. β-VAE finds that when disentanglement is good, reconstruction is poor, and when reconstruction is good, disentanglement is poor, so disentanglement and reconstruction have to be balanced.
Based on this, Understanding disentangling in β-VAE uses information bottleneck theory to explain β-VAE's disentanglement, and, to address the disentanglement/reconstruction trade-off, it proposes its own training scheme: gradually increasing the information capacity of the latent variables during training.
The original paper also reviews VAE and β-VAE; I will not repeat that here. Interested readers can see my previous articles:
VAE
β-VAE


One. What is the information bottleneck?

[Figure: the original paper's explanation of the information bottleneck]
The figure above is the original paper's explanation of the information bottleneck. In essence, the information bottleneck describes a constrained optimization objective: maximize the mutual information between the latent bottleneck Z and the task Y, while discarding all information in the input X that is irrelevant to Y.
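For reference, the standard information bottleneck objective can be written as the following trade-off; this restatement is mine and is not copied from the figure above:

$$\max_{q(z|x)} \; \big[\, I(Z; Y) - \beta \, I(X; Z) \,\big]$$

where $I(\cdot;\cdot)$ denotes mutual information and β controls how strongly the bottleneck compresses X.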
Drawing a proper figure is troublesome, so please make do with my rough sketch below.
[Figure: my own sketch illustrating the information bottleneck view of β-VAE]
You can see that the β-VAE loss function is:
$$\mathcal{L}(\theta, \phi, \beta; x, z) = \mathbb{E}_{q_\phi(z|x)}\big[\ln p_\theta(x|z)\big] - \beta\, D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p(z)\big)$$
The first term on the right of the equation is the reconstruction term, and the second is the regularization term. The second term acts as an information bottleneck on the first: increasing its weight, i.e. the value of β, pushes qϕ(z|x) closer to the prior p(z). Since p(z) is the standard normal distribution, this limits the amount of information about x that the latent variable z can carry, so disentanglement is good but reconstruction is poor; conversely, with a small β, disentanglement is poor but reconstruction is good.
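As a minimal sketch, the β-VAE loss above might be implemented in PyTorch as follows (the function name, Bernoulli decoder assumption, and tensor shapes are my own illustrative choices, not from the paper):

```python
import torch.nn.functional as F

def beta_vae_loss(x, x_logits, mu, logvar, beta=4.0):
    """Negative ELBO with a beta-weighted KL term (β-VAE).

    x          : (batch, ...) binary/normalized inputs
    x_logits   : decoder output logits with the same shape as x
    mu, logvar : (batch, latent_dim) parameters of the Gaussian posterior q(z|x)
    """
    # Reconstruction term: -E_q[ln p(x|z)], assuming a Bernoulli decoder.
    recon = F.binary_cross_entropy_with_logits(
        x_logits, x, reduction="none").flatten(1).sum(dim=1)

    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1)

    # Increasing beta tightens the information bottleneck: better disentanglement,
    # worse reconstruction; beta = 1 recovers the plain VAE objective.
    return (recon + beta * kl).mean()
```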

Two. New training objective

1. Loss function

$$\mathcal{L}(\theta, \phi; x, z, C) = \mathbb{E}_{q_\phi(z|x)}\big[\ln p_\theta(x|z)\big] - \gamma\, \big|\, D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p(z)\big) - C \,\big|$$
where γ is fixed to a large value, 1000, and C is a variable. During training, C is gradually increased from zero to a value large enough to produce high-quality reconstructions.

The training procedure here differs from β-VAE. In β-VAE, the value of β must be fixed before training, and changing β requires retraining from scratch. Here C is a variable parameter, also interpreted as the information capacity of the latent channel, and it is gradually increased from 0 during training.
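A minimal sketch of how this objective and the linear schedule for C might look in PyTorch (γ = 1000 follows the text above; c_max, anneal_steps, and the function name are illustrative assumptions of mine):

```python
import torch

def capacity_vae_loss(recon, kl, step, gamma=1000.0,
                      c_max=25.0, anneal_steps=100_000):
    """Capacity-annealed objective: recon + gamma * |KL - C|.

    recon : scalar reconstruction loss tensor (already averaged over the batch)
    kl    : scalar KL( q(z|x) || p(z) ) tensor, averaged over the batch
    step  : current training step
    """
    # Linearly increase the target capacity C (in nats) from 0 to c_max.
    c = min(c_max, c_max * step / anneal_steps)

    # gamma is large (1000 in the text), so the KL is pinned close to C,
    # letting the latent code gain information gradually as C grows.
    return recon + gamma * torch.abs(kl - c)
```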


Summary

By controlling the increase of the latent posterior's encoding capacity during training, i.e. allowing the average KL divergence from the prior to grow gradually from zero rather than weighting the KL term with a fixed β as in the original β-VAE objective, the method achieves more robust learning of disentangled representations together with better reconstruction fidelity than the original formulation.


Copyright notice
This article was written by [zeronose]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202131255497393.html