Evaluation and Diagnosis of Generative Models
2022-08-05 09:13:00 【AI TIME 论道】
The great success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers were recently proposed as an evaluation framework for generative models because they can measure the quality-versus-variety trade-off inherent in deep generative modeling.
We carry out a non-asymptotic analysis of the sample complexity of divergence frontiers. We also introduce the frontier integral, which provides a summary statistic of the divergence frontier.
We show how smoothed estimators such as Good-Turing or Krichevsky-Trofimov lead to a faster convergence rate. We validate our theoretical results with data from natural language processing and computer vision.
In this session of the AI TIME PhD live stream, we invited Liu Lang (刘浪), a doctoral student at the University of Washington in the United States, to give the talk "Evaluation and Diagnosis of Generative Models".
Liu Lang:
PhD candidate in Statistics at the University of Washington. Main research direction: applications of optimal transport distances and information divergences to generative models and statistical inference.
Image and Text Generation
Generative models have two common failure modes. One is high quality but low variety: each generated face looks very much like a real person, but the generated faces are all very similar to one another. The other is low quality but high variety: each individual image may not look very realistic, but together the images form a diverse dataset.
Type I and Type II Costs in Generative Modeling
To describe these two failure modes, we introduce two types of cost.
In the left panel of the figure, the blue area represents the distribution of the real data and the orange oval represents the distribution of the generated data. A generated data point that does not lie on the real data distribution is not very realistic; this is the Type I cost. In the right panel, real data that does not lie on the distribution of the generated data incurs the Type II cost, because such data can never be generated by our model.
With these two types of cost, we can describe the failure cases above. High quality means that most of the generated data points lie on the real data distribution, so the Type I cost is relatively small. Low variety means that the generated data covers only a small subset of the real data, so the Type II cost is relatively large.
The next question is how to quantify these two types of cost. Let P be the real data distribution and Q the generated data distribution. Then KL(Q||P) is large where Q is large and P is small, so it can be used to quantify the Type I cost, while KL(P||Q) is used to quantify the Type II cost.
There is a hidden problem, however: if Q and P do not have the same support, the result can be infinite.
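The support problem is easy to see on a small discrete example. The sketch below is illustrative only: the helper `kl` and the toy distributions `P` and `Q` are assumptions introduced here, not taken from the talk.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical toy distributions over three outcomes.
P = [0.5, 0.5, 0.0]  # real data: no mass on the third outcome
Q = [0.4, 0.4, 0.2]  # model: some mass on the third outcome

type1_cost = kl(Q, P)  # Q has mass outside the support of P
type2_cost = kl(P, Q)  # the support of P is covered by Q

print("KL(Q||P) =", type1_cost)
print("KL(P||Q) =", type2_cost)
```

Here the model places mass on an outcome the real data never produces, so KL(Q||P) is infinite even though the two distributions are otherwise close.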
Divergence Frontiers for Generative Models
To solve this problem, we consider a mixture distribution R such that P ≪ R and Q ≪ R, that is, both P and Q are absolutely continuous with respect to R.
We can then use KL(Q||R) and KL(P||R) in place of KL(Q||P) and KL(P||Q) to quantify the two costs.
To choose R, we consider the following optimization problem: take a linear combination of the two costs and choose the R that minimizes it.
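Written out, this optimization problem can be sketched as follows. The weight λ ∈ (0, 1) and the notation R_λ are assumptions of this sketch, since the original formula is not reproduced in the text; the closed-form minimizer follows from the standard identity on the last line.

```latex
\begin{align*}
R_\lambda
  &= \operatorname*{arg\,min}_{R}\;
     \Big\{ \lambda\, \mathrm{KL}(P \,\|\, R) + (1-\lambda)\, \mathrm{KL}(Q \,\|\, R) \Big\}
   = \lambda P + (1-\lambda) Q, \qquad \lambda \in (0,1), \\[4pt]
\lambda\, \mathrm{KL}(P \,\|\, R) + (1-\lambda)\, \mathrm{KL}(Q \,\|\, R)
  &= \lambda\, \mathrm{KL}(P \,\|\, R_\lambda) + (1-\lambda)\, \mathrm{KL}(Q \,\|\, R_\lambda)
     + \mathrm{KL}(R_\lambda \,\|\, R).
\end{align*}
```

Because KL(R_λ||R) is nonnegative and vanishes only when R = R_λ, the mixture λP + (1−λ)Q is the unique minimizer, and both P and Q are absolutely continuous with respect to it.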
With this theoretical basis in place, we can define divergence frontiers, as shown in the figure below:
If we take R to be a linear combination (mixture) of P and Q, we obtain the curve above. Every other choice of R lies above the curve. This curve is called the divergence frontier.
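For discrete distributions, points on this curve can be traced by sweeping the mixture weight. The following is a minimal sketch under the assumption that R is the mixture λP + (1−λ)Q with λ on a grid in (0, 1); the function names and the toy distributions are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def mixture(p, q, lam):
    """Return the mixture lam * p + (1 - lam) * q."""
    return [lam * pi + (1.0 - lam) * qi for pi, qi in zip(p, q)]

def divergence_frontier(p, q, num_points=9):
    """Return [(lam, KL(q || r), KL(p || r)), ...] for lam on a grid in (0, 1)."""
    points = []
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)
        r = mixture(p, q, lam)
        points.append((lam, kl(q, r), kl(p, r)))
    return points

# Hypothetical toy distributions with different supports.
P = [0.5, 0.5, 0.0]
Q = [0.4, 0.4, 0.2]

frontier = divergence_frontier(P, Q)
for lam, type1, type2 in frontier:
    print(f"lambda={lam:.1f}  KL(Q||R)={type1:.4f}  KL(P||R)={type2:.4f}")
```

Both coordinates stay finite even though P and Q have different supports, and moving λ toward 1 trades a smaller KL(P||R) for a larger KL(Q||R).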
In this work, the questions we want to answer are: when divergence frontiers are estimated from data, how can we do it better, and can we characterize the accuracy of the estimate theoretically?
Statistical Summary of Divergence Frontiers
As a first step, we propose a summary statistic. Consider a linear combination of the two types of cost.
We then integrate this linear combination of costs over the mixture weight. We call this integral the frontier integral.
● FI(P, Q) = FI(Q, P) and FI(P, Q) = 0 iff P = Q.
● The frontier integral takes values in [0, 1].
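The two properties above can be checked numerically. The sketch below assumes the definition FI(P, Q) = 2 ∫₀¹ [λ KL(P||R_λ) + (1−λ) KL(Q||R_λ)] dλ with R_λ = λP + (1−λ)Q, which is not spelled out in the text here, and approximates the integral with a midpoint rule; the function names and the step count are hypothetical choices.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def frontier_integral(p, q, num_steps=2000):
    """Midpoint-rule approximation of the frontier integral of p and q."""
    total = 0.0
    for i in range(num_steps):
        lam = (i + 0.5) / num_steps  # midpoint of the i-th subinterval of (0, 1)
        r = [lam * pi + (1.0 - lam) * qi for pi, qi in zip(p, q)]
        total += lam * kl(p, r) + (1.0 - lam) * kl(q, r)
    return 2.0 * total / num_steps

# Hypothetical toy distributions.
P = [0.5, 0.5, 0.0]
Q = [0.4, 0.4, 0.2]

print("FI(P, Q) =", frontier_integral(P, Q))
print("FI(Q, P) =", frontier_integral(Q, P))
print("FI(P, P) =", frontier_integral(P, P))
print("FI for disjoint supports =", frontier_integral([1.0, 0.0], [0.0, 1.0]))
```

Under this definition, identical distributions give a value of 0 and distributions with disjoint supports give a value close to 1, up to the numerical integration error.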
Estimation Procedure of Divergence Frontiers
Next, let's look at how researchers estimate divergence frontiers in practice.
First, suppose P is a continuous distribution. Quantization splits it into k groups, which gives the quantized distribution. We then use data to obtain the empirical estimator. Along this pipeline, we raise three questions, marked in the figure above:
1. How to select the quantization level k?
2. Can we do better than the naïve empirical estimator?
3. How many data are needed to achieve a good accuracy?
Next, we answer these three questions.
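The estimation pipeline described above (quantize, then count) can be sketched as follows. This is a sketch under stated assumptions: the text does not specify the quantizer, so equal-width binning of one-dimensional data is used as a stand-in, and the Beta-distributed samples, the interval [0, 1], and all function names are hypothetical.

```python
import random
from collections import Counter

def quantize(x, lo, hi, k):
    """Map a real value x in [lo, hi] to one of k equal-width bins (0 .. k-1)."""
    if not lo <= x <= hi:
        raise ValueError("x must lie in [lo, hi]")
    idx = int((x - lo) / (hi - lo) * k)
    return min(idx, k - 1)  # x == hi falls into the last bin

def empirical_distribution(samples, lo, hi, k):
    """Empirical estimator: fraction of samples falling in each of the k bins."""
    counts = Counter(quantize(x, lo, hi, k) for x in samples)
    n = len(samples)
    return [counts[j] / n for j in range(k)]

rng = random.Random(0)
real = [rng.betavariate(2, 5) for _ in range(1000)]  # stand-in for real data
fake = [rng.betavariate(2, 3) for _ in range(1000)]  # stand-in for generated data

k = 10  # quantization level
P_hat = empirical_distribution(real, 0.0, 1.0, k)
Q_hat = empirical_distribution(fake, 0.0, 1.0, k)
print(P_hat)
print(Q_hat)
```

The resulting `P_hat` and `Q_hat` are discrete distributions on k points, which is the setting the statistical analysis works in. Bins that receive no samples get probability zero under this naive estimator.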
Main Results
● Theorem (Statistical Error)
Assume that our distributions P and Q are both discrete and supported on k points. This assumption is justified by the quantization step described earlier. In this setting, we can prove that a bound on the estimation error holds with probability at least 1 − δ.
As shown in the figure above, if the distribution has a long tail, some of the mass in the tail may never appear in the sample. This poses a major challenge for the analysis and requires us to treat the missing mass separately. Without treating it separately, the resulting rate would be slower.
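The missing mass can be made concrete on a toy example. The sketch below computes the true missing mass of a sample, and, for comparison, the standard Good-Turing estimate of it (the fraction of the sample made up of items seen exactly once). The distribution `probs`, the sample, and the function names are hypothetical and for illustration only.

```python
from collections import Counter

def missing_mass(probs, sample):
    """Total probability of outcomes that never appear in the sample."""
    seen = set(sample)
    return sum(p for outcome, p in probs.items() if outcome not in seen)

def good_turing_missing_mass(sample):
    """Good-Turing estimate: number of items seen exactly once, divided by n."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

# Hypothetical long-tailed distribution and a small sample drawn from it.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
sample = ["a", "a", "b", "a", "c", "b"]

true_mm = missing_mass(probs, sample)       # "d" was never observed
gt_mm = good_turing_missing_mass(sample)    # "c" is the only singleton

print("true missing mass:", true_mm)
print("Good-Turing estimate:", gt_mm)
```

The naive empirical estimator assigns zero probability to the unseen outcome "d", which is exactly the tail mass that the analysis has to account for separately.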
● Theorem (Total Error)
The second result holds for arbitrary distributions P and Q, which may be continuous or discrete. For any positive integer k, we can prove that there exists a partition S_k for which an upper bound on the total error holds.
We choose k to minimize this upper bound; this is the choice of k that we recommend in theory, which answers question 1. At the same time, the upper bound on the right-hand side of the inequality can serve as the target accuracy, which answers question 3.
● Theorem (Smoothed Distribution Estimators)
Add-constant estimator:
For each a, we count how many times a appears in our sample and then add a small constant b. For such an estimator, a better upper bound can be shown.
If we take b = 1/2, every a receives positive mass, so the missing mass problem no longer arises. This is why we can obtain a better upper bound.
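The add-constant estimator can be sketched in a few lines. The normalization (N_a + b) / (n + k·b), where N_a is the count of a, n the sample size, and k the support size, is the usual form of this estimator and is assumed here; b = 1/2 corresponds to the Krichevsky-Trofimov estimator. The function name and the example counts are hypothetical.

```python
def add_constant_estimator(counts, b=0.5):
    """Smoothed estimate (N_a + b) / (n + k * b) for each of the k outcomes."""
    if b < 0:
        raise ValueError("b must be nonnegative")
    n = sum(counts)
    k = len(counts)
    denom = n + k * b
    if denom == 0:
        raise ValueError("need at least one observation or b > 0")
    return [(c + b) / denom for c in counts]

# Hypothetical counts over k = 3 outcomes; the third outcome was never observed.
counts = [3, 1, 0]

empirical = add_constant_estimator(counts, b=0.0)  # naive empirical estimator
smoothed = add_constant_estimator(counts, b=0.5)   # Krichevsky-Trofimov

print("empirical:", empirical)
print("smoothed: ", smoothed)
```

With b = 0 the unseen outcome gets probability zero, whereas with b = 1/2 every outcome gets strictly positive mass and the estimate still sums to one.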
Experimental Results
Next, let's look at two sets of experiments.
Goal: Investigate smoothed distribution estimators on image and text data.
● Train a StyleGAN on CIFAR-10.
● Train a GPT-2 on Wikitext-103.
In the figure below, we compare three different distribution estimators.
In all four settings, the smoothed distribution estimators do perform better.
In the second set of experiments, we ask whether the quantization level k ∝ n^{1/3} recommended by our analysis also performs better in practice.
To answer this question, we chose two two-dimensional continuous distributions and set the quantization level to k ∝ n^{1/r}, with r ranging from 2 to 5. We found that r = 3 performed best, which is exactly the choice of k that we recommend.
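The scaling rule k ∝ n^{1/r} can be turned into a small helper. This is only a sketch: the proportionality constant `c` and the rounding are assumptions, since the text specifies the exponent but not the constant.

```python
def quantization_level(n, r=3, c=1.0):
    """Quantization level k = c * n**(1/r), rounded and kept at least 1.

    r = 3 is the recommended exponent; c is a hypothetical constant.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return max(1, round(c * n ** (1.0 / r)))

# How k grows with the sample size n for the exponents compared in the experiment.
for n in (1000, 8000, 27000):
    levels = {r: quantization_level(n, r=r) for r in (2, 3, 4, 5)}
    print(n, levels)
```

A smaller r gives a finer quantization for the same n, and a larger r gives a coarser one, so the exponent controls how the quantization level trades off against the number of samples per bin.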
Paper title:
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals
Paper link:
https://arxiv.org/abs/2106.07898
Compiled by: Lin Ze
Author: Liu Lang
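About AI TIME
AI TIME was founded in 2019 to promote the spirit of scientific debate. It invites people from all fields to explore fundamental questions in AI theory, algorithms, and applications, encourages the exchange of ideas, and connects AI scholars, industry experts, and enthusiasts around the world. Through debate, it aims to examine the tensions between artificial intelligence and the future of humanity and to explore the future of the AI field.
To date, AI TIME has invited more than 700 speakers from China and abroad and held more than 300 events, with over 2.6 million views.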