Parti, Google's autoregressive text-to-image model
2022-06-24 13:16:00 【Zhiyuan community】
We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.
Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation – this allows it to benefit from advances in large language models, especially capabilities that are unlocked by scaling data and model sizes. In this case, the target outputs are sequences of image tokens instead of text tokens in another language. Parti uses the image tokenizer ViT-VQGAN to encode images as sequences of discrete tokens, and takes advantage of its ability to reconstruct such image token sequences as high-quality, visually diverse images.
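The sequence-to-sequence framing can be illustrated with a minimal sketch of autoregressive decoding over image tokens. This is not Parti's implementation: `toy_next_token_weights` is a hypothetical stand-in for the Transformer decoder, and `VOCAB_SIZE` and `GRID` are illustrative values, not the model's actual codebook size or token grid. In the real system, the resulting token sequence would be passed to the ViT-VQGAN decoder to reconstruct pixels.

```python
import random
from typing import Callable, List, Sequence

VOCAB_SIZE = 16  # illustrative codebook size (assumption, not Parti's value)
GRID = 4         # illustrative 4x4 grid of image tokens (assumption)


def toy_next_token_weights(text_tokens: Sequence[int],
                           image_tokens: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the decoder.

    Returns positive, unnormalized weights over the codebook that depend
    on the text prompt and on the image tokens generated so far.
    """
    ctx = random.Random(hash((tuple(text_tokens), tuple(image_tokens))))
    return [ctx.random() + 1e-6 for _ in range(VOCAB_SIZE)]


def generate_image_tokens(
    text_tokens: Sequence[int],
    next_token_weights: Callable[[Sequence[int], Sequence[int]], List[float]],
    num_tokens: int,
    rng: random.Random,
) -> List[int]:
    """Sample image tokens one at a time, each conditioned on the text
    tokens and on all previously sampled image tokens."""
    image_tokens: List[int] = []
    for _ in range(num_tokens):
        weights = next_token_weights(text_tokens, image_tokens)
        token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        image_tokens.append(token)
    return image_tokens


# Usage: a made-up tokenized prompt, decoded into a GRID x GRID token sequence.
prompt_tokens = [3, 1, 4]
tokens = generate_image_tokens(
    prompt_tokens, toy_next_token_weights, GRID * GRID, random.Random(0)
)
```

The loop has the same shape as text decoding in machine translation; only the target vocabulary changes from words to codebook indices.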
We observed the following results:
- Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
- State-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO.
- Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
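For readers unfamiliar with the FID scores quoted above, the metric is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The notation below is the standard one and is not taken from the source: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ those for generated images. Lower values indicate that the generated distribution is closer to the real one.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```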
We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.