Parti, Google's autoregressive text-to-image model
2022-06-24 13:16:00 【Zhiyuan community】
We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.
Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation – this allows it to benefit from advances in large language models, especially capabilities that are unlocked by scaling data and model sizes. In this case, the target outputs are sequences of image tokens instead of text tokens in another language. Parti uses the powerful image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens, and takes advantage of its ability to reconstruct such image token sequences as high quality, visually diverse images.
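The phrase "encode images as sequences of discrete tokens" can be made concrete with a toy example of the vector-quantization lookup used by VQ-style tokenizers. This is a minimal sketch under loudly stated assumptions, not ViT-VQGAN or Parti code. In ViT-VQGAN, a Vision Transformer produces the patch embeddings, and a learned decoder turns codebook vectors back into pixels; both networks are omitted here. The 2-D "patch embeddings," the three-entry codebook, and the function names `quantize` and `dequantize` are all hypothetical illustrations.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patches: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each patch embedding to the index of its nearest codebook entry.

    The resulting integers form the discrete "image token" sequence (in raster
    order) that a sequence-to-sequence model would learn to predict from text.
    Toy illustration only: real tokenizers learn the codebook and embeddings.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(p, codebook[k]))
        for p in patches
    ]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look each token back up in the codebook.

    A real image tokenizer would pass these vectors through a learned decoder
    to reconstruct pixels; that step is omitted in this sketch.
    """
    return [list(codebook[t]) for t in tokens]


if __name__ == "__main__":
    # Hypothetical 3-entry codebook and a 2x2 grid of patch embeddings,
    # flattened row by row.
    codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    patches = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.8), (1.0, 0.9)]
    tokens = quantize(patches, codebook)
    print(tokens)  # [0, 1, 2, 1]
    print(dequantize(tokens, codebook))
```

In this framing, generating an image means predicting an integer sequence like `[0, 1, 2, 1]` one token at a time, conditioned on the text, exactly as a translation model predicts target-language tokens. Reconstruction quality then depends on how expressive the learned codebook and decoder are.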
We observed the following results:
- Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
- State-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO.
- Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
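For reference, FID (Fréchet Inception Distance) compares statistics of Inception-network features extracted from generated images and from real images. Lower is better, so the MS-COCO scores above indicate that Parti's samples are statistically close to real photographs. Each feature set is modeled as a Gaussian with mean $\mu$ and covariance $\Sigma$ (subscript $r$ for real images, $g$ for generated ones), giving the standard definition:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

"Zero-shot" means the model was not trained on MS-COCO before being scored. "Finetuned" means it was trained further on MS-COCO first, which is why the finetuned score is lower.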
We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.