Tsinghua & Zhiyuan | CogView2: a faster and better text-to-image generation model
2022-06-27 01:13:00 【Zhiyuan community】

Paper title: CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers (arXiv)
This work comes from the team of Tang Jie, vice president of Zhiyuan, with Ding Ming as first author, and is the latest development of the Wudao ("Enlightenment") model. It has attracted a lot of attention on Reddit, and the GitHub repository already has more than 500 stars.
Abstract
The development of Transformer-based text-to-image models is held back by slow generation and by the complexity of high-resolution images. In this paper, we propose a solution based on hierarchical Transformers and local parallel autoregressive generation. We pretrain a 6-billion-parameter Transformer with a simple and flexible self-supervised task, the Cross-modal general Language Model (CogLM), and fine-tune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared with the concurrent state-of-the-art DALL·E 2, and naturally supports interactive text-guided editing of images.
The final section of the paper is particularly interesting:
Autoregressive or diffusion? Although GPT has been very successful in text generation, diffusion models are becoming increasingly popular in image generation. We compare diffusion models with autoregressive models in terms of speed, which is the biggest drawback of autoregressive models discussed in Section 1. Under the same architecture, diffusion models need more FLOPs but have a high degree of parallelism. They can also trade quality against time consumption by manually scheduling the sampling stride. For example, Glide [19] samples with 250 diffusion steps for evaluation and with 27 steps for interactive sampling, which reduces the latency to 15 seconds.
Autoregressive models must generate an image token by token, but our LoPAR can upsample the image with a high degree of parallelism. By introducing more hierarchies into the model design, we can therefore (potentially) reduce the time cost to well below that of diffusion models.
Comparison between DALL·E 2 and CogView2. DALL·E 2 [27] is a recently released concurrent work on text-to-image generation at 1024 × 1024 resolution. Although its probabilistic model and architecture are quite different from those of CogView2, the two share the same spirit: hierarchical generation. CogView2 can generate scenes similar to those in the limited DALL·E 2 demos, for example "a lion teacher" (Figure 1) and "panda scientists" (DALL·E 2), even though CogView2 was trained on only about 5% of the total data used by DALL·E 2. The main differences between DALL·E 2 and CogView2 are the third-level super-resolution and the "zeroth"-level image prior generation. Because training a third-level super-resolution model is very resource-consuming and more engineering-oriented, we leave it to future work.
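As a rough illustration of the speed argument above, the following Python sketch counts the sequential decoding steps needed when a token grid is generated one token at a time versus when every local window is decoded in parallel. It is a toy model under stated assumptions, not the CogView2 implementation: the function names, the 60 × 60 grid, and the 6 × 6 window are hypothetical values chosen only for illustration.

```python
def steps_token_by_token(height: int, width: int) -> int:
    """Sequential steps when each token waits for the previous one."""
    return height * width


def steps_local_parallel(height: int, width: int, window: int) -> int:
    """Sequential steps when all windows are decoded at the same time.

    Assumption (illustrative): the grid is split into non-overlapping
    window x window blocks, tokens inside a block are still generated one
    by one, and all blocks advance in lockstep.
    """
    if window <= 0 or height % window or width % window:
        raise ValueError("window must be positive and divide both grid sides")
    # The number of blocks does not matter: they run in parallel.
    return window * window


if __name__ == "__main__":
    h, w, win = 60, 60, 6  # hypothetical grid and window sizes
    print("token by token :", steps_token_by_token(h, w))
    print("local parallel :", steps_local_parallel(h, w, win))
```

Under these assumptions, the number of sequential steps depends only on the window size and not on the image size, which is why adding further hierarchy levels could keep latency low as resolution grows.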
Code : https://github.com/THUDM/CogView2
Readers who want to experiment should note that this model has high hardware requirements; an NVIDIA A100 machine is recommended.








