Tsinghua & Zhiyuan | CogView2: a faster and better text-to-image generation model
2022-06-27 01:13:00 【Zhiyuan community】

Paper title: CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers (arXiv)
This work comes from the team of Tang Jie, vice president of Zhiyuan, with Ding Ming as first author, and is the latest development in the WuDao ("Enlightenment") model series. It has attracted a lot of attention on Reddit, and the GitHub repository already has more than 500 stars.
Abstract
The development of Transformer-based text-to-image models is held back by slow generation and by the complexity of high-resolution images. In this paper, we propose a solution based on hierarchical Transformers and local parallel autoregressive generation. We pretrain a 6-billion-parameter Transformer with a simple and flexible self-supervised task, the Cross-Modal General Language Model (CogLM), and fine-tune it for fast super-resolution. Compared with the state-of-the-art DALL·E 2, the new text-to-image system CogView2 shows very competitive generation, and it naturally supports interactive text-guided editing of images.
The final part of the paper is particularly interesting:
Autoregression or diffusion? Although GPT has been very successful in text generation, diffusion models are becoming increasingly popular for image generation. We compare diffusion models with autoregressive models in terms of speed, which is the biggest drawback of autoregressive models discussed in Section 1. Under the same architecture, diffusion models require more FLOPs but have a high degree of parallelism. They can also trade off quality against time by manually scheduling the sampling steps. For example, Glide [19] samples 250 diffusion steps for evaluation and 27 steps for interactive sampling, which reduces the latency to 15 seconds.
Autoregressive models must generate images token by token, but our LoPAR can upsample images with high parallelism. So (potentially) we can design the model with more hierarchies, so that the time cost could fall below that of diffusion models.
Comparison between DALL·E 2 and CogView2. DALL·E 2 [27] is a recently released concurrent work on text-to-image generation at 1024 × 1024 resolution. Although its probabilistic model and architecture differ substantially from those of CogView2, the two share the same spirit: hierarchical generation. CogView2 can synthesize scenes similar to those in the limited DALL·E 2 demonstrations, for example "a lion teacher" (Figure 1) versus "panda scientists" (DALL·E 2), even though CogView2 was trained on only about 5% of the total data used by DALL·E 2. Compared with CogView2, the main differences in DALL·E 2 are its third-level super-resolution and its "zeroth"-level image prior generation. Because training a third-level super-resolution is very resource-consuming and more engineering-oriented, we leave it to future work.
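To make the parallelism argument about LoPAR concrete, here is a rough sketch that counts sequential decoding steps only. It assumes a square token grid split into non-overlapping square windows that are all decoded at the same time, with the tokens inside each window generated one after another. The function names, the 64 × 64 grid and the 4 × 4 window are illustrative choices, not values taken from the paper, and the actual LoPAR procedure may differ in its details.

```python
def sequential_steps_full_ar(grid: int) -> int:
    """Token-by-token generation: one sequential step per token in the grid."""
    return grid * grid


def sequential_steps_local_parallel(grid: int, window: int) -> int:
    """Windows are decoded in parallel; only positions inside a window are sequential.

    This is a simplifying assumption for illustration, not the paper's exact scheme.
    """
    if grid % window != 0:
        raise ValueError("grid must be divisible by window")
    return window * window


# Hypothetical sizes chosen only for the example.
grid, window = 64, 4
full = sequential_steps_full_ar(grid)
local = sequential_steps_local_parallel(grid, window)
print(f"full autoregressive: {full} sequential steps")
print(f"local parallel:      {local} sequential steps")
print(f"reduction factor:    {full // local}x")
```

Under these assumptions the number of sequential steps depends on the window size rather than on the image size, which is why adding hierarchy levels does not have to lengthen the sequential part of generation proportionally.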
Code: https://github.com/THUDM/CogView2
Readers who want to experiment should note that this model has high hardware requirements; an NVIDIA A100 machine is recommended.