Latest open-source text-to-image generation research of 2022 (papers with code)
2022-07-27 02:08:00 【Medium coke with ice】
Papers with code
- 1、DALL-E 2
- 2、Recurrent Affine Transformation for Text-to-image Synthesis
- 3、Vector Quantized Diffusion Model for Text-to-Image Synthesis
- 4、Autoregressive Image Generation using Residual Quantization
- 5、LAFITE
- 6、DF-GAN
- 7、Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
- 8、DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
This post briefly introduces some open-source research on text-to-image generation. Most of it is the latest work from 2022:
1、DALL-E 2
《Hierarchical Text-Conditional Image Generation with CLIP Latents》
OpenAI's latest work and, at the time of writing, the state of the art (SOTA) in text-to-image generation.
Paper: https://cdn.openai.com/papers/dall-e-2.pdf
Code: https://github.com/lucidrains/DALLE2-pytorch (unofficial)
2、Recurrent Affine Transformation for Text-to-image Synthesis
《Recurrent Affine Transformation for Text-to-image Synthesis》
Proposes a Recurrent Affine Transformation (RAT) for generative adversarial networks (GANs). It connects all fusion blocks with a recurrent neural network to model their long-term dependencies. The approach is very similar to DF-GAN.
Paper: https://arxiv.org/pdf/2204.10482.pdf
Code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis
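The idea of linking fusion blocks through a recurrent network can be sketched as follows. This is only an illustration, not the code from the repository: it assumes a plain tanh RNN cell as a stand-in for whatever recurrent unit the paper uses, and every name (`rnn_step`, `recurrent_affine_blocks`, `rand_matrix`) and dimension is hypothetical. Each block takes its channel-wise scale and shift from a hidden state that is carried over from the previous block.

```python
import math
import random

def linear(weights, bias, vec):
    """Dense layer: one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def rnn_step(hidden, text_vec, w_h, w_x, bias):
    """Plain tanh cell (an assumption): h_t = tanh(W_h h_{t-1} + W_x text + b)."""
    from_hidden = linear(w_h, bias, hidden)
    from_text = linear(w_x, [0.0] * len(bias), text_vec)
    return [math.tanh(a + b) for a, b in zip(from_hidden, from_text)]

def recurrent_affine_blocks(fmap, text_vec, hidden, cell, heads):
    """Run one feature map through several text-conditioned fusion blocks.

    fmap:  list of channels, each a flat list of spatial activations.
    cell:  (w_h, w_x, bias), shared by every block.
    heads: per block, ((gamma_w, gamma_b), (beta_w, beta_b)).
    """
    for gamma_head, beta_head in heads:
        # The hidden state is carried from block to block, so each block's
        # affine parameters depend on the blocks that came before it.
        hidden = rnn_step(hidden, text_vec, *cell)
        gamma = linear(*gamma_head, hidden)
        beta = linear(*beta_head, hidden)
        fmap = [[g * x + b for x in channel]
                for channel, g, b in zip(fmap, gamma, beta)]
        # Convolution and upsampling between blocks are omitted here.
    return fmap, hidden

def rand_matrix(rows, cols, rng):
    """Hypothetical random initialisation; real weights would be learned."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
channels, text_dim, hidden_dim, num_blocks = 3, 5, 4, 3
cell = (rand_matrix(hidden_dim, hidden_dim, rng),
        rand_matrix(hidden_dim, text_dim, rng),
        [0.0] * hidden_dim)
heads = [((rand_matrix(channels, hidden_dim, rng), [1.0] * channels),
          (rand_matrix(channels, hidden_dim, rng), [0.0] * channels))
         for _ in range(num_blocks)]
fmap = rand_matrix(channels, 4, rng)  # 3 channels, 4 spatial positions
text_vec = [rng.uniform(-1, 1) for _ in range(text_dim)]

out, last_hidden = recurrent_affine_blocks(
    fmap, text_vec, [0.0] * hidden_dim, cell, heads)
print(len(out), len(out[0]), last_hidden)
```

Because the recurrent cell is shared across blocks, the conditioning in a late block is a function of the same state that shaped the earlier ones, rather than being computed independently per block.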
3、Vector Quantized Diffusion Model for Text-to-Image Synthesis
《Vector Quantized Diffusion Model for Text-to-Image Synthesis》
Applies a vector quantized diffusion (VQ-Diffusion) model to text-to-image generation for the first time. Compared with previous GAN-based text-to-image methods, VQ-Diffusion can handle more complex scenes and greatly improves the quality of the synthesized images.
Conference: CVPR 2022
Paper: https://arxiv.org/abs/2111.14822
Code: https://github.com/microsoft/vq-diffusion
4、Autoregressive Image Generation using Residual Quantization
《Autoregressive Image Generation using Residual Quantization》
A two-stage framework, made up of a residual-quantized VAE (RQ-VAE) and an RQ-Transformer, generates high-resolution images. RQ-VAE accurately approximates the feature map of an image and represents the image as a stack of discrete codes. RQ-Transformer then learns to predict the quantized feature vector at the next position by predicting the next stack of codes.
Conference: CVPR 2022
Paper: https://arxiv.org/abs/2203.01941
Code: https://github.com/kakaobrain/rq-vae-transformer
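A small sketch may help show what "a stack of discrete codes" means. It is not the RQ-VAE implementation from the repository: the codebook values are random rather than learned, and the depth and the helper names `nearest_code` and `residual_quantize` are assumptions for illustration. Each level quantizes the residual left over by the previous levels, so one feature vector ends up as a short list of code indices.

```python
import random

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))

def residual_quantize(vec, codebook, depth):
    """Quantize vec into a stack of `depth` code indices.

    Returns (codes, approx), where approx is the sum of the selected
    codebook entries.
    """
    residual = list(vec)
    approx = [0.0] * len(vec)
    codes = []
    for _ in range(depth):
        k = nearest_code(residual, codebook)
        codes.append(k)
        # Add the chosen code to the running approximation and quantize
        # whatever is still unexplained at the next level.
        approx = [a + c for a, c in zip(approx, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, approx

# Toy codebook with hypothetical random values (a real one is learned).
rng = random.Random(0)
codebook = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(16)]
feature = [0.3, -0.7, 0.5, 0.1]

codes, approx = residual_quantize(feature, codebook, depth=4)
error = sum((f - a) ** 2 for f, a in zip(feature, approx))
print(codes, error)
```

In this toy version a single codebook is shared by all levels, and an autoregressive model would then be trained to predict the next position's stack `codes` instead of a single index.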
5、LAFITE
《LAFITE: Towards Language-Free Training for Text-to-Image Generation》
The first work to train a text-to-image generation model without any text data, by taking advantage of the powerful pretrained CLIP model.
Conference: CVPR 2022
Paper: https://arxiv.org/abs/2111.13792
Code: https://github.com/drboog/Lafite
6、DF-GAN
《DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis》
Abandons the stacked structure of traditional GANs in favor of a one-stage backbone. The generator introduces a novel deep text-image fusion block built from affine blocks, and the discriminator introduces a matching-aware gradient penalty and a one-way output.
Conference: CVPR 2022
Paper: https://arxiv.org/abs/2008.05865
Code: https://github.com/tobran/DF-GAN
Detailed reading: https://blog.csdn.net/air__Heaven/article/details/124288473
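To make the affine block more concrete, here is a minimal sketch. It is not DF-GAN's actual code. It assumes the affine step is a channel-wise scale and shift whose parameters are predicted from the sentence embedding. The single dense layers, the ReLU between steps, and all names (`affine_modulate`, `fusion_block`, `rand_layer`) are illustrative choices.

```python
import random

def linear(weights, bias, vec):
    """Dense layer: one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def affine_modulate(fmap, text_vec, gamma_layer, beta_layer):
    """Scale and shift each channel with parameters predicted from the text.

    fmap: list of channels, each a flat list of spatial activations.
    gamma_layer / beta_layer: (weights, bias) pairs that map text_vec to
    one value per channel.
    """
    gamma = linear(*gamma_layer, text_vec)
    beta = linear(*beta_layer, text_vec)
    return [[g * x + b for x in channel]
            for channel, g, b in zip(fmap, gamma, beta)]

def relu(fmap):
    return [[max(0.0, x) for x in channel] for channel in fmap]

def fusion_block(fmap, text_vec, layers):
    """Stack several affine + ReLU steps to deepen the text-image fusion."""
    for gamma_layer, beta_layer in layers:
        fmap = relu(affine_modulate(fmap, text_vec, gamma_layer, beta_layer))
    return fmap

rng = random.Random(0)

def rand_layer(out_dim, in_dim):
    """Hypothetical random weights; a trained model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

channels, text_dim = 2, 3
text_vec = [0.5, 0.1, -0.3]
fmap = [[1.0, -2.0], [3.0, 4.0]]  # 2 channels, 2 spatial positions
layers = [(rand_layer(channels, text_dim), rand_layer(channels, text_dim))
          for _ in range(2)]

fused = fusion_block(fmap, text_vec, layers)
print(fused)
```

The same text vector conditions every step, so changing the sentence embedding changes how each channel is rescaled throughout the block.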
7、Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
《Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors》
Work in progress. Introduces several new capabilities: (i) scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation (that is, generating illustrations from a story).
Paper: https://arxiv.org/abs/2203.13131
Code: https://github.com/CasualGANPapers/Make-A-Scene
8、DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
《Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers》
Studies the reasoning skills and social biases of text-to-image generative transformers. First, it measures four visual reasoning skills: object recognition, object counting, color recognition, and spatial relation understanding. To do so, it proposes PaintSkills, a diagnostic dataset and evaluation toolkit for these four skills. Second, it measures the text alignment and quality of the generated images using pretrained image captioning, image-text retrieval, and image classification models. Third, it evaluates the social biases in the models.
Paper: https://arxiv.org/abs/2202.04053
Code: https://github.com/j-min/DallEval