Text-Driven Creation and Editing of Images (with Source Code)
2022-06-11 16:32:00 【Computer Vision Research Institute】



Paper: https://arxiv.org/pdf/2206.02779.pdf
Computer Vision Research Institute column
Author: Edison_G
Rapid progress in neural image generation, together with the emergence of seemingly omnipotent vision-language models, has finally made it possible to create and edit images through a text-based interface.
1
Overview
Handling generic images requires a diverse underlying generative model, which is why the latest work adopts diffusion models, shown to be more diverse than GANs. However, a major drawback of diffusion models is their relatively slow inference time.

In today's sharing, the researchers propose an accelerated solution for the task of local, text-driven editing of generic images, where the desired edits are confined to a user-supplied mask. Their solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a low-dimensional latent space.

First, Blended Diffusion is adapted to the LDM to obtain a local image editor. Next, to address the LDM's inherent inability to reconstruct an image exactly, an optimization-based solution is proposed. Finally, the researchers tackle the scenario of performing local edits with thin masks. The proposed method is evaluated qualitatively and quantitatively against the available baselines, showing that, besides being faster, it achieves better precision than the baselines while reducing some of their artifacts.
Project page: https://omriavrahami.com/blended-latent-diffusion-page
2
Analysis of the New Framework
Blended Latent Diffusion aims to solve the task of local, text-driven editing of generic images introduced in the Blended Diffusion paper. Blended Diffusion suffers from slow inference (it takes about 25 minutes on a single GPU to obtain a good result) and from pixel-level artifacts.
To address these problems, the researchers propose to incorporate blended diffusion into a text-to-image latent diffusion model. To do so, they operate in the latent space and repeatedly blend the foreground and background parts there as the diffusion proceeds:

Operating in the latent space does yield fast inference, but it suffers from imperfect reconstruction of the unmasked region and cannot handle thin masks. For more details on how these problems are solved, please continue reading.



Noise artifacts
Given the input image (a), the mask (b), and the guiding text "blond curls", Blended Diffusion produces noticeable pixel-level noise artifacts (c) compared with the newly proposed method (d).
As mentioned earlier, latent diffusion can generate images from a given text (a text-to-image LDM). However, the model lacks the ability to edit an existing image locally, so the researchers propose to merge Blended Diffusion into the text-to-image LDM.
The new method is summarized above; for the full algorithm description, please read the original paper. The LDM performs text-guided denoising diffusion in the latent space learned by a variational autoencoder VAE = (𝐸(𝑥), 𝐷(𝑧)). The part we want to modify is treated as the foreground (fg), and the rest as the background (bg). Following the idea of Blended Diffusion, the two parts are repeatedly blended in this latent space as the diffusion proceeds. The input image 𝑥 is encoded into the latent space with the VAE encoder, 𝑧init ∼ 𝐸(𝑥). The latent space still has spatial dimensions (owing to the convolutional nature of the VAE), but its width and height are 8× smaller than those of the input image.
Therefore, the input mask 𝑚 is downsampled to these spatial dimensions to obtain the latent-space mask 𝑚latent, which is used to perform the blending.
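The following is a minimal sketch of the blending loop described above. The helper names vae, denoise_step, and add_noise are illustrative assumptions standing in for the LDM's autoencoder, its text-guided denoiser, and its noise scheduler; they are not the authors' actual API.

```python
import torch
import torch.nn.functional as F

def blended_latent_edit(vae, denoise_step, add_noise, image, mask, text_emb, num_steps):
    """Text-driven local editing by blending in the LDM latent space (sketch)."""
    # Encode the input image into the latent space: z_init ~ E(x).
    z_init = vae.encode(image)                                 # (B, C, H/8, W/8)

    # Downsample the binary mask to the latent spatial resolution (m_latent).
    m_latent = F.interpolate(mask, size=z_init.shape[-2:], mode="nearest")

    # Start the foreground from pure noise and run the reverse diffusion.
    z = torch.randn_like(z_init)
    for t in reversed(range(num_steps)):
        # One text-guided denoising step on the whole latent.
        z_fg = denoise_step(z, t, text_emb)

        # Re-noise the original latent to the matching noise level.
        z_bg = add_noise(z_init, max(t - 1, 0))

        # Blend: edited foreground inside the mask, original background outside.
        z = m_latent * z_fg + (1 - m_latent) * z_bg

    # Decode the blended latent back to pixel space.
    return vae.decode(z)
```

The key point is that the blending happens on the small latent tensors rather than on full-resolution pixels, which is where the speed-up over pixel-space Blended Diffusion comes from.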

Background reconstruction comparison

Background reconstruction using decoder weight fine-tuning
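The figure above illustrates the optimization-based fix for imperfect background reconstruction: the VAE decoder is briefly fine-tuned on a per-image basis so that the decoded output matches the original pixels outside the mask. Below is a hedged sketch of that idea; the vae.decoder attribute and all argument names are assumptions for illustration, not the authors' code, and the paper's actual objective may include additional terms.

```python
import copy
import torch

def finetune_decoder(vae, z_edited, image, mask, steps=100, lr=1e-4):
    """Per-image decoder fine-tuning for exact background reconstruction (sketch)."""
    decoder = copy.deepcopy(vae.decoder)        # keep the shared weights untouched
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(steps):
        out = decoder(z_edited)
        # Penalize deviation from the original image in the unmasked (background) area.
        loss = ((1.0 - mask) * (out - image)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return decoder(z_edited)
```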

Thin mask progression

Progressive mask shrinking
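The two figures above ("Thin mask progression" and "Progressive mask shrinking") illustrate how thin masks are handled: blending starts from a dilated version of the user mask, which is gradually shrunk toward the original thin mask as diffusion proceeds. Below is a rough sketch of such a schedule; the linear dilation schedule and the max-pooling dilation are illustrative assumptions, not necessarily the exact procedure from the paper.

```python
import torch.nn.functional as F

def dilate(mask, k):
    """Morphological dilation of a binary mask via max pooling, kernel (2k+1)x(2k+1)."""
    if k <= 0:
        return mask
    return F.max_pool2d(mask, kernel_size=2 * k + 1, stride=1, padding=k)

def mask_for_step(mask, t, num_steps, max_dilation=8):
    """Heavily dilated mask at early (noisy) steps, the original thin mask at the end."""
    k = int(round(max_dilation * t / max(num_steps - 1, 1)))
    return dilate(mask, k)
```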

Comparison to baselines: (1) Local CLIP-guided diffusion [Crowson 2021], (2) PaintByWord++ [Bau et al. 2021; Crowson et al. 2022], (3) Blended Diffusion [Avrahami et al. 2021], (4) GLIDE [Nichol et al. 2021], and (5) GLIDE-filtered [Nichol et al. 2021].

Limitations: Top row: the CLIP-based ranking considers only the masked area, so the results are sometimes realistic only piecewise while the overall image does not look realistic. Bottom row: the model has a text bias; it may try to create movie posters or book covers containing text instead of, or in addition to, generating the actual objects.
THE END
Please contact the official account for authorization.

