
Attention mechanism for the win: AI photo editors finally stop wrecking the whole picture

2022-06-26 07:31:00 QbitAI

Feng Se, reporting from Aofeisi
QbitAI | Official account QbitAI

“Attention is all you need!”

This famous saying has now been confirmed in yet another field.

The latest work from Shenzhen University and Tel Aviv University introduces an attention mechanism into GANs, solving the "collateral damage" problem in face editing, where changing one attribute disturbs the rest of the image.

For example, changing a person's hairstyle messes up the background;

[Image: a hairstyle edit that disturbs the background]

Adding a beard affects the hair, or even makes the whole face look like a different person:

[Image: a beard edit that alters hair and identity]

The new attention-equipped model edits cleanly and crisply, with no effect at all outside the target region.

[Image: FEAT edits leaving everything outside the target region untouched]

How does it do that?

Introducing the attention map

The model is called FEAT (Face Editing with Attention). It builds an attention mechanism on top of a StyleGAN generator.

Specifically, it performs face editing in the latent space of StyleGAN2.

Its mapper builds on previous methods: it modifies the image by learning an offset in latent space.

To restrict the edit to the target region, FEAT introduces an attention map that fuses the features obtained from the source latent code with the features from the shifted latent code.
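The fusion step can be sketched as a per-pixel blend. This is an illustrative reconstruction, not the paper's released code; `fuse_features` and the toy shapes are hypothetical:

```python
import numpy as np

def fuse_features(f_source, f_shifted, attn):
    # Per-pixel blend: attn = 1 keeps the edited (shifted) features,
    # attn = 0 keeps the source features untouched.
    return attn * f_shifted + (1.0 - attn) * f_source

# Toy 1-channel 2x4 feature maps: edit only the left half.
f_src = np.zeros((1, 2, 4))
f_new = np.ones((1, 2, 4))
attn = np.zeros((1, 2, 4))
attn[..., :2] = 1.0
fused = fuse_features(f_src, f_new, attn)
```

Wherever the attention map is 0, the output features are bit-for-bit the source features, which is exactly why regions outside the mask cannot change.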

[Image: attention-map fusion diagram]

To guide the editing, the model also brings in CLIP, which learns the offset from a text prompt and helps generate the attention map.
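Conceptually, the CLIP guidance amounts to pushing the edited image's CLIP embedding toward the text prompt's embedding. A minimal sketch of such a loss, assuming both embeddings are already computed (no real CLIP model is called here):

```python
import numpy as np

def clip_loss(image_emb, text_emb):
    # 1 - cosine similarity: lower means the edit matches the prompt better.
    cos = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return 1.0 - float(cos)

aligned = clip_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0]))   # same direction
opposed = clip_loss(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))  # opposite direction
```

Minimizing this loss with respect to the latent offset is what lets a plain text prompt ("add a beard") steer the edit.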

The full FEAT pipeline works as follows:

[Image: FEAT pipeline diagram]

First, take an image with n layers of features. In the paper's pipeline diagram, light blue marks the features and yellow marks the channel counts.

Then, guided by a text prompt, a mapper is generated for the style codes that predicts the corresponding offsets.

The latent code, shifted by the mapper's offset (w_j + Δ_j), is used to generate the mapped image.

Next, the attention module generates an attention map that fuses the i-th layer features of the original and mapped images, producing the desired edit.
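The steps above can be condensed into one sketch. Every name here (ToyGenerator, feat_edit, the lambdas) is a hypothetical stand-in for illustration, not the released FEAT code:

```python
import numpy as np

class ToyGenerator:
    """Stand-in for StyleGAN2: 'features' are just the latent value tiled spatially."""
    def features(self, w, shape=(1, 2, 2)):
        return np.full(shape, float(w))
    def render_from(self, fused):
        return fused  # a real generator would keep decoding the remaining layers

def feat_edit(w, mapper, generator, attention_net):
    delta = mapper(w)                        # text-guided offset in latent space
    f_src = generator.features(w)            # features from the source code w
    f_map = generator.features(w + delta)    # features from the shifted code w + delta
    attn = attention_net(f_src)              # soft mask: 1 = edit here, 0 = keep source
    return generator.render_from(attn * f_map + (1.0 - attn) * f_src)

# Toy run: the edit (a latent shift of +5) only lands where the mask is 1.
mask = np.array([[[1.0, 0.0], [0.0, 0.0]]])
out = feat_edit(0.0, mapper=lambda w: 5.0,
                generator=ToyGenerator(), attention_net=lambda f: mask)
```

The toy run makes the key property concrete: only the masked pixel picks up the edit, everything else stays identical to the source.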

The architecture of the attention module is as follows:

On the left is the StyleGAN2 generator used for feature extraction; on the right is the attention network that produces the attention map.
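At its simplest, the attention network squashes the generator's features into a single-channel soft mask. A minimal sketch of that idea, using a hypothetical 1x1 convolution (the real network is deeper):

```python
import numpy as np

def attention_map(features, w, b=0.0):
    # features: (C, H, W); w: (C,) weights of a 1x1 convolution; b: scalar bias.
    # Returns an (H, W) soft mask with values strictly in (0, 1).
    logits = np.tensordot(w, features, axes=([0], [0])) + b
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

feats = np.random.default_rng(0).standard_normal((8, 4, 4))
mask = attention_map(feats, w=np.ones(8) * 0.1)
```

Because the sigmoid output is continuous rather than a hard 0/1 mask, the fusion can blend smoothly at the boundary of the edited region.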

[Image: attention module architecture]

No changes outside the target region

In the experiments, the researchers first compared FEAT with two recently proposed text-driven manipulation models: TediGAN and StyleCLIP.

TediGAN encodes both images and text into StyleGAN's latent space, while StyleCLIP combines CLIP with StyleGAN in three different ways.

[Image: qualitative comparison with TediGAN and StyleCLIP]

As you can see, FEAT achieves precise control over the face, with no impact outside the target region.

TediGAN, by contrast, not only failed to change the hairstyle but also darkened the skin tone (first row, rightmost).

In the second group, an expression edit also changed the subject's gender (second row, rightmost).

[Image: further comparison results]

StyleCLIP performs much better overall than TediGAN, but at the cost of a messy background (third column of the last two images: the background is affected in every result).

Next, FEAT was compared with InterFaceGAN and StyleFlow.

InterFaceGAN performs linear operations in the GAN latent space, while StyleFlow extracts nonlinear editing paths from it.

The results are as follows:

[Image: comparison with InterFaceGAN and StyleFlow]

In this group of beard edits, you can see that InterFaceGAN and StyleFlow also made minor, unintended changes to the hair and eyebrows.

In addition, both methods require labeled data for supervision and cannot perform zero-shot editing the way FEAT does.

FEAT also shows its advantage in the quantitative experiments.

Across the editing results for five attributes, FEAT outperforms TediGAN and StyleCLIP in both visual quality (FID score) and feature preservation (CS and ED scores).
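As a rough illustration of the two feature-preservation metrics: CS is typically a CLIP-space cosine similarity and ED an embedding distance between identity features; the paper's exact definitions may differ, and the embeddings here are assumed precomputed (no real CLIP or face-recognition model is invoked):

```python
import numpy as np

def cs_score(emb_a, emb_b):
    # Cosine similarity between embeddings; higher = better preserved.
    return float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def ed_score(id_src, id_edit):
    # Euclidean distance between identity embeddings; lower = same person.
    return float(np.linalg.norm(id_src - id_edit))

same = cs_score(np.array([0.6, 0.8]), np.array([0.6, 0.8]))
dist = ed_score(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

Together with FID (which measures realism of the generated images), these give a quantitative handle on whether an edit stayed inside the target region.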

[Image: quantitative results table]

About the authors

First author Hou Xianxu is from Shenzhen University.

[Image: photo of Hou Xianxu]

He studied geography and geology at China University of Mining and Technology and earned his Ph.D. in computer science from the University of Nottingham; his main research interests are computer vision and deep learning.

The corresponding author, Shen Linlin, is a master's supervisor in pattern recognition and intelligent systems at Shenzhen University. His current research covers biometrics such as face, fingerprint, and palmprint recognition, medical image processing, and pattern recognition systems.

He received his master's degree in applied electronics from Shanghai Jiao Tong University and, like Hou, his Ph.D. from the University of Nottingham. His Google Scholar citation count has reached 7,936.

Paper:
https://arxiv.org/abs/2202.02713
