Interpretation of the DaGAN Paper
2022-07-06 19:30:00 【‘Atlas’】
Paper: "Depth-Aware Generative Adversarial Network for Talking Head Video Generation" (CVPR 2022)
GitHub: https://github.com/harlanhong/CVPR2022-DaGAN
Problem addressed
Existing problem:
Existing talking-head video generation methods rely mainly on 2D representations, yet 3D facial information is critical for this task, and annotating it is very expensive.
Solution:
The authors propose a self-supervised scheme that automatically recovers dense 3D facial geometry (depth maps) from face videos without any annotated data. Based on this geometry, sparse facial keypoints are estimated to capture the important head movements. The depth is also used to learn a 3D-aware cross-modal (appearance and depth) attention mechanism that guides the generation of the motion field used to warp the source image.
The proposed DaGAN generates highly realistic faces and achieves good results even on faces never seen during training.
The main contributions of this paper are threefold:
1. A self-supervised method recovers depth maps from videos, and these depth maps are used to improve generation quality;
2. A novel depth-aware GAN injects depth information into the generation network through depth-guided facial keypoint estimation and a cross-modal (depth and image) attention mechanism;
3. Extensive experiments show accurate depth recovery on face images and generation quality that surpasses SOTA methods.
Algorithm
The DaGAN framework is shown in Figure 2; it consists of a generator and a discriminator.
The generator has three parts:
1. A self-supervised depth learning sub-network $F_d$, which learns depth estimation from pairs of consecutive video frames without supervision; $F_d$ is then frozen while the whole network is trained;
2. A depth-guided sparse keypoint detection sub-network $F_{kp}$;
3. A feature warping module, which uses the keypoints to generate a motion field and warps the source image features with it, combining appearance and motion information into the warped features $F_w$. To make the model attend to fine details and facial micro-expressions, a depth-aware attention map is additionally learned; it refines $F_w$ into $F_g$, from which the image $I_g$ is generated.
Self-supervised face depth learning
The authors build on SfM-Learner with some modifications. Consecutive frames are used, with $I_{i+1}$ as the source image and $I_i$ as the target image, to learn the geometric elements: the depth map $D_{I_i}$, the camera intrinsic matrix $K_{I_i \to I_{i+1}}$, the relative camera rotation $R_{I_i \to I_{i+1}}$, and the translation $t_{I_i \to I_{i+1}}$. Unlike SfM-Learner, the camera intrinsics $K$ are learned rather than given.
The pipeline (Figure 3):
1. $F_d$ extracts the depth map $D_{I_i}$ of the target image $I_i$;
2. $F_p$ estimates the learnable parameters $R$, $t$, $K$;
3. Following Eqs. 3 and 4, the source image $I_{i+1}$ is geometrically warped to obtain $I'_i$, where
$q_k$ denotes the warped pixel location on the source image $I_{i+1}$;
$p_j$ denotes a pixel on the target image $I_i$.
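To make the warping step concrete, here is a minimal Python sketch of SfM-Learner-style reprojection for a single pixel, assuming a pinhole camera model. Eqs. 3 and 4 are not reproduced in this post, so the helper names (`reproject`, `invert_intrinsics`) and the exact order of operations are assumptions rather than DaGAN's code: a target pixel $p_j$ is back-projected with its depth $D_{I_i}(p_j)$, moved by $R$ and $t$, and projected with $K$ to give $q_k$. $I'_i$ would then be formed by bilinearly sampling $I_{i+1}$ at every $q_k$.

```python
# Minimal sketch of SfM-Learner-style reprojection for one pixel.
# Assumptions: pinhole intrinsics, R is a 3x3 rotation, t is a 3-vector.
# Not DaGAN's implementation; all names are hypothetical.


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def invert_intrinsics(k):
    """Inverse of a pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]."""
    fx, cx = k[0][0], k[0][2]
    fy, cy = k[1][1], k[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def reproject(p_j, depth, k, r, t):
    """Map target pixel p_j = (u, v) with depth D(p_j) to source pixel q_k.

    q_k ~ K (R * D(p_j) * K^-1 * p_j + t), then divide by the last coordinate.
    """
    u, v = p_j
    ray = mat_vec(invert_intrinsics(k), [u, v, 1.0])      # K^-1 p_j
    point = [depth * c for c in ray]                       # 3D point in the target camera
    moved = [a + b for a, b in zip(mat_vec(r, point), t)]  # rigid motion into the source camera
    proj = mat_vec(k, moved)                               # project with K
    return (proj[0] / proj[2], proj[1] / proj[2])          # perspective division


if __name__ == "__main__":
    K = [[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(reproject((10.0, 20.0), 2.0, K, identity, [0.02, 0.0, 0.0]))  # about (11.0, 20.0)
```

With $R = I$ and $t = 0$ every pixel maps to itself; a pure x-translation $t_x$ shifts pixels horizontally by $f_x t_x / D(p_j)$, which is why depth matters for the warp.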
The photometric loss $P_e$ is given in Eq. 5; it combines an L1 loss and an SSIM loss.
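Eq. 5 is not reproduced here, but a common way to combine the two terms, popularized by Monodepth-style self-supervised depth methods, is $P_e = \alpha \frac{1 - \mathrm{SSIM}(I_i, I'_i)}{2} + (1-\alpha)\lVert I_i - I'_i \rVert_1$. The sketch below assumes that form with $\alpha = 0.85$ and, for brevity, computes SSIM over the whole image as a single window; both are assumptions, and practical implementations use local windows over image tensors.

```python
# Sketch of an L1 + SSIM photometric loss (assumed Monodepth-style mix).
# Images are flat lists of grayscale values in [0, 1].


def mean(xs):
    return sum(xs) / len(xs)


def global_ssim(x, y, data_range=1.0):
    """SSIM over the whole image as one window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = mean(x), mean(y)
    vx = mean([(a - mx) ** 2 for a in x])
    vy = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def photometric_loss(target, warped, alpha=0.85):
    """alpha * (1 - SSIM) / 2 + (1 - alpha) * mean |target - warped|."""
    l1 = mean([abs(a - b) for a, b in zip(target, warped)])
    ssim_term = (1.0 - global_ssim(target, warped)) / 2.0
    return alpha * ssim_term + (1.0 - alpha) * l1
```

The SSIM term rewards matching local structure and contrast, while the L1 term keeps absolute intensities aligned; identical images give a loss of zero.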
Sparse keypoint motion modeling
1. Concatenate the RGB image with the depth map extracted by $F_d$;
2. Feed the result to the keypoint estimation module $F_{kp}$ to obtain sparse facial keypoints (Eq. 6). Introducing the depth map makes the predicted keypoints more accurate.
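A sketch of the input and output of this step, assuming $F_{kp}$ works like First Order Motion Model (FOMM) style detectors: it predicts one heatmap per keypoint, and each keypoint is read out with a soft-argmax. The CNN itself is omitted; `concat_channels`, `soft_argmax`, and the temperature of 0.1 are illustrative assumptions.

```python
import math

# Sketch of the keypoint branch input/output, assuming F_kp predicts one
# heatmap per keypoint and keypoints are read out by soft-argmax.
# The CNN itself is omitted; function names are hypothetical.


def concat_channels(rgb, depth):
    """Stack 3 RGB channels and 1 depth channel into a 4-channel input.

    rgb: list of 3 H x W maps; depth: one H x W map produced by F_d.
    """
    return list(rgb) + [depth]


def soft_argmax(heatmap, temperature=0.1):
    """Convert an H x W heatmap (H, W >= 2) into (x, y) in [-1, 1]."""
    h, w = len(heatmap), len(heatmap[0])
    logits = [v / temperature for row in heatmap for v in row]
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(weights)
    x = y = 0.0
    for idx, wgt in enumerate(weights):
        r, c = divmod(idx, w)
        x += (2.0 * c / (w - 1) - 1.0) * wgt / total  # expected column coordinate
        y += (2.0 * r / (h - 1) - 1.0) * wgt / total  # expected row coordinate
    return x, y
```

Soft-argmax keeps the keypoint location differentiable, so the keypoint detector can be trained end-to-end through the warping losses.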
Feature warping strategy (Figure 4)
1. Compute the initial offsets $O_n$ between the source image and the driving image (Eq. 7);
2. Generate a 2D coordinate map $z$;
3. Apply the offsets $O_n$ to $z$ to obtain the motion field $w_m$;
4. Warp the downsampled source image with $w_m$ to obtain the initial warped feature map;
5. The occlusion estimator $\tau$ takes the initial warped feature map and predicts the motion flow mask $M_m$ and the occlusion map $M_o$;
6. Use $M_m$ to warp the appearance feature map obtained by encoding $I_s$ with the encoder $\epsilon_I$, then fuse the result with $M_o$ to produce $F_w$ (Eq. 8). $F_w$ both retains the source image information and captures the motion between the two faces.
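The six steps can be sketched on a tiny single-channel feature map. This is a simplified illustration under stated assumptions: the offsets are taken to be plain keypoint differences $O_n = x_{s,n} - x_{d,n}$ (no local affine terms), the per-keypoint weights stand in for the mask $M_m$, there is no background term, and warping uses bilinear sampling in normalized $[-1, 1]$ coordinates with zero padding. All function names are hypothetical.

```python
import math

# Simplified feature-warping sketch (hypothetical names, assumed forms):
# O_n = x_s,n - x_d,n, per-keypoint weights stand in for M_m, no background term.


def identity_grid(h, w):
    """2D coordinate map z with (x, y) in [-1, 1]; x runs along the width."""
    return [[(2.0 * c / (w - 1) - 1.0, 2.0 * r / (h - 1) - 1.0)
             for c in range(w)] for r in range(h)]


def keypoint_offsets(kp_source, kp_driving):
    """Initial offsets O_n between source and driving keypoints."""
    return [(xs - xd, ys - yd)
            for (xs, ys), (xd, yd) in zip(kp_source, kp_driving)]


def motion_field(grid, offsets, masks):
    """w_m: per-pixel blend of the shifted grids z + O_n.

    masks[n][r][c] are blending weights that should sum to 1 at each pixel.
    """
    field = []
    for r, grid_row in enumerate(grid):
        row = []
        for c, (x, y) in enumerate(grid_row):
            fx = sum(masks[n][r][c] * (x + ox) for n, (ox, _) in enumerate(offsets))
            fy = sum(masks[n][r][c] * (y + oy) for n, (_, oy) in enumerate(offsets))
            row.append((fx, fy))
        field.append(row)
    return field


def bilinear_sample(feat, x, y):
    """Sample an H x W map at normalized (x, y); zero outside the image."""
    h, w = len(feat), len(feat[0])
    px = (x + 1.0) * (w - 1) / 2.0
    py = (y + 1.0) * (h - 1) / 2.0
    x0, y0 = math.floor(px), math.floor(py)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                value += (1 - abs(px - xx)) * (1 - abs(py - yy)) * feat[yy][xx]
    return value


def warp_and_occlude(feat, field, occlusion):
    """F_w = M_o * warp(feat, w_m), element-wise."""
    return [[occlusion[r][c] * bilinear_sample(feat, *field[r][c])
             for c in range(len(feat[0]))] for r in range(len(feat))]
```

For example, with one keypoint whose source position is one pixel width to the right of its driving position, every output pixel samples the source one column to the right, so the map content moves left and the last column falls outside the image and becomes zero.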
Cross-modal attention mechanism
To make effective use of the learned depth map for better generation, the authors propose a cross-modal attention mechanism, shown in Figure 5.
1. The depth encoder $\epsilon_d$ extracts the feature map $F_d$ from the depth map $D_{sz}$;
2. Three separate 1×1 convolution layers map $F_d$ and $F_w$ to three latent feature maps $F_q$, $F_k$, $F_v$;
3. Attention produces $F_g$ (Eq. 9);
4. The decoder refines $F_g$ into the final image $I_g$.
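A minimal sketch of step 3 on flattened feature maps (N positions by C channels). It assumes the query comes from the depth features $F_d$ and the key and value from $F_w$, and that Eq. 9 is plain softmax attention $F_g = \mathrm{softmax}(F_q F_k^\top) F_v$; the `scaled` flag adds the usual $1/\sqrt{d}$ Transformer factor in case a reference implementation uses it. A 1×1 convolution is modeled as a per-position matrix multiply over channels. Names and shapes are illustrative.

```python
import math

# Sketch of cross-modal attention (assumed: query from depth, key/value from F_w).


def conv1x1(features, weight):
    """A 1x1 convolution is a per-position linear map over channels.

    features: N x C_in (N = H*W positions); weight: C_out x C_in.
    """
    return [[sum(w * f for w, f in zip(row, feat)) for row in weight]
            for feat in features]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_modal_attention(f_depth, f_warped, w_q, w_k, w_v, scaled=False):
    """F_g = softmax(F_q F_k^T) F_v, one attention row per query position."""
    f_q = conv1x1(f_depth, w_q)
    f_k = conv1x1(f_warped, w_k)
    f_v = conv1x1(f_warped, w_v)
    d = len(f_q[0])
    out = []
    for q in f_q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in f_k]
        if scaled:
            scores = [s / math.sqrt(d) for s in scores]
        attn = softmax(scores)  # weights over all key positions
        out.append([sum(a * v[c] for a, v in zip(attn, f_v))
                    for c in range(len(f_v[0]))])
    return out
```

Because every output position mixes values from all positions of $F_w$ according to depth-derived queries, the depth map decides where the appearance features are gathered from.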
Training
During training, the source and driving images show the same identity. The loss function is given in Eq. 10, where:
$L_P$ is the perceptual loss;
$L_G$ is the least-squares adversarial (GAN) loss;
$L_E$ is the equivariance loss, which ensures that when the source image is transformed, the keypoints are transformed accordingly;
$L_D$ is the keypoint distance loss, which prevents facial keypoints from clustering together.
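A sketch of how the terms of Eq. 10 could be combined. The perceptual loss $L_P$ needs a pretrained network and is simply passed in as a number; the least-squares generator term, the L1 equivariance gap, and the hinge-style keypoint distance penalty with its `margin` are illustrative forms, and the loss weights are placeholders rather than the paper's values.

```python
import math

# Illustrative loss terms for Eq. 10 (assumed forms; weights are placeholders).


def lsgan_generator_loss(d_fake):
    """Least-squares GAN generator term: mean of (D(I_g) - 1)^2."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


def equivariance_loss(kp_of_transformed, transformed_kp):
    """Mean L1 gap between F_kp(T(I)) and T(F_kp(I)) over keypoints."""
    gaps = [abs(a[0] - b[0]) + abs(a[1] - b[1])
            for a, b in zip(kp_of_transformed, transformed_kp)]
    return sum(gaps) / len(gaps)


def keypoint_distance_loss(keypoints, margin=0.2):
    """Hinge penalty for every pair of keypoints closer than `margin` (hypothetical form)."""
    total = 0.0
    for i in range(len(keypoints)):
        for j in range(i + 1, len(keypoints)):
            (xi, yi), (xj, yj) = keypoints[i], keypoints[j]
            total += max(0.0, margin - math.hypot(xi - xj, yi - yj))
    return total


def total_loss(l_p, l_g, l_e, l_d, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of L_P, L_G, L_E and L_D."""
    return sum(w * l for w, l in zip(weights, (l_p, l_g, l_e, l_d)))
```

The distance term is zero once all keypoints are at least `margin` apart, so it only acts when keypoints start to collapse onto each other.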
Experiments
Comparison with SOTA methods
Comparison results against SOTA methods on the VoxCeleb1 dataset are shown in Tables 1 and 2;
Cross-identity reenactment results on VoxCeleb1 are shown in Figure 6;
On the CelebV dataset, the comparison with SOTA methods is shown in Table 3, and cross-identity reenactment results are shown in Figure 7.
Ablation study
FDN: facial depth network;
CAM: cross-modal attention mechanism.
The results are shown in Table 4, and the generated samples in Figure 8.
Conclusion
DaGAN learns facial depth maps in a self-supervised manner. On the one hand, the depth is used for more accurate facial keypoint estimation; on the other hand, a cross-modal (depth map and RGB) attention mechanism is designed to capture micro-expression changes. As a result, DaGAN produces more realistic and natural results.