Interpretation of the DaGAN paper
2022-07-06 19:30:00 【‘Atlas’】
Paper: "Depth-Aware Generative Adversarial Network for Talking Head Video Generation"
GitHub: https://github.com/harlanhong/CVPR2022-DaGAN
Problem addressed
Existing problem:
Existing talking-head video generation methods rely mainly on 2D representations. 3D facial information is critical to this task, but annotating it is expensive.
Solution:
The authors propose a self-supervised scheme that automatically recovers dense 3D geometry (depth) from face videos without any annotated data. Based on this depth, the method further estimates sparse facial keypoints that capture the important head motion. The depth is also used to learn a 3D-aware cross-modal (appearance and depth) attention mechanism, which guides the generation of the motion field used to warp the source image.
The proposed DaGAN generates highly realistic faces and also achieves good results on faces never seen during training.
The paper makes three main contributions:
1. A self-supervised method that learns depth maps from video and uses them to improve generation quality;
2. A novel depth-aware GAN that brings depth information into the generation network through depth-guided facial keypoint estimation and a cross-modal (depth and image) attention mechanism;
3. Extensive experiments showing accurate depth estimation on face images and generation quality that exceeds SOTA.
Algorithm
The DaGAN framework is shown in Figure 2 of the paper; it consists of a generator and a discriminator.
The generator has three parts:
1. A self-supervised depth learning sub-network $F_d$, which learns depth estimation in a self-supervised way from two consecutive video frames; $F_d$ is then fixed while the whole network is trained;
2. A depth-guided sparse keypoint detection sub-network $F_{kp}$;
3. A feature warping module, which uses the keypoints to generate a motion field. Warping the source-image features with this field combines appearance and motion information and yields the warped features $F_w$. To make the model attend to fine details and facial micro-expressions, a depth-aware attention map is also learned; it refines $F_w$ into $F_g$, which is used to generate the image $I_g$.
Self-supervised face depth learning
Following SfM-Learner with some modifications, the authors take the consecutive frames $I_{i+1}$ as the source image and $I_i$ as the target image, and learn the geometric elements: the depth map $D_{I_i}$, the camera intrinsic matrix $K_{I_i \to I_{i+1}}$, and the relative camera pose, given by the rotation $R_{I_i \to I_{i+1}}$ and the translation $t_{I_i \to I_{i+1}}$. Unlike SfM-Learner, the camera intrinsics $K$ also have to be learned.
The pipeline is shown in Figure 3:
1. $F_d$ extracts the depth map $D_{I_i}$ of the target image $I_i$;
2. $F_p$ predicts the learnable parameters $R$, $t$ and $K$;
3. Following Eqs. 3 and 4, the source image $I_{i+1}$ is geometrically warped to obtain $I'_i$, a reconstruction of the target image.
$q_k$ denotes the warped pixel on the source image $I_{i+1}$;
$p_j$ denotes the original pixel on the target image $I_i$.
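The equations themselves are not reproduced in this post, so the following is only a minimal sketch of the standard pinhole reprojection used by SfM-Learner-style methods: a target pixel is back-projected with its depth, moved by the relative pose, and projected again with the intrinsics. The function name `reproject` and the toy values of `K`, `R` and `t` are assumptions made for illustration, not taken from the DaGAN code.

```python
import numpy as np

def reproject(p, depth, K, R, t):
    """Map pixel p = (u, v) of the target frame, with depth `depth`,
    to its location q in the source frame (pinhole camera model)."""
    p_h = np.array([p[0], p[1], 1.0])        # homogeneous pixel coordinates
    X = depth * (np.linalg.inv(K) @ p_h)     # back-project to 3D camera coordinates
    X_src = R @ X + t                        # move into the source camera frame
    q_h = K @ X_src                          # project with the intrinsics
    return q_h[:2] / q_h[2]                  # perspective divide

# Hypothetical intrinsics: focal length 100 px, principal point (32, 32)
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                # no rotation in this toy example
t_zero = np.zeros(3)
t_shift = np.array([0.1, 0.0, 0.0])          # small sideways camera translation

q_same = reproject((40.0, 20.0), 2.0, K, R, t_zero)
q_moved = reproject((40.0, 20.0), 2.0, K, R, t_shift)
print(q_same, q_moved)
```

With an identity pose the pixel maps to itself; with the sideways translation it shifts horizontally by focal length × translation / depth, which is why nearer pixels move more and why depth can be supervised from the reconstruction error alone.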
The loss function $P_e$, given in Eq. 5, combines an L1 loss and an SSIM loss.
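As a rough illustration of how an L1 term and an SSIM term can be combined, here is a small NumPy sketch. It makes two simplifying assumptions that are not from the paper: SSIM is computed once over the whole image instead of over sliding windows, and the mixing weight `alpha` is a hypothetical value.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image (no sliding window); inputs in [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def photometric_error(target, recon, alpha=0.85):
    """Weighted sum of an SSIM dissimilarity term and an L1 term.
    alpha is a hypothetical weight chosen for this sketch."""
    l1 = np.abs(target - recon).mean()
    ssim_term = (1.0 - ssim_global(target, recon)) / 2.0
    return alpha * ssim_term + (1.0 - alpha) * l1

rng = np.random.default_rng(0)
img = rng.random((16, 16))                     # stand-in for the target frame
loss_same = photometric_error(img, img)        # perfect reconstruction
loss_diff = photometric_error(img, 1.0 - img)  # inverted image as a bad reconstruction
print(loss_same, loss_diff)
```

The L1 term measures per-pixel intensity differences, while the SSIM term compares local structure, so the combination is less sensitive to small brightness changes between frames than L1 alone.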
Sparse keypoint motion modeling
1. Concatenate the RGB image with the depth map extracted by $F_d$;
2. Feed the result to the keypoint estimation module $F_{kp}$ to obtain sparse facial keypoints (Eq. 6). The added depth map makes the predicted keypoints more accurate.
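The two steps above can be sketched as follows. The post does not describe the internals of $F_{kp}$, so this example assumes a common design in which the detector outputs one heatmap per keypoint and a soft-argmax converts each heatmap to coordinates; `soft_argmax` and the hand-made heatmap are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def soft_argmax(heatmaps):
    """Turn K heatmaps of shape (K, H, W) into K keypoints (x, y) in [-1, 1]."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = np.exp(flat - flat.max(axis=1, keepdims=True))   # stable softmax
    prob = (flat / flat.sum(axis=1, keepdims=True)).reshape(k, h, w)
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    x = (prob.sum(axis=1) * xs).sum(axis=1)   # expectation over columns
    y = (prob.sum(axis=2) * ys).sum(axis=1)   # expectation over rows
    return np.stack([x, y], axis=1)

# Step 1: concatenate RGB (3 channels) and depth (1 channel) along the channel axis
rgb = np.zeros((3, 8, 8))
depth = np.ones((1, 8, 8))
rgbd = np.concatenate([rgb, depth], axis=0)

# Step 2 (stand-in): a detector would map rgbd to heatmaps; here one heatmap
# is written by hand with a sharp peak at row 0, column 7 (top-right corner)
heat = np.full((1, 8, 8), -50.0)
heat[0, 0, 7] = 50.0
kp = soft_argmax(heat)
print(rgbd.shape, kp)
```

Because the depth map is just an extra input channel, the detector architecture needs no other change to become depth-guided.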
Feature warping strategy, shown in Figure 4:
1. Compute the initial offsets $O_n$ between the source image and the driving image (Eq. 7);
2. Generate a 2D coordinate map $z$;
3. Apply $O_n$ to $z$ to obtain the motion field $w_m$;
4. Warp the downsampled image with $w_m$ to obtain the initial warped feature map;
5. The occlusion estimator $\tau$ takes the warped feature map and predicts the motion flow mask $M_m$ and the occlusion map $M_o$;
6. Use $M_m$ to warp the appearance feature map that the encoder $\epsilon_I$ extracts from $I_s$, then fuse it with $M_o$ to produce $F_w$ (Eq. 8). $F_w$ retains the source image information while also capturing the motion between the two faces.
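To make steps 1 to 4 and the final masking concrete, here is a toy NumPy sketch for a single keypoint. It simplifies heavily: the offset is a hand-picked constant, sampling is nearest-neighbour rather than bilinear, and the occlusion map is written by hand instead of being predicted by $\tau$. All names are hypothetical.

```python
import numpy as np

def identity_grid(h, w):
    """2D coordinate map z with z[y, x] = (x, y) in pixel units."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([xs, ys], axis=-1).astype(float)

def warp_nearest(feat, flow):
    """Sample feat (H, W) at the locations stored in flow (H, W, 2)."""
    h, w = feat.shape
    x = np.clip(np.rint(flow[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(flow[..., 1]).astype(int), 0, h - 1)
    return feat[y, x]

feat = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for source features
offset = np.array([1.0, 0.0])                     # O_n: keypoint moved by +1 in x
z = identity_grid(4, 4)                           # step 2: coordinate map
w_m = z + offset                                  # step 3: motion field
warped = warp_nearest(feat, w_m)                  # step 4: initial warped features

occlusion = np.ones((4, 4))                       # hand-made occlusion map M_o
occlusion[:, -1] = 0.0                            # last column has no valid source
f_w = occlusion * warped                          # masked warped features
print(warped[0], f_w[0])
```

The last column of `warped` repeats its neighbour because the sampling location falls outside the image; zeroing it with the occlusion map marks the region the generator has to inpaint rather than copy.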
Cross-modal attention mechanism
To make effective use of the learned depth map and improve generation quality, the authors propose a cross-modal attention mechanism, shown in Figure 5.
1. The depth encoder $\epsilon_d$ extracts the depth feature map $F_d$ from the depth map $D_{sz}$ (here $F_d$ denotes depth features, not the depth network above);
2. Three separate 1×1 convolution layers map $F_d$ and $F_w$ to three latent feature maps $F_q$, $F_k$ and $F_v$;
3. Attention produces $F_g$ (Eq. 9);
4. The decoder refines $F_g$ to generate the final image $I_g$.
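Steps 2 and 3 can be sketched as plain dot-product attention. The sketch assumes that the query comes from the depth features and that the key and value come from the warped appearance features, which is the usual reading of cross-modal attention; the post itself only says that $F_d$ and $F_w$ are mapped to $F_q$, $F_k$ and $F_v$. Random matrices stand in for the learned 1×1 convolution weights.

```python
import numpy as np

def cross_modal_attention(f_d, f_w, w_q, w_k, w_v):
    """f_d, f_w: (C, H, W) feature maps; w_q, w_k, w_v: (C, C) 1x1-conv weights."""
    c, h, w = f_w.shape
    d = f_d.reshape(c, h * w).T            # (HW, C) depth tokens
    a = f_w.reshape(c, h * w).T            # (HW, C) appearance tokens
    q, k, v = d @ w_q, a @ w_k, a @ w_v    # a 1x1 conv is a per-pixel linear map
    scores = q @ k.T                       # (HW, HW) depth-to-appearance affinities
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)   # softmax over appearance positions
    f_g = (attn @ v).T.reshape(c, h, w)    # refined features, same shape as f_w
    return f_g, attn

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
f_d = rng.standard_normal((C, H, W))       # stand-in depth features
f_w = rng.standard_normal((C, H, W))       # stand-in warped appearance features
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
f_g, attn = cross_modal_attention(f_d, f_w, w_q, w_k, w_v)
print(f_g.shape, attn.shape)
```

Each output position is a weighted mix of appearance features, with weights chosen by geometry, so regions that are close in depth structure can share texture information.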
Training
During training, the source image and the driving image share the same identity. The loss function is given in Eq. 10:
$L_P$ is the perceptual loss;
$L_G$ is a least-squares GAN loss;
$L_E$ is the equivariance loss, which ensures that when the source image is transformed, the keypoints transform accordingly;
$L_D$ is the keypoint distance loss, which prevents the facial keypoints from clustering together.
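A small sketch of how two of these terms and the weighted total might look. Everything numeric here is an assumption for illustration: the loss weights, the distance threshold `alpha`, and the hinge form of the keypoint distance penalty are not taken from the post.

```python
import numpy as np

def lsgan_generator_loss(d_fake):
    """Least-squares GAN loss for the generator: push D(fake) toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def keypoint_distance_loss(kp, alpha=0.2):
    """Penalise every ordered pair of keypoints closer than alpha (L1 distance)."""
    diff = np.abs(kp[:, None, :] - kp[None, :, :]).sum(axis=-1)   # (K, K)
    off_diag = ~np.eye(len(kp), dtype=bool)                       # ignore self-pairs
    return np.sum(np.maximum(alpha - diff[off_diag], 0.0))

def total_loss(l_p, l_g, l_e, l_d, weights=(10.0, 1.0, 10.0, 10.0)):
    """Weighted sum of the four terms; the weights are hypothetical."""
    lam_p, lam_g, lam_e, lam_d = weights
    return lam_p * l_p + lam_g * l_g + lam_e * l_e + lam_d * l_d

spread = np.array([[-0.5, -0.5], [0.5, 0.5], [0.5, -0.5]])    # well separated
crowded = np.array([[0.0, 0.0], [0.05, 0.0], [0.5, 0.5]])     # two keypoints collide
l_d_spread = keypoint_distance_loss(spread)
l_d_crowded = keypoint_distance_loss(crowded)
l_g = lsgan_generator_loss(np.array([1.0, 1.0]))   # discriminator fully fooled
print(l_d_spread, l_d_crowded, l_g)
```

Well-separated keypoints incur no distance penalty, while the crowded set is penalised, which is the behaviour that keeps the detector from collapsing all keypoints onto one facial region.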
Experiments
Comparison with SOTA methods
On the VoxCeleb1 dataset, the comparison with SOTA methods is reported in Tables 1 and 2.
On the VoxCeleb1 dataset, cross-identity reenactment results are shown in Figure 6.
On the CelebV dataset, the comparison with SOTA methods is reported in Table 3, and cross-identity reenactment results are shown in Figure 7.
Ablation study
FDN: face depth network;
CAM: cross-modal attention mechanism.
The results are reported in Table 4.
The generated results are shown in Figure 8.
Conclusion
DaGAN learns facial depth maps in a self-supervised way. The depth is used, on the one hand, for more accurate facial keypoint estimation and, on the other hand, in a cross-modal (depth map and RGB) attention mechanism that captures micro-expression changes. As a result, DaGAN produces more realistic and natural results.