【TCDCN】《Facial landmark detection by deep multi-task learning》
2022-07-02 07:45:00 【bryant_ meng】
ECCV-2014
1 Background and Motivation
Facial landmark detection is a fundamental component of many face analysis tasks, such as facial attribute recognition, face verification, and face recognition.
Current landmark detectors still struggle with partial occlusion and large head pose variations.
The authors observe that the locations of facial landmarks are correlated with facial attributes, for example:
- when a kid is smiling, his mouth is widely opened (second column)
- the inter-ocular distance is smaller in faces with large yaw rotation (last column)
So can jointly optimizing facial attributes (head pose estimation, gender classification, age estimation, facial expression recognition, etc.) together with facial landmarks improve landmark detection?
With that question, the authors begin their story.
2 Related Work
- Facial landmark detection
- Landmark detection by CNN
- Multi-task learning
3 Advantages / Contributions
Proposes the TCDCN network: multi-task learning that uses facial attributes as auxiliary tasks to help optimize facial landmark detection (together with task-wise early stopping during training).
4 Method
1) Network structure
The tasks are: facial landmark detection (the main task) + head pose + gender + wearing glasses + smiling
The network structure is as follows: 4 conv layers + 1 fully connected layer (a rough code sketch is given below)
The activation function is the "absolute tangent function" (it probably means the hyperbolic tangent, haha)
For different samples with the same attribute, the learned features are similar, which suggests that the features shared through multi-task learning generalize to some extent.
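For intuition, here is a minimal PyTorch sketch of this kind of multi-task architecture: a shared 4-conv + 1-fc trunk feeding one landmark-regression head and one classification head per attribute. The channel counts, input size and class counts are illustrative assumptions, not the exact configuration reported in the paper, and plain tanh stands in for the paper's absolute-tangent activation.

```python
import torch
import torch.nn as nn

class TCDCNSketch(nn.Module):
    """Illustrative multi-task CNN: a shared 4-conv + 1-fc trunk with a
    landmark-regression head and one classification head per attribute.
    Layer sizes are assumptions, not the paper's exact configuration."""

    def __init__(self, num_landmarks=5, attr_classes=None):
        super().__init__()
        if attr_classes is None:
            # assumed auxiliary tasks and class counts (illustrative)
            attr_classes = {"pose": 5, "gender": 2, "glasses": 2, "smile": 2}
        act = nn.Tanh()  # stand-in for the paper's absolute tangent activation
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 5), act, nn.MaxPool2d(2),   # 40x40 grayscale input assumed
            nn.Conv2d(16, 48, 3), act, nn.MaxPool2d(2),
            nn.Conv2d(48, 64, 3), act, nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 2), act,
            nn.Flatten(),
            nn.LazyLinear(100), act,                     # the shared fully connected layer
        )
        # main task: (x, y) coordinates for each landmark
        self.landmark_head = nn.Linear(100, 2 * num_landmarks)
        # auxiliary tasks: one linear classifier per attribute
        self.attr_heads = nn.ModuleDict(
            {name: nn.Linear(100, n) for name, n in attr_classes.items()})

    def forward(self, x):
        feat = self.trunk(x)
        landmarks = self.landmark_head(feat)             # regression output
        attrs = {name: head(feat) for name, head in self.attr_heads.items()}
        return landmarks, attrs

# quick shape check on a dummy batch
if __name__ == "__main__":
    model = TCDCNSketch()
    lm, attrs = model(torch.randn(8, 1, 40, 40))
    print(lm.shape, {k: v.shape for k, v in attrs.items()})
```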
2)Problem Formulation
Let's first look at the general multi-task learning formulation (the regularization term is omitted):
- $r$: the main task, i.e., facial landmark detection
- $\alpha$: an auxiliary task, i.e., a facial attribute classification task
- $N$: the number of samples
- $\lambda$: the weighting coefficient
The complete objective function is as follows: landmark detection uses a least-squares loss, while attribute classification uses a cross-entropy loss.
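Putting these pieces together, the combined objective should look roughly like the following (a hedged reconstruction from the definitions above, with the regularization term omitted as noted; the paper's exact notation may differ):

$$
\min_{W^{r},\,\{W^{\alpha}\}} \;
\underbrace{\frac{1}{2N}\sum_{i=1}^{N}\bigl\|\,y_i^{r}-f(x_i;W^{r})\bigr\|_2^{2}}_{\text{least-squares loss, landmark detection}}
\;+\;
\sum_{\alpha}\lambda^{\alpha}\,
\underbrace{\frac{1}{N}\sum_{i=1}^{N}\ell_{\mathrm{ce}}\!\bigl(y_i^{\alpha},\,f(x_i;W^{\alpha})\bigr)}_{\text{cross-entropy loss, attribute }\alpha}
$$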
3)Task-wise early stopping
different tasks have different loss functions and learning difficulties, and thus have different convergence rates (like passengers with different destinations getting off the bus at different stops)
During multi-task training, the authors therefore adopt a task-wise early stopping mechanism: the auxiliary tasks "get off the bus" in batches.
The criterion for stopping an auxiliary task is built from the following quantities:
- $t$: the current iteration number
- $k$: the length (in iterations) of the window over which the early-stopping statistics are computed
- $E_{val}$: the loss on the validation set
- $E_{tr}$: the loss on the training set
- $med$: the median value
- $\lambda_{\alpha}$: the weighting coefficient of auxiliary task $\alpha$ (learnable)
- The first term captures the tendency of the training error: if the loss is still dropping quickly over the last $k$ iterations, this term is small.
- The second term captures the generalization error (a rough code sketch of such a criterion follows this list).
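Below is a minimal Python sketch of a stopping rule in this spirit, combining a training-progress term with a generalization term built from the quantities above. It is an illustration (a Prechelt-style criterion), not the paper's exact formula, and the threshold `eps` is an assumed hyperparameter.

```python
import statistics

def should_stop_task(E_tr, E_val, lam_a, k=10, eps=0.5):
    """Illustrative task-wise early stopping test for one auxiliary task.

    E_tr, E_val : per-iteration training / validation losses of this task
    lam_a       : the task's weighting coefficient lambda_alpha
    k           : window length used to measure recent training progress
    eps         : stopping threshold (assumed hyperparameter)
    """
    t = len(E_tr)
    if t <= k:
        return False

    window = E_tr[t - k:]
    # Training-tendency term: small while the training loss is still dropping
    # fast inside the last k iterations, large once it plateaus.
    progress = sum(window) / (k * min(window) + 1e-12) - 1.0
    tendency = 1.0 / (progress + 1e-8)

    # Generalization term: how far the current validation loss has drifted
    # above the best value so far, normalized by lambda_alpha and the median.
    generalization = (E_val[-1] - min(E_val)) / (lam_a * statistics.median(E_val))

    # Stop the auxiliary task once the product exceeds the threshold.
    return tendency * generalization > eps
```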
5 Experiments
5.1 Datasets and Metrics
Datasets
- AFLW
- AFW
Evaluation metrics
- mean error: the point-to-point error normalized by the inter-ocular distance
- failure rate: a mean error larger than 10% is reported as a failure
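For concreteness, here is a small NumPy sketch of how these two metrics can be computed; the landmark array layout and the indices of the two eye centers are assumptions for illustration, not fixed by the paper.

```python
import numpy as np

def mean_error_and_failure_rate(pred, gt, left_eye=0, right_eye=1, thresh=0.10):
    """pred, gt: arrays of shape (num_images, num_landmarks, 2).

    Per-image error = mean point-to-point distance between predicted and
    ground-truth landmarks, normalized by the ground-truth inter-ocular
    distance; images whose error exceeds `thresh` (10%) count as failures.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)                         # (N, L)
    iod = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)  # (N,)
    per_image = dists.mean(axis=1) / iod

    return per_image.mean(), (per_image > thresh).mean()
```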
5.2 Experiments
1)Evaluating the Effectiveness of Learning with Related Task
FLD+pose performs the best
Let's take a closer look at FLD+smile.
Figure 5(a) shows that when FLD (Facial Landmark Detection) is trained jointly with the smile attribute, the mean error on the nose and mouth drops noticeably, which is easy to understand (smiling involves the zygomaticus and levator labii superioris muscles).
(Muscle illustration from the Internet; it will be taken down upon request!!!)
Figure 5(b) is a bit surprising, haha: the correlation between smile and the right eye is quite high, yet the improvement in (a) is not particularly obvious.
Perhaps this correlation coefficient is only a rough estimate, haha ("We use a crude method to investigate the relationship between tasks.")
The authors use Pearson's correlation to measure the relatedness between tasks:
Pearson's correlation of the learned weight vectors of the last fully-connected layer, between the tasks of facial landmark detection and 'smiling' prediction
How to understand the Pearson Correlation Coefficient? — answer by TimXP on Zhihu:
https://www.zhihu.com/question/19734616/answer/117730676
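As a quick reference, a minimal sketch of that computation is shown below; exactly which weight vectors are compared (here, flattened last-FC weights associated with each task) is my assumption for illustration.

```python
import numpy as np

def pearson_correlation(w_task_a, w_task_b):
    """Pearson correlation between two learned weight vectors, e.g. the
    flattened last fully-connected-layer weights of two different tasks."""
    a = np.ravel(w_task_a).astype(float)
    b = np.ravel(w_task_b).astype(float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# example: correlation between the FLD head weights and the 'smiling' head weights
# rho = pearson_correlation(W_fld, W_smile)
```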
Now let's look at the improvement from jointly training with pose.
Sure enough, pose has the largest impact on FLD, which is reasonable.
2) The Benefits of Task-wise Early Stopping
The pose task is the last one to be stopped.
With task-wise early stopping, FLD converges faster and more stably.
3)Comparison with the Cascaded CNN
Now TCDCN goes head-to-head with 【Cascade FPD】《Deep Convolutional Network Cascade for Facial Point Detection》, haha.
Accuracy
On the mouth the two are roughly on par: for the cascaded CNN, a considerable portion of the left-mouth-corner mean errors also fall within 10%.
Speed
120 ms (cascaded CNN) vs 17 ms (TCDCN) on an Intel Core i5 CPU
about 7x faster
4)Comparison with other State-of-the-art Methods
Impressive results.
5)TCDCN for Robust Initialization
Instead of drawing training samples randomly as the initialization (as cascaded methods usually do), the landmarks predicted by TCDCN are used to initialize the other method, so it starts from a much better point (a toy sketch of the idea follows).
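A toy sketch of the idea, where `tcdcn_predict` and `train_shapes` are hypothetical placeholders rather than real APIs:

```python
import random

def initial_shape(image, mode, tcdcn_predict=None, train_shapes=None):
    """Return the initial landmark shape fed to a cascaded regressor.

    mode = "random": conventional scheme, draw a training-set shape at random
    mode = "tcdcn" : robust initialization, start from TCDCN's own prediction
    """
    if mode == "random":
        return random.choice(train_shapes)
    if mode == "tcdcn":
        return tcdcn_predict(image)
    raise ValueError(f"unknown mode: {mode}")
```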
6 Conclusion(own) / Future work
Task-wise early stopping mechanism:
different tasks have different loss functions and learning difficulties, and thus have different convergence rates