【Cascade FPD】《Deep Convolutional Network Cascade for Facial Point Detection》
2022-07-02 07:44:00 【bryant_ meng】
CVPR-2013
1 Background and Motivation
Face keypoint detection benefits face recognition and face analysis.
The difficulty of face keypoint detection lies in scenes with extreme poses, lighting, expressions, and occlusions.
Existing methods:
- classifying search windows with component detectors: this requires scanning and relies on local features
- directly predicting keypoint positions (or shape parameters)
The authors design a cascaded CNN structure: a cascaded regression approach for facial point detection with three levels of convolutional networks, which significantly improves prediction accuracy over SOTA methods and the latest commercial software.
2 Related Work
- Many used Adaboost, SVM, or random forest classifiers as component detectors and detection was based on local image features.
- regression-based approaches
- Convolutional networks
3 Advantages / Contributions
- Proposes a cascaded CNN structure to accurately locate facial keypoints; on some datasets it outperforms SOTA methods and commercial software
- Uses locally sharing weights so that training is more targeted to the different facial keypoints
4 Method
The network cascades three levels of convolutional networks to make coarse-to-fine predictions.
It predicts five keypoints:
- left eye center (LE)
- right eye center (RE)
- nose tip (N)
- left mouth corner (LM)
- right mouth corner (RM)
1) Level 1
The input is the whole face, and three networks make predictions:
- whole face (F): all five facial keypoints
- eyes and nose (EN)
- nose and mouth (NM)
The predictions of the three networks are averaged and used as part of the input to the following levels.
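The averaging step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that EN predicts LE, RE and N, that NM predicts N, LM and RM (inferred from the network names), and that each keypoint is averaged over the networks that predict it. The names `COVERAGE` and `average_level1` are hypothetical.

```python
import numpy as np

# Keypoint order used in this sketch: LE, RE, N, LM, RM.
POINTS = ["LE", "RE", "N", "LM", "RM"]

# Assumed coverage of each level-1 network (inferred from the network names).
COVERAGE = {
    "F": ["LE", "RE", "N", "LM", "RM"],
    "EN": ["LE", "RE", "N"],
    "NM": ["N", "LM", "RM"],
}


def average_level1(preds):
    """Average each keypoint over the networks that predict it.

    preds maps a network name to a dict {keypoint: (x, y)}.
    Returns an array of shape (5, 2) in POINTS order.
    """
    out = np.zeros((len(POINTS), 2))
    for idx, name in enumerate(POINTS):
        votes = [preds[net][name] for net in COVERAGE if name in COVERAGE[net]]
        out[idx] = np.mean(votes, axis=0)
    return out
```

With this coverage, the nose tip is averaged over three predictions and every other keypoint over two.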
2) Level 2 and level 3
The input is a patch centered on each facial keypoint position predicted by the previous level.
Level 2 and level 3 each have 10 networks (two per keypoint), which predict the horizontal and vertical coordinates of the 5 keypoints.
Predictions at the last two levels are strictly restricted because local appearance is sometimes ambiguous and unreliable.
3) Final prediction
The final prediction is likewise a refinement ($\Delta$) on top of the level-1 prediction.
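One way to read this refinement is sketched below. It assumes that level 1 outputs absolute positions, that each later level outputs offsets relative to the current estimate, and that the predictions within a level are averaged before being applied. The function name `cascade_predict` and its argument layout are hypothetical.

```python
import numpy as np


def cascade_predict(level1_preds, later_level_deltas):
    """Combine cascade outputs for a single keypoint.

    level1_preds: list of absolute (x, y) predictions from level 1.
    later_level_deltas: one list per later level, each holding the
        (dx, dy) offsets predicted by that level's networks.
    """
    # Level 1: average the absolute predictions.
    position = np.mean(np.asarray(level1_preds, dtype=float), axis=0)
    # Later levels: add the averaged offset of each level in turn.
    for deltas in later_level_deltas:
        position = position + np.mean(np.asarray(deltas, dtype=float), axis=0)
    return position
```

For example, level-1 predictions (10, 10) and (12, 14) average to (11, 12); a level-2 mean offset of (2, 0) and a level-3 mean offset of (0, -1) then move the estimate to (13, 11).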
4) Specific network structure
Level 1 has three networks, and level 2 and level 3 each have 10 networks. What do they look like?
First, look at F1 of level 1.
Then look at the other structures.
Level 1 uses S0 and S1; level 2 and level 3 all use S2.
5) Locally sharing weights
globally sharing weights does not work well on images with fixed spatial layout, such as faces
For example, while eyes and mouth may share low-level features (e.g. edges), they are very different at high-level.
Let's first look at the convolution formula.
It is abbreviated as $C(s, n, p, q)$.
$CR(s, n, p, q)$ means that an absolute value is applied after the tanh.
Apart from the extra indices $u$ and $v$ on $w$ and $b$, it is the same as a normal convolution (one without locally shared weights).
The input feature map has size $(h, w, m)$.
- $m$: number of input channels
- $n$: number of output channels; $t$ is the output channel index, $t = 0, \ldots, n-1$
- $s$: kernel size
- $i, j$: spatial position indices, grouped not pixel by pixel but into the locally shared regions defined by the authors, according to the following rules
$i = \Delta h \cdot u, \ldots, \Delta h \cdot u + \Delta h - 1$, where $\Delta h = \frac{h-s+1}{p}$ and $u = 0, \ldots, p-1$
$j = \Delta w \cdot v, \ldots, \Delta w \cdot v + \Delta w - 1$, where $\Delta w = \frac{w-s+1}{q}$ and $v = 0, \ldots, q-1$
The whole map $(h, w)$ is divided roughly into $p \times q$ regions (indexed by $u$ and $v$), each of size approximately $\Delta h \times \Delta w$. Weights are shared within each region rather than over the whole map (a normal convolution shares weights over the whole map; the weights inside one kernel window are of course not shared).
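The region rule above can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the output size $h-s+1$ and $w-s+1$ is assumed to divide evenly by $p$ and $q$, the weight layout `(p, q, s, s, m, n)` and the name `local_shared_conv` are hypothetical, and `use_abs=True` stands for the CR variant.

```python
import numpy as np


def local_shared_conv(x, w, b, p, q, use_abs=False):
    """Convolution with weights shared only inside each of p x q regions.

    x: input feature map, shape (h, width, m)
    w: weights, shape (p, q, s, s, m, n), one kernel set per region (u, v)
    b: biases, shape (p, q, n)
    Returns an array of shape (h - s + 1, width - s + 1, n).
    """
    h, width, m = x.shape
    s, n = w.shape[2], w.shape[5]
    out_h, out_w = h - s + 1, width - s + 1
    if out_h % p or out_w % q:
        raise ValueError("output size must be divisible by p and q in this sketch")
    dh, dw = out_h // p, out_w // q  # region size (Delta h, Delta w)

    y = np.empty((out_h, out_w, n))
    for i in range(out_h):
        u = i // dh  # region row index for output row i
        for j in range(out_w):
            v = j // dw  # region column index for output column j
            patch = x[i:i + s, j:j + s, :]  # shape (s, s, m)
            # Contract the (s, s, m) axes of the patch with the kernel.
            y[i, j] = np.tensordot(patch, w[u, v], axes=3) + b[u, v]

    y = np.tanh(y)
    return np.abs(y) if use_abs else y
```

Setting `p = q = 1` reduces this to an ordinary convolution with globally shared weights, which is the difference the text describes.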
Next, the formula of the pooling layer.
The pooling result is multiplied by a gain coefficient $g$ and shifted by a bias $b$; $s$ is the side length of the square pooling regions.
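A rough sketch of such a pooling layer is given below. It assumes non-overlapping $s \times s$ max pooling and, as with the convolution layers, a tanh applied after the gain and bias. For brevity `g` and `b` are a scalar or one value per channel here, whereas the paper may share them per region. The name `max_pool_gain_bias` is hypothetical.

```python
import numpy as np


def max_pool_gain_bias(x, s, g, b):
    """Non-overlapping s x s max pooling, then tanh(g * pooled + b).

    x: feature map of shape (h, w, n), with h and w divisible by s.
    g, b: gain and bias, scalars or arrays of shape (n,).
    """
    h, w, n = x.shape
    if h % s or w % s:
        raise ValueError("h and w must be divisible by s in this sketch")
    # Split both spatial axes into (block index, offset inside block).
    blocks = x.reshape(h // s, s, w // s, s, n)
    pooled = blocks.max(axis=(1, 3))  # shape (h // s, w // s, n)
    return np.tanh(g * pooled + b)
```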
FC layer
- $n$: dimension of the output vector; $m$: dimension of the input vector
- $j = 0, \ldots, n-1$
tanh function
6) Specific input size
As can be seen, the input of the F1 network is also an expansion of the face region.
Level 2 and level 3 expand outward from the keypoint positions output by level 1.
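Cropping a local patch around a predicted keypoint might look like the sketch below. It assumes a 2D grayscale image, a predicted center that lies inside the image, and zero padding where the patch extends past the border. The function `crop_patch` and its `half_size` parameter are hypothetical; the actual patch sizes used at each level are not given in these notes.

```python
import numpy as np


def crop_patch(image, center, half_size):
    """Crop a (2*half_size + 1)-pixel square patch centered on (x, y).

    image: 2D array indexed as image[row, col], i.e. image[y, x].
    center: predicted keypoint (x, y), assumed to lie inside the image.
    Pixels outside the image are filled with zeros.
    """
    x, y = int(round(center[0])), int(round(center[1]))
    padded = np.pad(image, half_size, mode="constant")
    # After padding, original pixel (y, x) sits at (y + half_size, x + half_size),
    # so the patch starts at row y and column x of the padded image.
    return padded[y:y + 2 * half_size + 1, x:x + 2 * half_size + 1]
```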
5 Experiments
5.1 Datasets
13,466 face images: 5,590 from LFW and 7,876 from the web
BioID has 1,521 images of 23 subjects
LFPW contains 1,432 face images from the web
Evaluation metric
- $(x, y)$ is the predicted keypoint
- $(x', y')$ is the ground truth (GT)
- $l$ is the width of the bounding box returned by the authors' face detector
A prediction with an error greater than 5% is counted as a failure.
Using the bi-ocular distance as $l$ is more common, but it has problems on faces with large pose variations, since the bi-ocular distance of near-profile faces is much shorter than that of frontal faces. That is, it magnifies the error on profile faces, so the bounding-box width used above works relatively better.
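The metric can be sketched in a few lines, assuming the error is the Euclidean distance between prediction and ground truth divided by $l$, which is what the symbol definitions above suggest. The function names `detection_error` and `failure_rate` are hypothetical.

```python
import numpy as np


def detection_error(pred, gt, box_width):
    """Euclidean distance between prediction and GT, normalized by l."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm(pred - gt, axis=-1) / box_width


def failure_rate(errors, threshold=0.05):
    """Fraction of predictions whose error exceeds the 5% threshold."""
    return float(np.mean(np.asarray(errors, dtype=float) > threshold))
```

For a face box 100 pixels wide, a prediction that is 5 pixels away from the ground truth has an error of 0.05, which sits exactly on the threshold and is not counted as a failure under the "greater than 5%" rule.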
5.2 Investigate network and cascade structures
1) Network structure
Different network structures are explored for F1; S0 performs well.
The performance can be significantly improved by including more layers.
S6 and S7 have the same structure as S0, but S6 uses convolution C instead of CR (no absolute value), and S7 uses globally shared weights instead of locally shared weights.
We also find that locally sharing weights in higher layers is more important.
2) Multi-level prediction
The error decreases from level to level as the cascade proceeds.
5.3 Comparison with other methods
6 Conclusion (own) / Future work
Code: https://github.com/luoyetx/deep-landmark
Recommended blog :
- Deep Convolutional Network Cascade for Facial Point Detection practice
- Deep Convolutional Network Cascade for Facial Point Detection Reading notes
cascade
locally sharing weights