RetinaFace: Single-stage Dense Face Localization in the Wild
2022-07-03 10:01:00 【Star soul is not a dream】
Paper code: https://github.com/deepinsight/insightface/tree/master/RetinaFace
Accurate and efficient face localization in the wild remains a challenge. This paper presents a robust single-stage face detector, RetinaFace. It performs pixel-wise face localization on faces of various scales by combining extra-supervised multi-task learning (contribution 1) with self-supervised multi-task learning (contribution 2).
Five contributions:
- Five facial landmarks were manually annotated on the WIDER FACE dataset, and face detection improved significantly with the help of this extra supervision signal.
- A self-supervised mesh decoder branch was added to predict pixel-wise 3D face shape information. This branch runs in parallel with the existing supervised branches.
- On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (AP = 91.4%).
- On the IJB-C test set, RetinaFace enables the state-of-the-art ArcFace to further improve face verification results (TAR = 89.59% at FAR = 1e-6).
- With a lightweight backbone network, RetinaFace can process VGA-resolution images in real time on a single CPU core.
Automatic face localization is a prerequisite for facial image analysis, such as facial attribute analysis (expression, age) and identity recognition. In the narrow sense, face localization refers to traditional face detection, which estimates face bounding boxes without any prior on scale or position. This paper, however, uses face localization in a broader sense. The broader sense covers face detection, face alignment, pixel-wise face parsing and 3D dense correspondence regression. This dense face localization provides accurate facial position information for faces at all scales.

Figure 1. The proposed single-stage pixel-wise face localization method employs extra-supervised and self-supervised multi-task learning in parallel with the existing box classification and regression branches. Each positive anchor outputs: (1) a face score; (2) a face box; (3) five facial landmarks; (4) dense 3D face vertices projected onto the image plane.
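The sketch below illustrates the per-anchor outputs listed in Figure 1. It computes the output shapes of the classification, box and landmark heads on one feature-pyramid level. It assumes each head is a 1×1 convolution with `anchors_per_location × outputs_per_anchor` channels, a layout used in several open-source reimplementations but not stated in this post. The names, the 80×80 feature map and the two anchors per location are hypothetical. The dense 3D branch is left out for simplicity.

```python
from dataclasses import dataclass

# Outputs per anchor for the 2D branches of Figure 1 (dense 3D branch omitted).
CLS_PER_ANCHOR = 2        # face / not-face scores fed to a softmax
BOX_PER_ANCHOR = 4        # box regression offsets (tx, ty, tw, th)
LANDMARK_PER_ANCHOR = 10  # five landmarks, each an (x, y) offset


@dataclass(frozen=True)
class HeadShapes:
    cls: tuple
    box: tuple
    landmarks: tuple


def head_output_shapes(height: int, width: int, anchors_per_location: int) -> HeadShapes:
    """(H, W, C) shapes of the head outputs on one pyramid level.

    Hypothetical layout: each head is a 1x1 convolution whose channel count
    is anchors_per_location * outputs_per_anchor.
    """
    a = anchors_per_location
    return HeadShapes(
        cls=(height, width, a * CLS_PER_ANCHOR),
        box=(height, width, a * BOX_PER_ANCHOR),
        landmarks=(height, width, a * LANDMARK_PER_ANCHOR),
    )


# Hypothetical example: an 80x80 feature map with 2 anchors per location.
shapes = head_output_shapes(80, 80, anchors_per_location=2)
print(shapes)  # HeadShapes(cls=(80, 80, 4), box=(80, 80, 8), landmarks=(80, 80, 20))
```

All branches share the same spatial grid. Only the channel count differs, so each positive anchor can read its score, box and landmarks from the same (row, column) position.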
Typically, training a face detector involves both classification and box regression losses. Chen et al. [6] (D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In ECCV, 2014) observed that aligned face shapes provide better features for face classification. On that basis, they proposed to combine face detection and alignment in a joint cascade framework. Inspired by [6], MTCNN and STN detect faces and five facial landmarks simultaneously. However, training data was limited, so JDA, MTCNN and STN did not verify whether tiny face detection can benefit from extra supervision of five facial landmarks. One question this paper aims to answer is whether an extra supervision signal built from five facial landmarks can push forward the current best performance on the WIDER FACE hard test set [60] (90.3% [67]).
In Mask R-CNN, detection performance improves significantly when a branch for predicting an object mask is added in parallel with the existing branches for bounding-box recognition and regression. This confirms that dense pixel-wise annotations also help improve detection. Unfortunately, dense face annotation (in the form of more landmarks or semantic segments) is not possible for the challenging faces in WIDER FACE. Since such a supervision signal is hard to obtain, the question is whether unsupervised methods can be applied to further improve face detection.
FAN proposes an anchor-level attention map to improve the detection of occluded faces. However, the proposed attention map is quite coarse and does not contain semantic information. Recently, self-supervised 3D morphable models have achieved promising 3D face modeling in the wild. In particular, the mesh decoder achieves real-time speed by applying graph convolutions to joint shape and texture. Applying a mesh decoder to a single-stage detector poses two challenges:
(1) Camera parameters are hard to estimate accurately.
(2) The joint latent shape and texture representation is predicted from a single feature vector (a 1×1 convolution on the feature pyramid) instead of RoI-pooled features, which risks feature shift.
In this paper, a mesh decoder branch trained by self-supervised learning predicts pixel-wise 3D face shape in parallel with the existing supervised branches. In summary, our main contributions are as follows:
- Based on a single-stage design, we propose a novel pixel-wise face localization method named RetinaFace. It uses a multi-task learning strategy to simultaneously predict the face score, the face box, five facial landmarks, and the 3D position and correspondence of each facial pixel.
- On the WIDER FACE hard subset, RetinaFace outperforms the state-of-the-art two-stage method by 1.1% (AP = 91.4%).
- On the IJB-C dataset, RetinaFace helps improve the verification accuracy of ArcFace (TAR = 89.59% at FAR = 1e-6). This shows that better face localization can significantly improve face recognition.
- With a lightweight backbone network, RetinaFace can process VGA-resolution images in real time on a single CPU core.
- Extra annotations and code have been released to facilitate future research.
2. Related work
Image pyramid vs. feature pyramid:
The sliding-window paradigm, in which a classifier is applied on a dense image grid, dates back several decades. The milestone work of Viola-Jones explored a cascade chain that rejects false face regions from an image pyramid efficiently and in real time. This led to the wide adoption of such scale-invariant face detection frameworks. Sliding windows on image pyramids were once the leading detection paradigm. Since feature pyramids emerged, however, sliding anchors on multi-scale feature maps have quickly come to dominate face detection.
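The idea of sliding anchors over a multi-scale feature pyramid can be sketched as follows. Every cell of each pyramid level (with stride `s`) gets anchors centred on that cell, and coarser levels carry larger anchors. This is a minimal sketch that assumes square images and square anchors. The strides and anchor sizes in the example are hypothetical choices for illustration, not values given in this section.

```python
import math


def generate_anchors(image_size, strides, sizes_per_level):
    """Slide square anchors over every cell of every pyramid level.

    image_size: side length of the (square) input image in pixels.
    strides: downsampling factor of each pyramid level.
    sizes_per_level: anchor side lengths used at each level.
    Returns a list of (cx, cy, w, h) tuples in image coordinates.
    """
    anchors = []
    for stride, sizes in zip(strides, sizes_per_level):
        cells = math.ceil(image_size / stride)  # feature-map side length
        for row in range(cells):
            for col in range(cells):
                # Centre of this feature-map cell, projected back to the image.
                cx = (col + 0.5) * stride
                cy = (row + 0.5) * stride
                for size in sizes:
                    anchors.append((cx, cy, size, size))
    return anchors


# Hypothetical 3-level pyramid on a 64x64 image.
anchors = generate_anchors(64, strides=[8, 16, 32],
                           sizes_per_level=[[16, 32], [64, 128], [256, 512]])
print(len(anchors))  # 8*8*2 + 4*4*2 + 2*2*2 = 168
```

Small anchors on fine levels cover tiny faces, and large anchors on coarse levels cover large ones. One forward pass therefore handles all scales instead of rescanning an image pyramid.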
Two-stage vs. single-stage:
Current face detection methods inherit some achievements from generic object detection and fall into two categories: two-stage methods (e.g., Faster R-CNN) and single-stage methods (e.g., SSD and RetinaNet). Two-stage methods use a "proposal and refinement" mechanism, which yields high localization accuracy. Single-stage methods densely sample face locations and scales, which causes an extreme imbalance between positive and negative samples during training. Sampling and re-weighting methods are widely used to handle this imbalance. Compared with two-stage methods, single-stage methods are more efficient and have a higher recall rate. The trade-off is a higher false-positive rate and potentially less accurate localization.
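One common sampling strategy for this imbalance is online hard example mining. It keeps all positive anchors and only the negatives with the highest loss, capped at a fixed negative-to-positive ratio. The sketch below assumes a 3:1 ratio and ranks negatives by their per-anchor classification loss. Both choices are illustrative assumptions, not settings stated in this section.

```python
def hard_negative_mining(labels, losses, neg_pos_ratio=3):
    """Select the anchors that contribute to the classification loss.

    labels: 1 for a positive anchor, 0 for a negative anchor.
    losses: per-anchor classification loss, used to rank negatives.
    Keeps every positive plus the hardest negatives, at most neg_pos_ratio
    negatives per positive (treating "no positives" as one positive so some
    negatives are still kept). Returns the selected indices in sorted order.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    num_neg = min(len(negatives), neg_pos_ratio * max(len(positives), 1))
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(positives + hardest)


labels = [1, 0, 0, 0, 0, 0]
losses = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
print(hard_negative_mining(labels, losses))  # [0, 2, 3, 4]
```

Easy negatives (here anchors 1 and 5) are dropped, so the abundant background cannot swamp the gradient from the few positive anchors.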
Context modeling:
SSH and PyramidBox apply context modules on feature pyramids to strengthen contextual reasoning for capturing tiny faces. These modules enlarge the receptive field in the Euclidean grid. The deformable convolutional network (DCN) introduces a deformable layer that models geometric transformations, which enhances the ability of CNNs to model non-rigid transformations. The champion solution of the WIDER Face Challenge 2018 shows that rigid (expansion) and non-rigid (deformation) context modeling are complementary and orthogonal, and that combining them improves face detection performance.
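The claim that context modules enlarge the receptive field can be made concrete with the standard receptive-field recurrence. Each k×k convolution adds (k−1) × (product of earlier strides) input pixels. Context modules such as the one in SSH stack 3×3 convolutions instead of using larger kernels. The sketch below is a generic calculation, not code from any of the cited works. It shows that two stacked 3×3 layers cover a 5×5 window and three cover 7×7.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of convolutions applied in sequence."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # one output pixel initially sees one input pixel
    jump = 1  # input-pixel distance between adjacent units at the current layer
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


print(receptive_field([3]))        # 3
print(receptive_field([3, 3]))     # 5
print(receptive_field([3, 3, 3]))  # 7
```

Stacking small kernels reaches the same window as one large kernel with fewer parameters. Two 3×3 layers use 18 weights per channel pair, versus 25 for a single 5×5 layer.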
Multi-task learning:
Joint face detection and alignment is widely used, because aligned face shapes provide better features for face classification. In Mask R-CNN, adding a branch that predicts an object mask in parallel with the existing branches significantly improved detection performance. DensePose inherits the Mask R-CNN architecture to obtain dense part labels and coordinates within each selected region. However, its dense regression branch is trained by supervised learning. In addition, the dense branch is a small FCN applied to each RoI to predict a pixel-to-pixel dense mapping.
3. RetinaFace
3.1 Multi-task loss
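For each training anchor $i$, we minimize the following multi-task loss:
$$L = L_{cls}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{pts}(l_i, l_i^*) + \lambda_3 p_i^* L_{pixel} \qquad (1)$$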
(1) Face classification loss $L_{cls}(p_i, p_i^*)$. Here $p_i$ is the predicted probability that anchor $i$ is a face, and $p_i^*$ is 1 for a positive anchor and 0 for a negative anchor. The classification loss $L_{cls}$ is the softmax loss for the two classes (face / not face).
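For a single anchor, the two-class softmax loss $L_{cls}$ reduces to cross-entropy over a (not-face, face) logit pair, and $p_i$ is the softmax probability of the face class. Below is a minimal sketch. It assumes the face class is at index 1, which is an illustrative convention, not something stated in the paper.

```python
import math


def face_probability(logits):
    """p_i: softmax probability of the face class (index 1) for one anchor."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)


def softmax_loss(logits, label):
    """L_cls(p_i, p_i*): cross-entropy for one anchor, label p_i* in {0, 1}."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


print(face_probability([0.0, 0.0]))  # 0.5
print(softmax_loss([0.0, 0.0], 1))   # log(2), about 0.6931
```

Computing the loss through log-sum-exp avoids taking the logarithm of a probability that has underflowed to zero for very confident wrong predictions.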
(2) Face box regression loss $L_{box}(t_i, t_i^*)$. Here $t_i = \{t_x, t_y, t_w, t_h\}_i$ and $t_i^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}_i$ are the coordinates of the predicted box and of the ground-truth box associated with the positive anchor. Following [16], we normalize the box regression targets (i.e., centre location, width and height) and use $L_{box}(t_i, t_i^*) = R(t_i - t_i^*)$, where $R$ is the robust smooth-L1 loss function defined in [16].
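The box targets and $R$ can be sketched with the R-CNN-style parameterization that Fast R-CNN [16] uses. Centre offsets are scaled by the anchor size, width and height become log-ratios, and smooth-L1 is applied to each coordinate. This is a minimal sketch. The (cx, cy, w, h) box layout is an assumption, and the additional standardization of targets (e.g., by per-coordinate mean and variance) is omitted.

```python
import math


def encode_box(box, anchor):
    """t*: regression target of a ground-truth box w.r.t. an anchor.

    Both box and anchor are (cx, cy, w, h) in image coordinates.
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def smooth_l1(x):
    """Robust loss R from Fast R-CNN: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def box_loss(t_pred, t_star):
    """L_box(t_i, t_i*) = R(t_i - t_i*), summed over the four coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(t_pred, t_star))


anchor = (32.0, 32.0, 16.0, 16.0)
target = encode_box((36.0, 32.0, 32.0, 16.0), anchor)
print(target)  # approximately (0.25, 0.0, 0.6931, 0.0)
print(box_loss((0.0, 0.0, 0.0, 0.0), target))
```

The linear tail of smooth-L1 keeps a few badly matched boxes from dominating the gradient, which is the motivation [16] gives for using it instead of L2.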
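(3) Facial landmark regression loss $L_{pts}(l_i, l_i^*)$. Here $l_i = \{l_{x_1}, l_{y_1}, \ldots, l_{x_5}, l_{y_5}\}_i$ and $l_i^* = \{l_{x_1}^*, l_{y_1}^*, \ldots, l_{x_5}^*, l_{y_5}^*\}_i$ are the five predicted facial landmarks and the corresponding ground truth associated with the positive anchor. As with box-centre regression, the five-landmark regression normalizes its targets based on the anchor centre.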
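Normalizing landmark targets on the anchor centre can be sketched the same way as the box-centre targets. Each of the five (x, y) points becomes an offset from the anchor centre, divided by the anchor width or height. This is a minimal sketch. Dividing by the anchor size and skipping any further variance scaling are assumptions that mirror the box case, and the example coordinates are made up.

```python
def encode_landmarks(landmarks, anchor):
    """l*: ten targets for five (x, y) landmarks relative to an anchor (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    target = []
    for x, y in landmarks:
        target.extend(((x - xa) / wa, (y - ya) / ha))
    return target


def decode_landmarks(target, anchor):
    """Inverse of encode_landmarks: map offsets back to image coordinates."""
    xa, ya, wa, ha = anchor
    return [(target[j] * wa + xa, target[j + 1] * ha + ya)
            for j in range(0, len(target), 2)]


anchor = (50.0, 50.0, 20.0, 20.0)
# Hypothetical eye centres, nose tip and mouth corners.
points = [(40.0, 45.0), (60.0, 45.0), (50.0, 55.0), (42.0, 62.0), (58.0, 62.0)]
l_star = encode_landmarks(points, anchor)
print(l_star)  # [-0.5, -0.25, 0.5, -0.25, 0.0, 0.25, -0.4, 0.6, 0.4, 0.6]
```

Because targets are expressed relative to the anchor, a face near the anchor centre produces small, scale-independent values whatever its absolute size in the image.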
(4) Dense regression loss $L_{pixel}$ (see Eq. 3).
The loss-balancing parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 0.25, 0.1 and 0.01, respectively. These weights increase the importance of better box and landmark localization learned from the supervision signals.
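Putting the four terms together, Eq. (1) for one anchor can be sketched as below. Here $p_i^*$ switches off every regression term for negative anchors. The individual loss values are passed in as inputs, and the example numbers are hypothetical. Only the λ values come from the text above.

```python
LAMBDA_1 = 0.25  # weight of the box regression loss
LAMBDA_2 = 0.1   # weight of the landmark regression loss
LAMBDA_3 = 0.01  # weight of the dense regression loss


def multitask_loss(l_cls, l_box, l_pts, l_pixel, p_star):
    """Eq. (1) for one anchor; p_star is 1 for a positive anchor, 0 for a negative one."""
    return (l_cls
            + LAMBDA_1 * p_star * l_box
            + LAMBDA_2 * p_star * l_pts
            + LAMBDA_3 * p_star * l_pixel)


# Hypothetical per-anchor loss values.
print(multitask_loss(0.7, 0.4, 1.2, 3.0, p_star=1))  # 0.7 + 0.1 + 0.12 + 0.03, about 0.95
print(multitask_loss(0.7, 0.4, 1.2, 3.0, p_star=0))  # only the classification term: 0.7
```

Negative anchors still contribute through $L_{cls}$, so the classifier learns to reject background. Box, landmark and dense terms are only trained on anchors that actually match a face.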
[16] R. Girshick. Fast R-CNN. In ICCV, 2015.
To be continued.