RetinaFace: Single-stage Dense Face Localisation in the Wild
2022-07-03 10:01:00 【Star soul is not a dream】
Paper code: https://github.com/deepinsight/insightface/tree/master/RetinaFace
Accurate and efficient face localisation in natural (in-the-wild) scenes remains a challenge. This paper presents RetinaFace, a robust single-stage face detector that performs pixel-wise face localisation on faces of various scales by jointly using extra-supervised (contribution 1) and self-supervised (contribution 2) multi-task learning.
The contributions fall into five aspects:
- Five facial landmarks are manually annotated on the WIDER FACE dataset, and face detection is observed to improve significantly with the help of this extra supervision signal.
- A self-supervised mesh decoder branch is added to predict pixel-wise 3D face shape information. This branch runs in parallel with the existing supervised branches.
- On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1%, reaching AP = 91.4%.
- On the IJB-C test set, RetinaFace enables the state-of-the-art ArcFace to further improve its face verification results (TAR = 89.59% at FAR = 1e-6).
- By using a lightweight backbone network, RetinaFace can run in real time on a single CPU core for a VGA-resolution image.
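Figures such as "TAR = 89.59% at FAR = 1e-6" are read as follows: fix the decision threshold so that the false accept rate on impostor (different-identity) pairs stays at or below the target, then report the true accept rate on genuine (same-identity) pairs at that threshold. A minimal NumPy sketch on made-up scores; the function name and the strict `score > threshold` acceptance rule are our assumptions, not the official IJB-C evaluation code:

```python
import numpy as np

def tar_at_far(genuine, impostor, target_far):
    """Return (threshold, TAR, FAR) for a threshold whose FAR is <= target_far.

    genuine:  similarity scores of same-identity pairs.
    impostor: similarity scores of different-identity pairs.
    Assumption: a pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending
    # Largest number of impostor pairs we may falsely accept.
    k = int(np.floor(target_far * len(impostor)))
    # At most k impostor scores lie strictly above the (k+1)-th highest one.
    threshold = float(impostor[k]) if k < len(impostor) else float("-inf")
    far = float(np.mean(impostor > threshold))
    tar = float(np.mean(genuine > threshold))
    return threshold, tar, far

genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.6, 0.05, 0.15, 0.25, 0.35, 0.45]
print(tar_at_far(genuine, impostor, target_far=0.1))  # (0.5, 0.75, 0.1)
```

With FAR = 1e-6, at most one impostor pair in a million may score above the threshold, which is why such operating points need very large impostor sets and very well-aligned faces.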
Automatic face localisation is the prerequisite step of facial image analysis for applications such as facial attribute recognition (e.g. expression and age) and facial identity recognition. In a narrow sense, face localisation may refer to traditional face detection, which aims to estimate face bounding boxes without any scale or position prior. This paper, however, adopts a broader definition of face localisation that includes face detection, face alignment, pixel-wise face parsing and 3D dense correspondence regression. This kind of dense face localisation provides accurate facial position information for faces at all scales.

Figure 1. The proposed single-stage pixel-wise face localisation method employs extra-supervised and self-supervised multi-task learning in parallel with the existing box classification and regression branches. Each positive anchor outputs (1) a face score, (2) a face box, (3) five facial landmarks, and (4) dense 3D face vertices projected onto the image plane.
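The parallel branches in Figure 1 all read the same feature map. A minimal NumPy sketch of the three supervised heads (face score, box, five landmarks) as 1×1 convolutions on one pyramid level; the two anchors per location, the channel layout and the random weights are assumptions for illustration, and the self-supervised dense (mesh decoder) branch is omitted:

```python
import numpy as np

def conv1x1(feature, weight, bias):
    """1x1 convolution: feature (C, H, W), weight (K, C), bias (K,) -> (K, H, W)."""
    return np.einsum("kc,chw->khw", weight, feature) + bias[:, None, None]

# Values predicted per anchor by each supervised branch.
HEAD_DIMS = {"cls": 2, "box": 4, "landmarks": 10}

def parallel_heads(feature, params, num_anchors=2):
    """Apply every head to the same feature map; return one row per anchor."""
    outputs = {}
    for name, dim in HEAD_DIMS.items():
        weight, bias = params[name]
        out = conv1x1(feature, weight, bias)  # (A * dim, H, W)
        _, height, width = out.shape
        # Assumed channel layout: anchor-major, i.e. [anchor 0 values, anchor 1 values, ...].
        out = out.reshape(num_anchors, dim, height, width)
        # Reorder to (H, W, A, dim) so each row is one anchor's prediction.
        outputs[name] = out.transpose(2, 3, 0, 1).reshape(-1, dim)
    return outputs

# Hypothetical configuration with random weights, for illustration only.
rng = np.random.default_rng(0)
channels, num_anchors = 64, 2
feature = rng.standard_normal((channels, 20, 20))
params = {
    name: (rng.standard_normal((num_anchors * dim, channels)), np.zeros(num_anchors * dim))
    for name, dim in HEAD_DIMS.items()
}
outputs = parallel_heads(feature, params, num_anchors)
print({name: out.shape for name, out in outputs.items()})
# {'cls': (800, 2), 'box': (800, 4), 'landmarks': (800, 10)}
```

Because every head is a cheap 1×1 convolution over shared features, adding the landmark branch costs little at inference time.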
Typically, face detection training involves both a classification loss and a box regression loss. Chen et al. [6] (D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In ECCV, 2014) proposed to combine face detection and alignment in a joint cascade framework, based on the observation that aligned face shapes provide better features for face classification. Inspired by [6], MTCNN and STN simultaneously detect faces and five facial landmarks. Due to training data limitations, JDA, MTCNN and STN have not verified whether tiny face detection can benefit from the extra supervision of five facial landmarks. One of the questions this paper aims to answer is whether an extra supervision signal built from five facial landmarks can push forward the current best performance (90.3% [67]) on the WIDER FACE hard test set [60].
In Mask R-CNN, detection performance is significantly improved by adding a branch that predicts an object mask in parallel with the existing branches for bounding box recognition and regression. This confirms that dense pixel-wise annotations also help to improve detection. Unfortunately, for the challenging faces in WIDER FACE it is not possible to conduct dense face annotation (either in the form of more landmarks or of semantic segments). Since such supervision signals cannot be easily obtained, the question is whether unsupervised methods can be applied to further improve face detection.
FAN proposes an anchor-level attention map to improve the detection of occluded faces. However, the proposed attention map is quite coarse and does not contain semantic information. Recently, self-supervised 3D morphable models have achieved promising 3D face modelling in the wild. In particular, the mesh decoder achieves real-time speed by exploiting graph convolutions on shape and texture. However, applying a mesh decoder to a single-stage detector raises two main challenges: (1) camera parameters are hard to estimate accurately, and (2) the joint latent shape and texture representation is predicted from a single feature vector (a 1×1 conv on the feature pyramid) instead of RoI-pooled features, which carries a risk of feature shift. In this paper, we employ a mesh decoder branch trained by self-supervised learning to predict a pixel-wise 3D face shape in parallel with the existing supervised branches. In summary, our main contributions are as follows:
- Based on a single-stage design, we propose a novel pixel-wise face localisation method named RetinaFace, which employs a multi-task learning strategy to simultaneously predict the face score, the face box, five facial landmarks, and the 3D position and correspondence of each facial pixel.
- On the WIDER FACE hard subset, RetinaFace outperforms the AP of the state-of-the-art two-stage method by 1.1% (reaching AP = 91.4%).
- On the IJB-C dataset, RetinaFace helps to improve the verification accuracy of ArcFace (TAR = 89.59% at FAR = 1e-6). This indicates that better face localisation can significantly improve face recognition.
- By employing a lightweight backbone network, RetinaFace can run in real time on a single CPU core for a VGA-resolution image.
- Extra annotations and code have been released to facilitate future research.
2. Related work
Image pyramid vs. feature pyramid:
The sliding-window paradigm, in which a classifier is applied on a dense image grid, can be traced back several decades. The milestone work of Viola-Jones explored a cascade chain to reject false face regions from an image pyramid with real-time efficiency, which led to the widespread adoption of this scale-invariant face detection framework. Although the sliding window on an image pyramid was the leading detection paradigm, with the emergence of the feature pyramid, sliding-anchor detection on multi-scale feature maps quickly came to dominate face detection.
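The sliding-anchor idea can be illustrated in a few lines: rather than rescaling the image, the detector places a fixed set of square anchors at every cell of each feature map, with larger anchors on coarser (larger-stride) levels. The strides and anchor sizes below are hypothetical, chosen for illustration, and are not claimed to be RetinaFace's configuration:

```python
import math

def generate_anchors(image_size, strides, anchor_sizes):
    """Return square anchors (cx, cy, w, h), in pixels, for every pyramid level.

    image_size:   (height, width) of the input image.
    strides:      downsampling factor of each feature map.
    anchor_sizes: anchor side lengths for each level, matched to strides.
    """
    height, width = image_size
    anchors = []
    for stride, sizes in zip(strides, anchor_sizes):
        rows, cols = math.ceil(height / stride), math.ceil(width / stride)
        for r in range(rows):
            for c in range(cols):
                # Centre of feature-map cell (r, c) mapped back to image pixels.
                cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
                anchors.extend((cx, cy, float(s), float(s)) for s in sizes)
    return anchors

# Hypothetical strides and anchor sizes, for illustration only.
anchors = generate_anchors((480, 640), strides=(8, 16, 32),
                           anchor_sizes=((16, 32), (64, 128), (256, 512)))
print(len(anchors))  # 12600 = 2 * (60*80 + 30*40 + 15*20)
```

Even a single VGA image yields over ten thousand anchors in this setup, almost all of which cover background, which is the root of the positive/negative imbalance that single-stage detectors must handle.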
Two-stage vs. single-stage:
Current face detection methods have inherited some achievements of generic object detection and can be divided into two categories: two-stage methods (e.g. Fast R-CNN) and single-stage methods (e.g. SSD and RetinaNet). Two-stage methods employ a "proposal and refinement" mechanism, which yields high localisation accuracy. Single-stage methods instead densely sample face locations and scales, which leads to an extreme imbalance between positive and negative samples during training. To handle this imbalance, sampling and re-weighting methods are widely adopted. Compared with two-stage methods, single-stage methods are more efficient and have a higher recall rate, but at the risk of a higher false positive rate and reduced localisation accuracy.
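As an illustration of the sampling approach, one common scheme (online hard negative mining, as used for example by SSD) keeps all positive anchors and only the negatives with the highest classification loss, capping the negative-to-positive ratio. The 3:1 ratio and the function below are assumptions for illustration, not taken from this paper:

```python
import numpy as np

def hard_negative_mask(cls_loss, labels, neg_pos_ratio=3):
    """Choose which anchors contribute to the classification loss.

    cls_loss: per-anchor classification loss, shape (N,).
    labels:   1 for positive anchors, 0 for negative anchors, shape (N,).
    Keeps every positive plus the highest-loss negatives, at most
    neg_pos_ratio negatives per positive (assumed default: 3).
    """
    cls_loss = np.asarray(cls_loss, dtype=float)
    pos = np.asarray(labels) == 1
    num_neg = min(neg_pos_ratio * int(pos.sum()), int((~pos).sum()))
    # Rank negatives by loss; positives are pushed to the end with -inf.
    neg_loss = np.where(pos, -np.inf, cls_loss)
    hardest = np.argsort(-neg_loss)[:num_neg]
    mask = pos.copy()
    mask[hardest] = True
    return mask

losses = [0.1, 2.0, 0.5, 3.0, 0.05, 1.0, 0.2]
labels = [1, 0, 0, 0, 0, 0, 0]
print(hard_negative_mask(losses, labels).tolist())
# [True, True, False, True, False, True, False]
```

Re-weighting methods, such as the focal loss of RetinaNet, take the other route: they keep all anchors but down-weight the easy, well-classified ones.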
Context modelling:
To enhance the model's contextual reasoning power for capturing tiny faces, SSH and PyramidBox applied context modules on the feature pyramid to enlarge the receptive field obtained from Euclidean grids. To enhance the ability of CNNs to model non-rigid transformations, deformable convolutional networks (DCN) employ a novel deformable layer to model geometric transformations. The champion solution of the WIDER Face Challenge 2018 indicates that rigid (expansion) and non-rigid (deformation) context modelling are complementary and orthogonal, and that both improve face detection performance.
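The receptive-field enlargement that context modules provide can be quantified with the standard receptive-field recurrence. The sketch below, measured in feature-map cells, shows that stacking 3×3 convolutions (the SSH paper describes using such stacks in place of 5×5 and 7×7 filters) or dilating them widens the context each output sees. It covers rigid expansion only and does not model deformable convolution, whose sampling offsets are learned per location:

```python
def receptive_field(layers):
    """Receptive field (in input cells) of a stack of convolutions.

    layers: sequence of (kernel, stride, dilation) tuples, first layer first.
    jump is the spacing, in input cells, between neighbouring outputs.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

conv3 = (3, 1, 1)
print(receptive_field([conv3]))                # 3
print(receptive_field([conv3, conv3]))         # 5, same window as one 5x5 conv
print(receptive_field([conv3, conv3, conv3]))  # 7, same window as one 7x7 conv
print(receptive_field([(3, 1, 2)]))            # 5, one 3x3 conv with dilation 2
```

Stacked 3×3 filters reach the same window as a larger filter with fewer parameters (two 3×3 kernels use 18 weights per channel pair versus 25 for a 5×5 kernel).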
Multi-task learning:
Joint face detection and alignment is widely used, because aligned face shapes provide better features for face classification. In Mask R-CNN, detection performance is significantly improved by adding a branch that predicts an object mask in parallel with the existing branches. DensePose adopts the Mask R-CNN architecture to obtain dense part labels and coordinates within each selected region. Its dense regression branch is trained by supervised learning. In addition, the dense branch is a small FCN applied to each RoI to predict a pixel-to-pixel dense mapping.
3. RetinaFace
3.1 Multi-task loss
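For every training anchor $i$, we minimise the following multi-task loss:
$$L = L_{cls}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{pts}(l_i, l_i^*) + \lambda_3 p_i^* L_{pixel} \tag{1}$$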
(1) Face classification loss $L_{cls}(p_i, p_i^*)$, where $p_i$ is the predicted probability of anchor $i$ being a face, and $p_i^*$ is 1 for a positive anchor and 0 for a negative anchor. The classification loss $L_{cls}$ is the softmax loss for the two classes (face / not face).
(2) Face box regression loss $L_{box}(t_i, t_i^*)$, where $t_i = \{t_x, t_y, t_w, t_h\}_i$ and $t_i^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}_i$ represent the coordinates of the predicted box and of the ground-truth box associated with the positive anchor. Following [16], we normalise the box regression targets (i.e. centre location, width and height) and use $L_{box}(t_i, t_i^*) = R(t_i - t_i^*)$, where $R$ is the robust loss function (smooth-L1) defined in [16].
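(3) Facial landmark regression loss $L_{pts}(l_i, l_i^*)$, where $l_i = \{l_{x_1}, l_{y_1}, \dots, l_{x_5}, l_{y_5}\}_i$ and $l_i^* = \{l_{x_1}^*, l_{y_1}^*, \dots, l_{x_5}^*, l_{y_5}^*\}_i$ represent the predicted five facial landmarks and the ground truth associated with the positive anchor. Similar to the box centre regression, the five-landmark regression also normalises its targets relative to the anchor centre.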
(4) Dense regression loss $L_{pixel}$ (refer to Eq. 3).
The loss-balancing parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 0.25, 0.1 and 0.01, which means that we increase the significance of better box and landmark localisation from the supervision signals.
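The loss terms above can be assembled into a per-anchor loss. A minimal pure-Python sketch, assuming (cx, cy, w, h) boxes, logits ordered [not face, face], the Fast R-CNN box parameterisation and smooth-L1 from [16], and landmark targets normalised by the anchor centre and size. The variance scaling used in many public implementations is left out, and the dense term (Eq. 3) is not computed here, so $L_{pixel}$ is passed in as a precomputed number:

```python
import math

def softmax_cls_loss(logits, p_star):
    """Two-class softmax loss. Assumed ordering: logits = [not_face, face]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[p_star]

def smooth_l1(x):
    """Smooth-L1 of Fast R-CNN: 0.5 * x**2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode_box(gt, anchor):
    """Fast R-CNN style targets t* = (tx, ty, tw, th); boxes are (cx, cy, w, h)."""
    gx, gy, gw, gh = gt
    ax, ay, aw, ah = anchor
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

def encode_landmarks(points, anchor):
    """Assumed normalisation: five (x, y) landmarks relative to the anchor centre, over its size."""
    ax, ay, aw, ah = anchor
    return [v for x, y in points for v in ((x - ax) / aw, (y - ay) / ah)]

def regression_loss(pred, target):
    """R(t_i - t_i^*): smooth-L1 summed over coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

def multitask_loss(logits, t_pred, l_pred, l_pixel, p_star, gt_box, gt_pts, anchor,
                   lambdas=(0.25, 0.1, 0.01)):
    """Eq. (1) for one anchor; gt_box and gt_pts are only read when p_star == 1."""
    lam1, lam2, lam3 = lambdas
    loss = softmax_cls_loss(logits, p_star)
    if p_star == 1:  # same as multiplying the regression terms by p_i^*
        loss += lam1 * regression_loss(t_pred, encode_box(gt_box, anchor))
        loss += lam2 * regression_loss(l_pred, encode_landmarks(gt_pts, anchor))
        loss += lam3 * l_pixel
    return loss

anchor = (100.0, 100.0, 32.0, 32.0)
gt_box = (104.0, 96.0, 64.0, 32.0)
gt_pts = [(92, 92), (108, 92), (100, 100), (94, 108), (106, 108)]
loss = multitask_loss([0.0, 0.0], [0.0] * 4, [0.0] * 10, 0.0, 1, gt_box, gt_pts, anchor)
print(round(loss, 4))  # 0.7794
```

Gating with `if p_star == 1` rather than multiplying by zero also avoids building regression targets for negative anchors, which have no associated ground truth.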
[16] R. Girshick. Fast R-CNN. In ICCV, 2015.
To be continued.