AugFPN: Improving Multi-scale Feature Learning for Object Detection
2022-06-29 09:10:00 【TJMtaotao】
Chaoxu Guo1, Bin Fan1, Qian Zhang2, Shiming Xiang1, and Chunhong Pan1
1NLPR,CASIA
2Horizon Robotics
1{chaoxu.guo, bfan, smxiang, chpan}@nlpr.ia.ac.cn
[email protected]
This paper builds on Faster R-CNN. Its main contribution is an improved FPN: the AugFPN multi-scale feature learning module.
Abstract
Current state-of-the-art detectors typically exploit feature pyramids to detect objects at different scales. Among them, FPN is one of the representative works that build a feature pyramid by fusing multi-scale features. However, the design defects behind it prevent the multi-scale features from being fully exploited. This paper first analyzes the design defects of the feature pyramid in FPN, and then introduces a new feature pyramid architecture named AugFPN to address these problems. Specifically, AugFPN consists of three components: Consistent Supervision, Residual Feature Enhancement, and Soft RoI Selection. AugFPN narrows the semantic gaps between features of different scales before feature fusion through Consistent Supervision. During feature fusion, Residual Feature Enhancement extracts ratio-invariant context information to reduce the information loss of the feature map at the highest pyramid level. Finally, Soft RoI Selection is employed to adaptively learn better RoI features after feature fusion. By replacing FPN with AugFPN in Faster R-CNN, the models achieve 2.3 and 1.6 points higher Average Precision (AP) when using ResNet50 and MobileNet-v2 as backbones, respectively. Furthermore, AugFPN improves RetinaNet by 1.6 points AP and FCOS by 0.9 points AP when ResNet50 is used as backbone. Code will be made available.

Figure 1. Three design defects of the feature pyramid in FPN: 1) semantic gaps between features at different levels before feature fusion; 2) information loss of the feature at the highest pyramid level; 3) heuristic RoI assignment.
1. Introduction
With the development of deep convolutional networks (ConvNets), remarkable progress has been made in image object detection. A number of detectors [10, 33, 9, 25, 30, 12, 21, 22] have been proposed to steadily push forward the state of the art. Among these detectors, FPN [21] is a simple and effective two-stage object detection framework. Specifically, FPN builds a feature pyramid on top of the inherent feature hierarchy of a ConvNet by propagating semantically strong features from high levels to low levels.
By enhancing the multi-scale features with strong semantics, the performance of object detection is greatly improved. However, the feature pyramid in FPN still has several design defects, as shown in Figure 1. Essentially, the feature pyramid in FPN can be divided into three stages: (1) before feature fusion, (2) top-down feature fusion, and (3) after feature fusion. We find that each stage has an inherent defect, as follows:
Semantic gaps between features at different levels. Before feature fusion, features at different levels pass through independent 1×1 convolution layers to reduce the number of feature channels, without considering the large semantic gaps between them. Because of the inconsistent semantic information, directly fusing these features degrades the representation power of the multi-scale features.
Information loss of the highest-level feature map. During feature fusion, features are propagated in a top-down manner so that low-level features can be improved with the strong semantic information of high-level features. However, the feature at the highest pyramid level loses information due to channel reduction. This loss can be alleviated by incorporating a global context feature extracted by global pooling [29]. Nevertheless, since an image may contain multiple objects, this strategy of squeezing a feature map into a single vector may lose spatial relationships and details.
Heuristic RoI assignment strategy. After feature fusion, each object proposal is refined based on the feature grid pooled from a single feature level, which is selected heuristically according to the scale of the proposal. However, the ignored features at other levels may also be beneficial for classifying or regressing the object. With this in mind, PANet [24] pools RoI features from all pyramid levels and, after adapting them with independent fully connected layers, fuses them with a max operation. Nevertheless, max-fusion discards the features with lower responses, which may also be helpful, so the features at other levels still cannot be fully exploited. Meanwhile, the extra fully connected layers significantly increase the model parameters.
This paper presents AugFPN, a simple yet effective feature pyramid that integrates three different components to tackle the above problems accordingly. First, Consistent Supervision is proposed to make the feature maps after the lateral connections contain similar semantic information by imposing the same supervision signals on them. Second, ratio-invariant adaptive pooling is used to extract different context information, and the information loss of the highest-level feature in the pyramid is reduced in a residual manner; we name this procedure Residual Feature Enhancement. Third, Soft RoI Selection is introduced to better exploit the RoI features from different pyramid levels, providing better RoI features for the subsequent location refinement and classification.
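For intuition, the sketch below illustrates one way the ratio-invariant context extraction and residual fusion of Residual Feature Enhancement could be implemented in PyTorch. The pooling ratios, the simple average-based fusion, and the function name residual_context are illustrative assumptions, not the paper's exact design (the actual fusion of the context maps may differ).

import torch
import torch.nn.functional as F

def residual_context(m5, ratios=(0.1, 0.2, 0.3)):
    # Sketch of ratio-invariant adaptive pooling on the top-level map M5.
    # Each context map is pooled to a size proportional to the input
    # (hence ratio-invariant), upsampled back, and averaged; the result is
    # added to M5 as a residual. Ratios and averaging are assumptions.
    h, w = m5.shape[-2:]
    contexts = []
    for r in ratios:
        size = (max(1, int(h * r)), max(1, int(w * r)))
        pooled = F.adaptive_avg_pool2d(m5, size)            # context at ratio r
        contexts.append(F.interpolate(pooled, size=(h, w),
                                      mode="bilinear", align_corners=False))
    residual = torch.stack(contexts, dim=0).mean(dim=0)     # naive fusion
    return m5 + residual                                    # residual enhancement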
Without bells and whistles, Faster R-CNN based on AugFPN achieves 2.3 and 1.7 points higher Average Precision (AP) than its FPN-based counterpart when using ResNet50 and ResNet101 as backbones, respectively. Moreover, when the backbone is changed to MobileNet-v2, a lightweight and efficient network, AugFPN improves the overall performance by 1.6 AP. AugFPN can also be extended to one-stage detectors with only minor modifications. By replacing FPN with AugFPN, RetinaNet and FCOS are improved by 1.6 and 0.9 points AP, respectively, which verifies the generality of AugFPN.
Our contributions are summarized as follows:
• We reveal the problems hidden in three different stages of FPN, which hinder multi-scale features from being fully exploited.
• We propose a new feature pyramid network, AugFPN, which addresses these problems with Consistent Supervision, Residual Feature Enhancement, and Soft RoI Selection, respectively.
• We evaluate AugFPN on MS COCO with various detectors and backbones; compared with their FPN-based counterparts, it consistently brings significant improvements.
2. Related Work
Deep object detectors. Modern object detection methods mostly follow two paradigms: two-stage and one-stage. As the pioneering work of two-stage detectors [10, 9, 33, 4, 21, 1, 35, 19, 20, 28], R-CNN [10] first uses Selective Search [37] to generate region proposals and then extracts regional features with a convolutional network to refine these proposals. To speed up training and inference, SPP [13] and Fast R-CNN [9] first extract the feature map of the whole image and then generate regional features with spatial pyramid pooling and RoI pooling, respectively; the proposals are finally refined with region-wise layers. Faster R-CNN [33] proposes a Region Proposal Network and develops an end-to-end trainable detector, which significantly improves performance and speeds up inference. To pursue scale invariance in object detection, FPN [21] constructs an in-network feature pyramid based on the inherent feature hierarchy of a convolutional network and makes predictions at different pyramid levels according to the scale of the region proposal. RoI Align [12] brings great improvements in object detection and instance segmentation by resolving the quantization issue of RoI pooling. Deformable networks [5, 42] significantly improve object detection by modeling the geometric structure of objects. Cascade R-CNN [1] introduces multi-stage refinement into Faster R-CNN, achieving more precise localization of objects.
In contrast to two-stage detectors, one-stage detectors [25, 30, 6, 31, 22, 17, 23, 32, 39, 41] are more efficient but less accurate. SSD [25] densely places anchor boxes on multi-scale features and makes predictions based on these anchors. RetinaNet [22] adopts a feature pyramid similar to FPN as its backbone and introduces a novel focal loss to address the imbalance between easy and hard examples. ExtremeNet [41] models object detection as the problem of detecting the four extreme points of an object. These works have made significant progress from different perspectives. This paper studies how to better exploit multi-scale features.
Deep supervision. Deep supervision [15, 18, 40, 7] addresses the vanishing gradient problem or enhances the feature representations of intermediate layers. Huang et al. [15] integrate multiple classifiers with different resource demands into a single deep network by training them at different depths simultaneously. PSPNet [40] introduces an additional pixel-level loss at an intermediate layer to reduce the optimization difficulty. Recently, NAS-FPN [7] attaches classification and regression heads after all the intermediate pyramid networks for the purpose of anytime detection. In contrast to these works, we apply instance-level supervision signals to the features at all pyramid levels after the lateral connections, aiming to narrow the semantic gaps between them and make the features more suitable for the subsequent feature fusion.
Context exploitation. Several works have demonstrated the importance of context in object detection [8, 29, 38] and segmentation [16, 26, 40]. DeepLab-v2 [3] proposes to extract multi-scale context with atrous convolution, and PSPNet [40] uses pyramid pooling to obtain hierarchical global context; both greatly improve the quality of semantic segmentation. Differently, we use ratio-invariant adaptive pooling to generate multiple spatial context features and use them to reduce, in a residual manner, the information loss of the feature at the highest pyramid level caused by channel reduction.
RoI assignment strategy. FPN [21] pools RoI features from a specific pyramid level chosen according to the scale of the RoI. However, under this strategy, two proposals with similar scales can be assigned to different feature levels, which may produce sub-optimal results. To address this issue, PANet pools RoI features from all pyramid levels, adapts them with independent fully connected layers, and fuses them with a max operation. PANet differs notably from our work: we propose to generate adaptive weights in a data-driven manner and absorb features from all levels according to these weights, which makes better use of the features at different levels. Besides, our method requires fewer parameters, because no extra fully connected layers are needed to adapt the RoI features.
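As a rough illustration of this idea, the sketch below fuses RoI features pooled from all pyramid levels with data-dependent weights instead of picking one level heuristically. The weight generator shown here (1×1 convolutions followed by a softmax over levels) and the module name SoftRoISelection are assumptions for illustration only; the paper's actual fusion module may be designed differently.

import torch
import torch.nn as nn

class SoftRoISelection(nn.Module):
    # Sketch: combine the RoI features from all pyramid levels with learned,
    # data-dependent spatial weights. The weight generator below is an
    # illustrative guess, not the paper's exact module.
    def __init__(self, num_levels=4, channels=256):
        super().__init__()
        self.weight_gen = nn.Sequential(
            nn.Conv2d(num_levels * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, num_levels, kernel_size=1),
        )

    def forward(self, roi_feats):                 # list of [N, C, 7, 7], one per level
        x = torch.cat(roi_feats, dim=1)           # [N, L*C, 7, 7]
        w = self.weight_gen(x).softmax(dim=1)     # per-level, per-position weights
        stacked = torch.stack(roi_feats, dim=1)   # [N, L, C, 7, 7]
        return (stacked * w.unsqueeze(2)).sum(dim=1)   # weighted sum over levels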

Figure 2. The overall pipeline of an AugFPN-based detector. (1)-(3) are the three main components of AugFPN: Consistent Supervision, Residual Feature Enhancement, and Soft RoI Selection. For simplicity, the 3×3 convolution layers after feature summation are not shown.
3. Methodology
The overall framework of AugFPN is shown in Figure 2. Following FPN [21], the features used to build the feature pyramid are denoted as {C2, C3, C4, C5}, corresponding to feature levels with strides of {4, 8, 16, 32} pixels with respect to the input image. {M2, M3, M4, M5} are the features after the lateral connections reduce the feature channels. {P2, P3, P4, P5} are the features generated by the feature pyramid. The three components of AugFPN are discussed in the following sections.
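To make the notation concrete, the following PyTorch sketch builds {M2, ..., M5} with 1×1 lateral convolutions and {P2, ..., P5} with top-down upsampling, summation, and 3×3 smoothing convolutions, i.e., the plain FPN baseline that AugFPN augments. The channel width of 256 and the class name PlainFPN are assumptions for illustration, not the authors' code.

import torch.nn as nn
import torch.nn.functional as F

class PlainFPN(nn.Module):
    # Minimal FPN-style pyramid: C2..C5 -> M2..M5 -> P2..P5 (strides 4/8/16/32).
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs reduce each C_i to a common channel width -> M_i
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        # 3x3 convs smooth the fused maps -> P_i
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels])

    def forward(self, feats):                     # feats = [C2, C3, C4, C5]
        m = [lat(c) for lat, c in zip(self.lateral, feats)]        # M2..M5
        for i in range(len(m) - 2, -1, -1):       # top-down: upsample and sum
            m[i] = m[i] + F.interpolate(m[i + 1], size=m[i].shape[-2:],
                                        mode="nearest")
        return [conv(x) for conv, x in zip(self.smooth, m)]        # P2..P5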
3.1. Consistent Supervision
FPN builds the feature pyramid upon the feature hierarchy in the network, which produces feature maps with different resolutions. To integrate multi-scale context information, FPN merges features of different scales through upsampling and summation along a top-down path. However, features at different scales contain information at different levels of abstraction, and there are large semantic gaps between them. Although the method adopted by FPN is simple and effective, fusing multiple features with large semantic gaps leads to a sub-optimal feature pyramid. This motivates us to propose Consistent Supervision, which imposes the same supervision signals on the multi-scale features before fusion, aiming to narrow the semantic gaps between them. Specifically, we build a feature pyramid from the multi-scale features {C2, C3, C4, C5} of the backbone. Then, a Region Proposal Network (RPN) is attached to the resulting feature pyramid {P2, P3, P4, P5} to generate a large number of RoIs. For Consistent Supervision, each RoI is mapped to all feature levels, and RoI Align [12] is used to obtain the RoI features of each level of {M2, M3, M4, M5}. After that, multiple classification and box regression heads are attached to these features to produce auxiliary losses. The parameters of these classification and regression heads are shared across the different levels, which, in addition to the identical supervision signals, further forces the different feature maps to learn similar semantic information. For a more stable optimization, a weight is used to balance the auxiliary losses introduced by Consistent Supervision against the original loss. Formally, the final loss function of R-CNN is formulated as follows:
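The equation image from the original post is missing. A plausible form consistent with the description above, written here as an assumption rather than quoted from the paper, sums the shared auxiliary heads' classification and localization losses over the intermediate features {M2, ..., M5} (weighted by λ, with β balancing localization against classification) and adds the original R-CNN losses computed on the pyramid features:

L_{rcnn} = \lambda \sum_{m \in \{M_2,\dots,M_5\}} \left[ L_{cls}(p_m, t^{*}) + \beta\, L_{loc}(d_m, b^{*}) \right] + L_{cls}(p, t^{*}) + \beta\, L_{loc}(d, b^{*})

where p_m and d_m are the predictions of the shared auxiliary heads on level m, p and d are the predictions of the original head on the fused pyramid features, and t^{*} and b^{*} denote the ground-truth category and box.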


