RepPoints: An Advanced Form of Deformable Convolution

2022-07-06 17:42:00 【Xiaobai learns vision】

I have always admired Microsoft's research, and deformable convolution in particular: I find the work very creative and I like the idea behind it. With the recent RepPoints paper as the occasion, this post reviews deformable convolution.

This post covers three papers: DCNv1, DCNv2 and RepPoints, where RepPoints can be thought of as DCNv3. Each of the three refines deformable convolution and improves the model's ability to model geometric deformation.

Review of DCNv1 and DCNv2

Geometric variation caused by scale, pose, viewpoint and part deformation is a major challenge in object recognition and detection. In its convolution and RoI pooling modules, DCN gains the ability to model geometric deformation by learning the locations of the sampling points.

Deformable Convolution

The sampling locations of a standard convolution are shifted by offsets that are learned from the input features.

Deformable convolution can be expressed as:

$$y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k)$$

Given a convolution kernel with $K$ sampling points, $w_k$ denotes the weight of the $k$-th sampling point and $p_k$ its predefined offset (for example, $K = 9$ and $p_k \in \{(-1,-1), (-1,0), \ldots, (1,1)\}$ define a 3x3 convolution kernel). $x(p)$ denotes the feature at location $p$ of the input feature map and $y(p)$ the feature at location $p$ of the output feature map. $\Delta p_k$ is the learned offset of the $k$-th sampling point. Because $\Delta p_k$ is fractional, $x(p + p_k + \Delta p_k)$ is computed by bilinear interpolation.

PS: The offset feature map has the same resolution as the input feature map, and its number of channels is twice the number of sampling points (every position has an offset in both the x and y directions for each sampling point).
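The description above can be made concrete with a small sketch. The code below is a minimal pure-Python illustration, not the authors' implementation: it evaluates the deformable sum at one output location of a single-channel feature map and treats everything outside the map as zero. The names `bilinear`, `deform_conv_at` and `GRID_3X3` are made up for this example, and offsets are written as (dy, dx) pairs by assumption.

```python
import math

def bilinear(feat, y, x):
    """Sample a 2-D feature map (list of rows) at a fractional (y, x).

    Locations outside the map contribute zero (an assumption of this sketch).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Each of the four neighbours is weighted by its closeness.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feat[yi][xi]
    return value

# Predefined offsets p_k of a 3x3 kernel (K = 9), stored as (dy, dx).
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deform_conv_at(feat, weights, offsets, p):
    """Compute y(p) = sum_k w_k * x(p + p_k + dp_k) for one location p = (py, px)."""
    py, px = p
    out = 0.0
    for w_k, (gy, gx), (oy, ox) in zip(weights, GRID_3X3, offsets):
        out += w_k * bilinear(feat, py + gy + oy, px + gx + ox)
    return out
```

With all offsets set to zero, `deform_conv_at` reduces to an ordinary 3x3 convolution at that location, which is a convenient sanity check.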

Modulated Deformable Convolution

Compared with deformable convolution, modulated deformable convolution adds one modulation factor per sampling point.

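Modulated deformable convolution can be expressed as:

$$y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k$$

Here $w_k$, $p_k$ and $\Delta p_k$ are the weight, predefined offset and learned offset of the $k$-th sampling point, and $\Delta m_k$ is the modulation factor of the $k$-th sampling point (in the range $[0, 1]$).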

PS: The modulation factor feature map has the same resolution as the input feature map, and its number of channels equals the number of sampling points. Together with the offset feature map, the number of channels is three times the number of sampling points (every position has an offset in both the x and y directions plus one modulation factor for each sampling point).
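The channel bookkeeping can be illustrated with a short sketch. It is only an illustration under stated assumptions: the channel order (all offsets first, then the modulation factors) is a choice made for this example, and a sigmoid is assumed as the way to keep the modulation factor in range. The function names are hypothetical.

```python
import math

def split_offsets_and_modulation(channels, num_points):
    """Split the 3K values predicted at one position into K offsets and K factors.

    Assumed layout for this sketch: the first 2K values are interleaved
    offset pairs, the last K values are raw modulation scores.
    """
    if len(channels) != 3 * num_points:
        raise ValueError("expected 3 * num_points values")
    raw_offsets = channels[: 2 * num_points]
    raw_scores = channels[2 * num_points:]
    offsets = [(raw_offsets[2 * k], raw_offsets[2 * k + 1]) for k in range(num_points)]
    # Sigmoid written with tanh so that large negative scores do not overflow.
    modulation = [0.5 * (1.0 + math.tanh(s / 2.0)) for s in raw_scores]
    return offsets, modulation

def modulated_sum(weights, samples, modulation):
    """y(p) = sum_k w_k * x_k * dm_k, where x_k is the already sampled value
    x(p + p_k + dp_k)."""
    return sum(w * x * m for w, x, m in zip(weights, samples, modulation))
```

For a 3x3 kernel, `split_offsets_and_modulation` expects 27 values per position: 18 for the offsets and 9 for the modulation factors.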

Deformable RoI Pooling

Given an input RoI, RoI pooling divides it into $K$ bins. The pooled feature maps are first obtained through RoI pooling, and an fc layer then produces a normalized offset $\Delta \hat{p}_k$ for each bin (which can be converted into the actual offset $\Delta p_k$).

The output feature of a single bin can be expressed as:

$$y(k) = \frac{1}{n_k} \sum_{j=1}^{n_k} x(p_{kj} + \Delta p_k)$$

$p_{kj}$ denotes the location of the $j$-th sampling point in the $k$-th bin, and $n_k$ the number of sampling points in the $k$-th bin. $x(p_{kj} + \Delta p_k)$ is computed by bilinear interpolation. $\Delta p_k$ is the offset of the $k$-th bin.

PS: The fc layer has twice as many outputs as there are bins (each bin has an offset in both the x and y directions).
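A minimal sketch of the per-bin computation follows. It is not taken from the article: the interpolation is passed in as a callable `sample(x, y)` supplied by the caller, the function names are hypothetical, and the conversion from normalized to actual offsets (scaling by the RoI width and height with a small factor `gamma`, defaulting to 0.1) is stated here as an assumption based on the DCNv1 paper.

```python
def to_pixel_offset(normalized_offset, roi_w, roi_h, gamma=0.1):
    """Convert a normalized (dx, dy) offset into feature-map units.

    Assumption of this sketch: the offset is scaled by the RoI width and
    height and damped by gamma.
    """
    dx, dy = normalized_offset
    return (gamma * dx * roi_w, gamma * dy * roi_h)

def deformable_bin_output(sample, points, offset):
    """y(k) = (1 / n_k) * sum_j x(p_kj + dp_k) for one bin.

    sample(x, y) returns the (interpolated) feature value at a fractional
    location, points holds the n_k sampling locations p_kj of the bin as
    (x, y) pairs, and offset is the bin's shared offset dp_k = (dx, dy).
    """
    dx, dy = offset
    return sum(sample(x + dx, y + dy) for x, y in points) / len(points)
```

Note that all sampling points of a bin share one offset, so each bin moves as a unit rather than point by point.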

Modulated Deformable RoI Pooling

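With the modulation factor, the output feature of a single bin can be expressed as:

$$y(k) = \frac{1}{n_k} \sum_{j=1}^{n_k} x(p_{kj} + \Delta p_k) \cdot \Delta m_k$$

Here $p_{kj}$, $n_k$ and $\Delta p_k$ are the sampling locations, the number of sampling points and the offset of the $k$-th bin, and $\Delta m_k$ is the modulation factor of the $k$-th bin (in the range $[0, 1]$).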

PS: There are two fc layers, and the second fc layer has three times as many outputs as there are bins (each bin has an offset in both the x and y directions plus one modulation factor).

RepPoints

Motivation

In object detection, bounding boxes describe the location of objects at every stage of the detector.

Although bounding boxes are easy to compute, they provide only a coarse localization of objects and do not conform to an object's shape and pose. Features extracted from the regular cells of a bounding box may therefore be heavily influenced by background content or by uninformative foreground areas. This can degrade feature quality and, in turn, the classification performance of the detector.

The paper proposes a new representation, called RepPoints, that provides finer-grained localization and makes classification easier.

As shown in the figure, RepPoints is a set of points that learns to place itself adaptively over an object, so that it both circumscribes the object's spatial extent and marks local areas that carry important semantic information.

The training of RepPoints is driven jointly by object localization and recognition, so the RepPoints are closely tied to the ground-truth bounding box and guide the detector toward correct object classification.

Bounding Box Representation

A bounding box is a 4-d representation that encodes the spatial location of an object, namely $B = (x, y, w, h)$, where $(x, y)$ denotes the center point and $(w, h)$ the width and height.

Because they are simple and convenient to use, modern object detectors rely heavily on bounding boxes to represent objects at every stage of the detection pipeline.

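The best-performing object detectors usually follow a multi-stage recognition paradigm in which object localization is refined stage by stage. Along the way the object representation plays the following roles: anchors are turned by bounding box regression into proposals, and the proposals are refined again into the final object targets.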

RepPoints

As mentioned earlier, the 4-d bounding box is a coarse representation of object location. It considers only the rectangular spatial extent of the object and ignores its shape, its pose and the positions of semantically important local areas, all of which could be used for finer localization and better object feature extraction.

To overcome these limitations, RepPoints instead models a set of adaptive sample points:

$$R = \{(x_k, y_k)\}_{k=1}^{n}$$

where $n$ is the total number of sample points used in the representation. In this work, $n$ defaults to 9.

Learning RepPoints

The learning of RepPoints is driven by both an object localization loss and an object recognition loss. To compute the localization loss, a conversion function $T$ first converts the RepPoints into a pseudo box. The difference between the converted pseudo box and the ground-truth bounding box is then computed.
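As a rough sketch of this step, the code below converts a point set into a pseudo box and measures how far it is from a ground-truth box. Two things here are assumptions, since the article does not say which choices are used: the conversion is a min-max operation over both axes (one possible conversion function), and the difference is a plain L1 distance between the corner coordinates, used as a simple stand-in for the actual loss. The function names are hypothetical.

```python
def minmax_pseudo_box(points):
    """Convert a set of (x, y) points into a pseudo box (x1, y1, x2, y2).

    Assumed conversion for this sketch: the tightest axis-aligned box
    that contains every point.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def corner_l1_distance(box_a, box_b):
    """Sum of absolute differences between the top-left and bottom-right
    corners of two boxes (a stand-in for the localization loss)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

# Usage: three learned points and a ground-truth box.
points = [(2, 3), (5, 1), (4, 7)]
pseudo_box = minmax_pseudo_box(points)                   # (2, 1, 5, 7)
distance = corner_l1_distance(pseudo_box, (1, 1, 5, 8))  # 2
```

Because the pseudo box is computed from the points themselves, a gradient on the box corners flows back to the point locations; that is what lets localization supervision shape the points.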

RPDet

The authors design an anchor-free object detector that uses RepPoints instead of bounding boxes as its basic object representation.

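The object representation evolves as follows: starting from object centers, the points are refined into RepPoints proposals and then into the final RepPoints object targets.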

The RepPoints detector (RPDet) consists of two recognition stages based on deformable convolution, as shown in the figure.

Deformable convolution pairs well with RepPoints, because its convolution is computed over a set of irregularly distributed sampling points; conversely, its recognition feedback can guide the placement of these points during training.

The first offset in the figure above is learned under supervision of the two diagonal corner points; the second offset is learned on top of the first under classification supervision.
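The relationship between the two offsets can be sketched as plain point arithmetic. This is a simplified illustration under assumptions, not the detector itself: the first set of points is taken to be the object center plus the first offsets, the second set is the first set plus the second offsets, and both function names are hypothetical.

```python
def shift_points(points, offsets):
    """Add one (dx, dy) offset to each (x, y) point."""
    if len(points) != len(offsets):
        raise ValueError("points and offsets must have the same length")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]

def two_stage_points(center, first_offsets, second_offsets):
    """Stage 1 places points relative to the object center; stage 2 refines
    those points with a second set of offsets."""
    stage1 = shift_points([center] * len(first_offsets), first_offsets)
    stage2 = shift_points(stage1, second_offsets)
    return stage1, stage2
```

The second stage never starts from scratch: its offsets are small corrections to wherever the first stage placed the points.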

RepPoints can also be understood from another angle:

Deformable convolution is supervised only through the final classification and regression branches; it is supposed to attend adaptively to suitable feature locations and so extract better features. What I could never figure out is whether it really does attend to suitable locations. Its offsets are learned very freely and may drift off the object, so are the resulting features really helpful? These questions have bothered me for a long time: the intermediate process of deformable convolution seems too vague and too indirect, and it is hard to explain. In RepPoints, by contrast, offset learning is supervised directly by the localization and classification signals. The offsets become interpretable: the offset locations make localization and classification more accurate (the locations themselves localize the object, and the semantic information at those locations identifies it), so the offsets no longer wander off.

Seen this way, RepPoints is a further improvement on deformable convolution, with two advantages over it:

1. The offsets of the deformable convolution are learned under direct localization and classification supervision, which makes them interpretable.

2. A pseudo box can be generated directly from the sampling points, so no separate bounding box has to be learned, and classification and localization are tied together.


Copyright notice
This article was written by [Xiaobai learns vision]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/187/202207060937515072.html
