GA-RPN: Guided Anchoring Region Proposal Network

2022-07-29 08:17:00 【The way of code】

Paper: https://arxiv.org/pdf/1901.03278.pdf

Code: GitHub - open-mmlab/mmdetection: OpenMMLab Detection Toolbox and Benchmark

1.RPN

RPN stands for Region Proposal Network. It is used to select regions of interest, that is, to extract proposals. For example, if a region has a score p>0.5, it is considered likely to contain an object from one of the 80 categories, although which category is not yet known. That is all RPN decides at this stage: the network only needs to pick out the regions that may contain objects. These selected regions are called RoIs (Regions of Interest). RPN also localizes these RoIs approximately on the feature map, that is, it outputs bounding boxes.

A detailed introduction to RPN: https://mp.weixin.qq.com/s/VXgbJPVoZKjcaZjuNwgh-A

2.Guided Anchoring

An anchor is usually described by (x,y,w,h), that is, the coordinates of its center point together with its width and height. The paper expresses the distribution of anchors as a conditional probability:
$p(x, y, w, h \mid I) = p(x, y \mid I)\, p(w, h \mid x, y, I)$
The two conditional distributions on the right represent, respectively, the distribution of the anchor center given the image features, and the distribution of the anchor shape given the image features and the center. Seen this way, the conventional way of generating anchors is a special case of this factorization, in which p(x,y|I) is a uniform distribution and p(w,h|x,y,I) is an impulse function.
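To make the special case concrete, here is a minimal sketch of the conventional dense scheme: every feature-map location is used as a center (the uniform p(x,y|I)), and the same fixed set of shapes is placed at each one (the impulse p(w,h|x,y,I)). The function name `dense_anchors`, the stride, and the shape list are illustrative assumptions, not values from the paper.

```python
from typing import List, Tuple

Anchor = Tuple[float, float, float, float]  # (x, y, w, h) in image pixels


def dense_anchors(feat_h: int, feat_w: int, stride: int,
                  shapes: List[Tuple[float, float]]) -> List[Anchor]:
    """Place every predefined shape at every feature-map location."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of the feature-map cell, mapped back to image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical setting: a 4x6 feature map, stride 16, three fixed shapes.
anchors = dense_anchors(4, 6, 16, [(64, 64), (128, 64), (64, 128)])
print(len(anchors))  # 72 = 4 * 6 * 3, regardless of what the image contains
```

The anchor count depends only on the feature-map size and the number of shapes, which is exactly what the guided scheme tries to avoid.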

Following this factorization, anchor generation can be split into two steps: anchor location prediction and anchor shape prediction.

The method used in the paper is as follows:

The framework builds on the feature map of the original RPN. It uses two branches to predict the location and the shape of the anchors, then combines the two to obtain the anchors. A Feature Adaption module then adjusts the anchor features, producing a new feature map for the subsequent prediction (anchor classification and regression). The whole method can be trained end-to-end, and compared with the original design it adds only three 1×1 convs and one 3×3 deformable conv, so the change in model parameters is very small.

（1） Location prediction

The goal of the location prediction branch is to predict which regions should serve as center points for generating anchors. This is also a binary classification problem, but unlike the classification in RPN, we do not predict whether each point is foreground or background; we predict whether it is the center of an object.

We divide the whole feature map into three kinds of regions: the object center region, the ignore region, and the outside region. The general idea is that a small patch at the center of each ground-truth box, mapped onto the feature map, is marked as the object center region and used as positive samples during training. The remaining areas are marked as ignored or as negative samples, according to how far they are from the center.

The sub-network $N_L$ applies a 1×1 convolution to the input feature map $F_1$ and produces an output with the same resolution as $F_1$. The value at each position of this output represents the likelihood that an object appears at the corresponding position in the original image I, so the output is a probability map. Finally, the positions whose probability is higher than a predetermined threshold are selected as the regions where objects may exist.
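The three-way split of the feature map can be sketched as below. This is a simplified illustration for a single ground-truth box, not the authors' implementation: the function name `label_location` and the two ratios (0.2 for the center region, 0.5 for the ignore region) are assumptions that this post does not specify.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def label_location(cx: float, cy: float, gt: Box,
                   center_ratio: float = 0.2,
                   ignore_ratio: float = 0.5) -> int:
    """Return 1 (positive), -1 (ignored) or 0 (negative) for one location."""
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    dx, dy = abs(cx - gx), abs(cy - gy)
    if dx <= center_ratio * gw / 2 and dy <= center_ratio * gh / 2:
        return 1   # inside the shrunken center box
    if dx <= ignore_ratio * gw / 2 and dy <= ignore_ratio * gh / 2:
        return -1  # ring around the center: excluded from the loss
    return 0       # everything else is a negative sample


# Toy setup: a 10x10 feature map with stride 16 and one 160x160 box.
stride, gt = 16, (0.0, 0.0, 160.0, 160.0)
labels = [[label_location((c + 0.5) * stride, (r + 0.5) * stride, gt)
           for c in range(10)] for r in range(10)]
flat = [v for row in labels for v in row]
print(flat.count(1), flat.count(-1), flat.count(0))  # 4 32 64
```

Only the four cells nearest the box center become positives; the surrounding ring is ignored so that near-center locations are not penalized as negatives.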

Through location prediction, we can filter out a small portion of the feature map as candidate anchor centers, which greatly reduces the number of anchors. As a result, the later computation only has to be carried out at positions that actually have an anchor.
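The filtering step can be illustrated with a few lines of plain Python. This is only a sketch: the probability values, the threshold of 0.5, and the helper name `select_centers` are made up for the example.

```python
from typing import List, Tuple


def select_centers(prob_map: List[List[float]], threshold: float,
                   stride: int) -> List[Tuple[float, float]]:
    """Return image-space centers of cells whose probability exceeds threshold."""
    centers = []
    for row, line in enumerate(prob_map):
        for col, p in enumerate(line):
            if p > threshold:
                centers.append(((col + 0.5) * stride, (row + 0.5) * stride))
    return centers


# Toy 3x4 probability map (made-up values), stride 8.
prob_map = [
    [0.01, 0.02, 0.03, 0.01],
    [0.02, 0.90, 0.70, 0.02],
    [0.01, 0.05, 0.02, 0.01],
]
centers = select_centers(prob_map, threshold=0.5, stride=8)
print(centers)  # [(12.0, 12.0), (20.0, 12.0)]
```

In this toy case only 2 of the 12 locations survive, so the later stages would handle 2 anchors instead of a dense grid.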

（2） Shape prediction

The goal of the shape prediction branch is, given an anchor center, to predict the best width and height. This is a regression problem.

A 1×1 convolutional sub-network $N_s$ takes $F_1$ as input and outputs a 2-channel feature map of the same size as $F_1$. The two channels represent dw and dh, which encode the best anchor size at each location. Although the prediction targets are w and h, predicting these two values directly is unstable because their range is very large, so the output space of roughly [0,1000] is mapped to roughly [-1,1] with the transform:
$w=\sigma \times s \times e^{dw},\quad h=\sigma \times s \times e^{dh}$
where s is the stride and σ is an empirical scale factor, set to σ=8 in the experiments. The branch only outputs the two-channel map of dw and dh, and the conversion to w and h is applied pixel by pixel through this equation. The paper uses IoU directly as the supervision signal for learning w and h.
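A small numeric sketch of this transform shows how a bounded network output turns into a wide range of pixel sizes. The function name `decode_shape` and the stride of 16 are assumptions for illustration; σ=8 is the value quoted above.

```python
import math
from typing import Tuple


def decode_shape(dw: float, dh: float, stride: int,
                 sigma: float = 8.0) -> Tuple[float, float]:
    """Map the predicted (dw, dh) to an anchor width and height in pixels."""
    w = sigma * stride * math.exp(dw)
    h = sigma * stride * math.exp(dh)
    return w, h


# dw = dh = 0 gives the base size sigma * s.
print(decode_shape(0.0, 0.0, stride=16))  # (128.0, 128.0)

# The two ends of the [-1, 1] output range, again for stride 16.
w_min, _ = decode_shape(-1.0, -1.0, stride=16)
w_max, _ = decode_shape(1.0, 1.0, stride=16)
print(round(w_min, 1), round(w_max, 1))  # 47.1 347.9
```

Because the exponent is applied per feature level, each stride covers its own band of sizes, and the levels together span the full range of object scales.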

As for matching anchors to ground truth, a conventional RPN directly computes the IoU between each anchor and every ground-truth box and assigns the anchor to the ground truth with the highest IoU. With this method, however, the anchor's w and h are not fixed; they are variables to be predicted. The paper therefore defines the IoU between such an anchor and a ground-truth box as:
$vIOU(a_{wh},gt)=\max_{w>0,h>0}IOU_{normal}(a_{wh},gt)$
It is not feasible to enumerate every possible w and h to find the maximum IoU, so the paper approximates it by sampling 9 plausible (w,h) pairs, which works well enough in practice.
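The sampled approximation can be sketched as follows: fix the anchor center, try a handful of candidate shapes, and keep the best IoU. The 9 candidates here (3 sizes × 3 aspect ratios) and the helper names `iou` and `approx_viou` are assumptions for illustration; the post only says that 9 pairs are sampled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Plain IoU between two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def approx_viou(cx: float, cy: float, gt: Box,
                shapes: List[Tuple[float, float]]
                ) -> Tuple[float, Tuple[float, float]]:
    """Approximate max over (w, h) of IoU by trying a few sampled shapes."""
    best_iou, best_shape = 0.0, shapes[0]
    for w, h in shapes:
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        value = iou(box, gt)
        if value > best_iou:
            best_iou, best_shape = value, (w, h)
    return best_iou, best_shape


# 9 hypothetical candidates: 3 sizes x 3 aspect ratios.
shapes = [(s * r, s / r) for s in (32.0, 64.0, 128.0) for r in (0.5, 1.0, 2.0)]
gt = (10.0, 20.0, 74.0, 84.0)  # a 64x64 box centered at (42, 52)
print(approx_viou(42.0, 52.0, gt, shapes))  # (1.0, (64.0, 64.0))
```

When the anchor center is off the box center, no candidate reaches IoU 1, and the maximum over the samples is what gets used for assigning the anchor.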

At this point we can generate anchors. The generated anchors are sparse, and their shape differs from position to position. Experiments show that the average recall at this stage already exceeds that of the ordinary RPN, at the cost of only two extra convs.

（3） Feature adaption module

Because the anchor shape differs from position to position, large anchors correspond to larger receptive fields and small anchors to smaller ones. Prediction therefore can no longer be done by applying an ordinary convolution to the feature map, as in earlier anchor-based methods; the feature map first needs feature adaptation. The authors borrow the idea of deformable convolution: the feature at each position is transformed individually according to the anchor shape at that position.

This is done by folding the anchor shape information directly into the feature map, producing a new feature map that fits the anchor shape at each position. The 3×3 deformable convolution mentioned above is used to modify the original feature map, and its offsets are obtained from the anchor's w and h through a 1×1 conv.
$f'_i=N_t(f_i,w_i,h_i)$
where $f_i$ is the feature at the i-th location and $(w_i, h_i)$ is the corresponding anchor shape. $N_t$ is implemented with a 3×3 deformable convolution. First, an offset field is predicted from the output of the shape prediction branch; then deformable convolution with these offsets is applied to the original feature map to obtain the adapted features. Classification and bounding box regression are then performed on the adapted features.
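As a reminder of what the deformable convolution computes, the adapted feature can be written out explicitly. This is the standard deformable convolution formula rather than anything specific to this post; the symbols $\mathcal{R}$, $W$, $\Delta p_k$ and $g$ are notation introduced here for illustration.
$f'(p_i)=\sum_{p_k \in \mathcal{R}} W(p_k)\, f(p_i+p_k+\Delta p_k),\quad \{\Delta p_k\}=g(w_i,h_i)$
Here $p_i$ is the i-th location, $\mathcal{R}=\{(-1,-1),\dots,(1,1)\}$ is the regular 3×3 sampling grid, $W$ holds the kernel weights, and $g$ stands for the 1×1 conv that turns the anchor shape into offsets. Since $p_i+p_k+\Delta p_k$ is usually fractional, $f$ is sampled by bilinear interpolation.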

In this way, the effective range of each feature is brought closer to the shape of its anchor, so different positions of the same conv can represent anchors of different shapes and sizes.

Examples of experimental results from the paper: