[Object Detection] Generalized Focal Loss V1
2022-07-29 06:04:00 【Dull cat】

Paper: https://arxiv.org/pdf/2006.04388.pdf
Code: https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl
Venue: NeurIPS 2020
Key points:
- A generalized distribution is proposed to model bounding-box locations: the clearer an object boundary is, the easier it is to learn and the sharper the learned distribution; the fuzzier the boundary, the harder it is to learn and the flatter the distribution.
1. Background
One-stage detectors basically model object detection as a dense classification and localization task. The classification branch is generally optimized with Focal Loss, while box regression is generally learned as a Dirac delta distribution.
FCOS, for example, introduces a quantity that estimates localization quality (an IoU score or centerness score); when sorting candidates for NMS, the classification score is multiplied by this box quality score.
Current one-stage detectors therefore usually introduce a separate prediction branch to estimate localization quality; this estimate complements the classification score and improves detection performance (see the sketch below).
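As a concrete illustration, a minimal sketch of the NMS ranking score in such detectors (the tensor names and shapes are made up for illustration, not taken from any particular codebase):

```python
import torch

# hypothetical per-anchor predictions from the two separate heads
cls_score = torch.rand(1000, 80)   # per-class classification scores
quality = torch.rand(1000, 1)      # predicted IoU / centerness score per anchor

# ranking score used to sort candidates for NMS:
# classification score multiplied by the predicted box quality
ranking_score = cls_score * quality
```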
The paper considers three basic elements of detection:
- box quality estimation (e.g., an IoU score or FCOS's centerness score)
- classification
- localization
Current implementations have two main problems:
1. The classification score and the box quality estimate are used inconsistently between training and testing.

Inconsistent usage: classification and quality estimation are trained separately, but at test time they are multiplied together as the NMS ranking score, which leaves a gap between training and inference.
Inconsistent objects: powered by Focal Loss, the classification branch can be trained on a few positive samples together with a large number of negatives, but the box quality estimate is trained only on positive samples.
For a one-stage detector, every sample's NMS ranking score is its classification score multiplied by its box quality score. The quality predictions of low-scoring negatives therefore receive no supervision signal during training, i.e., the quality of a large number of negatives is never calibrated. A negative sample with a low classification score can then be predicted with a very high box quality score and end up ranked like a positive.

2. The bbox regression representation is inflexible (a Dirac delta distribution) and cannot model the uncertainty of complex scenes.
- In complex scenes the bounding-box representation carries strong uncertainty, yet existing box regression essentially models a single Dirac distribution, which is very inflexible. The authors therefore propose modeling the box representation with a general distribution. The problem is illustrated in Figure 3 of the paper (e.g., a skateboard blurred by water, and a heavily occluded elephant):

2. Method
For the two existing problems:
① training and testing are inconsistent;
② the modeling of the box location distribution is not general,
the authors propose the following solutions.
Solution to problem 1: build a joint classification-IoU representation
To remove the inconsistency between training and testing, and to let both classification and box quality prediction be trained on all positive and negative samples, the authors merge the box quality estimate into the classification score.
Method:
When the predicted class is the ground-truth class, the localization quality score is used as the confidence; in this paper the localization quality is measured by the IoU score.
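A minimal sketch of how such a joint target could be built (the variable names and the assignment step are hypothetical; mmdetection's actual implementation differs in detail): positives carry their IoU with the assigned ground truth as a soft label at the ground-truth class, and negatives keep an all-zero vector.

```python
import torch

num_anchors, num_classes = 1000, 80

# hypothetical assignment results: -1 marks a negative sample
assigned_labels = torch.randint(-1, num_classes, (num_anchors,))
assigned_ious = torch.rand(num_anchors)   # IoU with the assigned ground truth

# joint classification-IoU target: soft one-hot, all zeros for negatives
targets = torch.zeros(num_anchors, num_classes)
pos = assigned_labels >= 0
targets[pos, assigned_labels[pos]] = assigned_ious[pos]
```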

Solution to problem 2: directly regress an arbitrary distribution to model the box representation
Method: implemented with a softmax; the derivation goes from the integral form of the Dirac distribution to the integral form of a general distribution over the box location.
Together, these changes eliminate the inconsistency between training and testing and establish the strong correlation between classification and localization shown in Figure 2(b) of the paper.
Besides, negative samples can now be supervised with a quality score of 0.

Generalized Focal Loss consists of:
- QFL (Quality Focal Loss): learns the joint representation of the classification score and the localization quality score
- DFL (Distribution Focal Loss): models the box location as a general distribution and pushes the network to quickly concentrate on values near the target location
How Generalized Focal Loss was derived:
① The original FL:
Today's dense prediction tasks generally use Focal Loss to optimize the classification branch, which handles problems such as the imbalance between foreground and background samples. The formula is as follows, but it only supports discrete 0/1 category labels:

$\mathrm{FL}(p) = -(1-p_t)^\gamma \log(p_t)$, where $p_t = p$ if $y = 1$ and $p_t = 1-p$ otherwise.
**② Proposed QFL: Quality Focal Loss**
The standard one-hot encoding is 1 at the ground-truth class and 0 elsewhere.
With the joint classification-IoU representation, the standard one-hot encoding is softened: the learning target becomes $y \in [0,1]$ rather than the hard target 1.
Because the joint label is now a continuous value in $[0,1]$, FL no longer applies:
- $y = 0$: a negative sample, whose quality score is 0
- $0 < y \le 1$: a positive sample, whose localization quality label $y$ is its IoU score and lies between 0 and 1

To preserve FL's ability to balance hard/easy and positive/negative samples while also supporting supervision with continuous values, FL needs to be extended in two places:
- the cross-entropy term $-\log(p_t)$ is extended to its complete form $-((1-y)\log(1-\sigma) + y\log(\sigma))$
- the modulating factor $(1-p_t)^\gamma$ is extended to $|y-\sigma|^\beta$ with $\beta \ge 0$

Quality Focal Loss (QFL) is finally:

$\mathrm{QFL}(\sigma) = -|y-\sigma|^\beta \big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$
- $\sigma = y$ is the global minimum of QFL
- Figure 5(a) of the paper shows the effect of different $\beta$ values (with $y = 0.5$)
- $|y-\sigma|^\beta$ is the modulating factor: when a sample's quality estimate is inaccurate, the factor is large and the network pays more attention to this hard sample; as the quality estimate becomes accurate, i.e. $\sigma \to y$, the factor tends to 0 and the sample's weight in the loss shrinks. $\beta$ controls how fast the weight decays; $\beta = 2$ is optimal in this paper. A sketch follows this list.
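A minimal PyTorch sketch of QFL under the definitions above (mmdetection ships a reference `quality_focal_loss`; this standalone version is only illustrative):

```python
import torch
import torch.nn.functional as F

def quality_focal_loss(logits, target, beta=2.0):
    """QFL: complete BCE against the soft IoU target, scaled by |y - sigma|^beta.

    logits: (N, C) raw class scores; target: (N, C) joint labels in [0, 1].
    """
    sigma = logits.sigmoid()
    # complete cross entropy: -((1 - y) log(1 - sigma) + y log(sigma))
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    # modulating factor |y - sigma|^beta
    modulator = (target - sigma).abs().pow(beta)
    return (modulator * bce).sum()
```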

③ Proposed DFL: Distribution Focal Loss
The localization branch in this paper regresses relative offsets as the target. Previous work generally models the target with a Dirac distribution $\delta(x-y)$, which satisfies $\int_{-\infty}^{+\infty} \delta(x-y)\,dx = 1$ and is usually realized with a fully connected layer.
Considering the diversity of real distributions, this paper instead represents the location with a more general distribution $P(x)$, recovering the prediction as the expectation $\hat{y} = \int P(x)\,x\,dx$, discretized as $\hat{y} = \sum_i P(y_i)\,y_i$ over a set of discrete bins $\{y_i\}$.
Since the real distribution is usually not far from the annotated location, another loss is added:

$\mathrm{DFL}(S_i, S_{i+1}) = -\big((y_{i+1}-y)\log(S_i) + (y-y_i)\log(S_{i+1})\big)$

where $y_i$ and $y_{i+1}$ are the two discrete bins bracketing the continuous label $y$, and $S_i = P(y_i)$ comes from a softmax.
- DFL makes the network focus faster on values near the target $y$ and increase their probability
- Its meaning: optimize, in cross-entropy form, the probabilities of the two positions closest to the label $y$ on its left and right, so that the network concentrates on the distribution in the neighborhood of the target location faster (see the sketch below)
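A minimal PyTorch sketch of DFL under these definitions (again only illustrative; mmdetection's `DistributionFocalLoss` is the reference implementation):

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_dist, label):
    """DFL: weighted cross entropy on the two bins bracketing the label.

    pred_dist: (N, n_bins) logits over discrete offsets {0, 1, ..., n_bins - 1}.
    label: (N,) continuous regression targets, assumed to lie in [0, n_bins - 1).
    """
    y_left = label.long()              # y_i, the bin to the left of the label
    y_right = y_left + 1               # y_{i+1}, the bin to the right
    w_left = y_right.float() - label   # weight (y_{i+1} - y)
    w_right = label - y_left.float()   # weight (y - y_i)
    loss = (F.cross_entropy(pred_dist, y_left, reduction='none') * w_left
            + F.cross_entropy(pred_dist, y_right, reduction='none') * w_right)
    return loss.mean()
```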
QFL and DFL can be unified into a single form, GFL:

$\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\big|y - (y_l p_{y_l} + y_r p_{y_r})\big|^\beta \big((y_r - y)\log(p_{y_l}) + (y - y_l)\log(p_{y_r})\big)$

- the variables are $y_l$ and $y_r$
- their predicted probabilities are $p_{y_l}$ and $p_{y_r}$, with $p_{y_l} + p_{y_r} = 1$
- the final prediction is $\hat{y} = y_l p_{y_l} + y_r p_{y_r}$, with $y_l \le \hat{y} \le y_r$ (decoded as in the sketch below)
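At inference the box offset is decoded from the learned distribution as this expectation. A minimal sketch (the function name is made up; mmdetection implements this step as an `Integral` module):

```python
import torch

def integral_decode(pred_dist):
    """Decode y_hat = sum_i P(y_i) * y_i from per-bin logits via a softmax."""
    n_bins = pred_dist.size(-1)
    project = torch.arange(n_bins, dtype=torch.float32)  # y_i = 0, 1, ..., n_bins - 1
    prob = pred_dist.softmax(dim=-1)                     # P(y_i)
    return (prob * project).sum(dim=-1)
```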
The training loss is as follows (with the paper's settings $\lambda_0 = 2$ and $\lambda_1 = 1/4$):

$\mathcal{L} = \frac{1}{N_{pos}} \sum_{z} \mathcal{L}_{\mathcal{Q}} + \frac{1}{N_{pos}} \sum_{z} \mathbf{1}_{\{c^*_z > 0\}} \big(\lambda_0 \mathcal{L}_{\mathcal{B}} + \lambda_1 \mathcal{L}_{\mathcal{D}}\big)$

where $\mathcal{L}_{\mathcal{Q}}$ is QFL, $\mathcal{L}_{\mathcal{D}}$ is DFL, $\mathcal{L}_{\mathcal{B}}$ is the GIoU loss, and $N_{pos}$ is the number of positive samples.
3. Results

