[Object Detection] Generalized Focal Loss V1
2022-07-29 06:04:00 【Dull cat】

Paper: https://arxiv.org/pdf/2006.04388.pdf
Code: https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl
Venue: NeurIPS 2020
Key points:
- A generalized distribution is proposed to model bounding-box locations: the clearer an object boundary is, the easier it is to learn and the sharper the learned distribution; the fuzzier the boundary, the harder it is to learn and the flatter the distribution.
1. Background
One-stage detectors basically model object detection as a dense classification and localization task. The classification branch is generally optimized with Focal Loss, while box regression is generally learned as a Dirac delta distribution.
FCOS, for example, introduces a quantity that estimates localization quality (an IoU score or centerness score); when sorting candidates for NMS, the classification score is multiplied by this box quality score.
Current one-stage detectors therefore usually introduce a separate prediction branch to estimate localization quality; this estimate complements the classification score and improves detection performance (see the sketch below).
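As a concrete illustration, a minimal sketch of the NMS ranking score in such detectors (the tensor names and shapes are made up for illustration, not taken from any particular codebase):

```python
import torch

# hypothetical per-anchor predictions from the two separate heads
cls_score = torch.rand(1000, 80)   # per-class classification scores
quality = torch.rand(1000, 1)      # predicted IoU / centerness score per anchor

# ranking score used to sort candidates for NMS:
# classification score multiplied by the predicted box quality
ranking_score = cls_score * quality
```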
The paper considers three basic elements of detection:
- box quality estimation (e.g., an IoU score or FCOS's centerness score)
- classification
- localization
Current implementations have two main problems:
1. The classification score and the box quality estimate are used inconsistently between training and testing.

Inconsistent usage: classification and quality estimation are trained separately, but at test time they are multiplied together as the NMS ranking score, which leaves a gap between training and inference.
Inconsistent objects: powered by Focal Loss, the classification branch can be trained on a few positive samples together with a large number of negatives, but the box quality estimate is trained only on positive samples.
For a one-stage detector, every sample's NMS ranking score is its classification score multiplied by its box quality score. The quality predictions of low-scoring negatives therefore receive no supervision signal during training, i.e., the quality of a large number of negatives is never calibrated. A negative sample with a low classification score can then be predicted with a very high box quality score and end up ranked like a positive.

2. The bbox regression representation is inflexible (a Dirac delta distribution) and cannot model the uncertainty of complex scenes.
- In complex scenes the bounding-box representation carries strong uncertainty, yet existing box regression essentially models a single Dirac distribution, which is very inflexible. The authors therefore propose modeling the box representation with a general distribution. The problem is illustrated in Figure 3 of the paper (e.g., a skateboard blurred by water, and a heavily occluded elephant):

2. Method
For the two existing problems:
① training and testing are inconsistent;
② the modeling of the box location distribution is not general,
the authors propose the following solutions.
Solution to problem 1: build a joint classification-IoU representation
To remove the inconsistency between training and testing, and to let both classification and box quality prediction be trained on all positive and negative samples, the authors merge the box quality estimate into the classification score.
Method:
When the predicted class is the ground-truth class, the localization quality score is used as the confidence; in this paper the localization quality is measured by the IoU score.
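A minimal sketch of how such a joint target could be built (the variable names and the assignment step are hypothetical; mmdetection's actual implementation differs in detail): positives carry their IoU with the assigned ground truth as a soft label at the ground-truth class, and negatives keep an all-zero vector.

```python
import torch

num_anchors, num_classes = 1000, 80

# hypothetical assignment results: -1 marks a negative sample
assigned_labels = torch.randint(-1, num_classes, (num_anchors,))
assigned_ious = torch.rand(num_anchors)   # IoU with the assigned ground truth

# joint classification-IoU target: soft one-hot, all zeros for negatives
targets = torch.zeros(num_anchors, num_classes)
pos = assigned_labels >= 0
targets[pos, assigned_labels[pos]] = assigned_ious[pos]
```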

Solution to problem 2: directly regress an arbitrary distribution to model the box representation
Method: implemented with a softmax; the derivation goes from the integral form of the Dirac distribution to the integral form of a general distribution over the box location.
Together, these changes eliminate the inconsistency between training and testing and establish the strong correlation between classification and localization shown in Figure 2(b) of the paper.
Besides, negative samples can now be supervised with a quality score of 0.

Generalized Focal Loss consists of:
- QFL (Quality Focal Loss): learns the joint representation of the classification score and the localization quality score
- DFL (Distribution Focal Loss): models the box location as a general distribution and pushes the network to quickly concentrate on values near the target location
How Generalized Focal Loss was derived:
① The original FL:
Today's dense prediction tasks generally use Focal Loss to optimize the classification branch, which handles problems such as the imbalance between foreground and background samples. The formula is as follows, but it only supports discrete 0/1 category labels:

$\mathrm{FL}(p) = -(1-p_t)^\gamma \log(p_t)$, where $p_t = p$ if $y = 1$ and $p_t = 1-p$ otherwise.
**② Proposed QFL: Quality Focal Loss**
The standard one-hot encoding is 1 at the ground-truth class and 0 elsewhere.
With the joint classification-IoU representation, the standard one-hot encoding is softened: the learning target becomes $y \in [0,1]$ rather than the hard target 1.
Because the joint label is now a continuous value in $[0,1]$, FL no longer applies:
- $y = 0$: a negative sample, whose quality score is 0
- $0 < y \le 1$: a positive sample, whose localization quality label $y$ is its IoU score and lies between 0 and 1

To preserve FL's ability to balance hard/easy and positive/negative samples while also supporting supervision with continuous values, FL needs to be extended in two places:
- the cross-entropy term $-\log(p_t)$ is extended to its complete form $-((1-y)\log(1-\sigma) + y\log(\sigma))$
- the modulating factor $(1-p_t)^\gamma$ is extended to $|y-\sigma|^\beta$ with $\beta \ge 0$

Quality Focal Loss (QFL) is finally:

$\mathrm{QFL}(\sigma) = -|y-\sigma|^\beta \big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$
- $\sigma = y$ is the global minimum of QFL
- Figure 5(a) of the paper shows the effect of different $\beta$ values (with $y = 0.5$)
- $|y-\sigma|^\beta$ is the modulating factor: when a sample's quality estimate is inaccurate, the factor is large and the network pays more attention to this hard sample; as the quality estimate becomes accurate, i.e. $\sigma \to y$, the factor tends to 0 and the sample's weight in the loss shrinks. $\beta$ controls how fast the weight decays; $\beta = 2$ is optimal in this paper. A sketch follows this list.
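A minimal PyTorch sketch of QFL under the definitions above (mmdetection ships a reference `quality_focal_loss`; this standalone version is only illustrative):

```python
import torch
import torch.nn.functional as F

def quality_focal_loss(logits, target, beta=2.0):
    """QFL: complete BCE against the soft IoU target, scaled by |y - sigma|^beta.

    logits: (N, C) raw class scores; target: (N, C) joint labels in [0, 1].
    """
    sigma = logits.sigmoid()
    # complete cross entropy: -((1 - y) log(1 - sigma) + y log(sigma))
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    # modulating factor |y - sigma|^beta
    modulator = (target - sigma).abs().pow(beta)
    return (modulator * bce).sum()
```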

③ Proposed DFL: Distribution Focal Loss
The localization branch in this paper regresses relative offsets as the target. Previous work generally models the target with a Dirac distribution $\delta(x-y)$, which satisfies $\int_{-\infty}^{+\infty} \delta(x-y)\,dx = 1$ and is usually realized with a fully connected layer.
Considering the diversity of real distributions, this paper instead represents the location with a more general distribution $P(x)$, recovering the prediction as the expectation $\hat{y} = \int P(x)\,x\,dx$, discretized as $\hat{y} = \sum_i P(y_i)\,y_i$ over a set of discrete bins $\{y_i\}$.
Since the real distribution is usually not far from the annotated location, another loss is added:

$\mathrm{DFL}(S_i, S_{i+1}) = -\big((y_{i+1}-y)\log(S_i) + (y-y_i)\log(S_{i+1})\big)$

where $y_i$ and $y_{i+1}$ are the two discrete bins bracketing the continuous label $y$, and $S_i = P(y_i)$ comes from a softmax.
- DFL makes the network focus faster on values near the target $y$ and increase their probability
- Its meaning: optimize, in cross-entropy form, the probabilities of the two positions closest to the label $y$ on its left and right, so that the network concentrates on the distribution in the neighborhood of the target location faster (see the sketch below)
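A minimal PyTorch sketch of DFL under these definitions (again only illustrative; mmdetection's `DistributionFocalLoss` is the reference implementation):

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_dist, label):
    """DFL: weighted cross entropy on the two bins bracketing the label.

    pred_dist: (N, n_bins) logits over discrete offsets {0, 1, ..., n_bins - 1}.
    label: (N,) continuous regression targets, assumed to lie in [0, n_bins - 1).
    """
    y_left = label.long()              # y_i, the bin to the left of the label
    y_right = y_left + 1               # y_{i+1}, the bin to the right
    w_left = y_right.float() - label   # weight (y_{i+1} - y)
    w_right = label - y_left.float()   # weight (y - y_i)
    loss = (F.cross_entropy(pred_dist, y_left, reduction='none') * w_left
            + F.cross_entropy(pred_dist, y_right, reduction='none') * w_right)
    return loss.mean()
```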
QFL and DFL can be unified into a single form, GFL:

$\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\big|y - (y_l p_{y_l} + y_r p_{y_r})\big|^\beta \big((y_r - y)\log(p_{y_l}) + (y - y_l)\log(p_{y_r})\big)$

- the variables are $y_l$ and $y_r$
- their predicted probabilities are $p_{y_l}$ and $p_{y_r}$, with $p_{y_l} + p_{y_r} = 1$
- the final prediction is $\hat{y} = y_l p_{y_l} + y_r p_{y_r}$, with $y_l \le \hat{y} \le y_r$ (decoded as in the sketch below)
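At inference the box offset is decoded from the learned distribution as this expectation. A minimal sketch (the function name is made up; mmdetection implements this step as an `Integral` module):

```python
import torch

def integral_decode(pred_dist):
    """Decode y_hat = sum_i P(y_i) * y_i from per-bin logits via a softmax."""
    n_bins = pred_dist.size(-1)
    project = torch.arange(n_bins, dtype=torch.float32)  # y_i = 0, 1, ..., n_bins - 1
    prob = pred_dist.softmax(dim=-1)                     # P(y_i)
    return (prob * project).sum(dim=-1)
```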
The training loss is as follows (with the paper's settings $\lambda_0 = 2$ and $\lambda_1 = 1/4$):

$\mathcal{L} = \frac{1}{N_{pos}} \sum_{z} \mathcal{L}_{\mathcal{Q}} + \frac{1}{N_{pos}} \sum_{z} \mathbf{1}_{\{c^*_z > 0\}} \big(\lambda_0 \mathcal{L}_{\mathcal{B}} + \lambda_1 \mathcal{L}_{\mathcal{D}}\big)$

where $\mathcal{L}_{\mathcal{Q}}$ is QFL, $\mathcal{L}_{\mathcal{D}}$ is DFL, $\mathcal{L}_{\mathcal{B}}$ is the GIoU loss, and $N_{pos}$ is the number of positive samples.
3. Results

