[Object Detection] Generalized Focal Loss v1
2022-07-29 06:04:00 【Dull cat】

Paper: https://arxiv.org/pdf/2006.04388.pdf
Code: https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl
Venue: NeurIPS 2020
Key idea:
- A generalized distribution is proposed to model bounding-box locations: the clearer a boundary is, the easier it is to learn and the sharper its distribution; the blurrier a boundary is, the harder it is to learn and the flatter its distribution.
1. Background
One-stage object detectors basically model detection as a dense classification and localization task. Classification is usually optimized with Focal Loss, while localization is usually learned as a Dirac delta distribution. FCOS, for example, introduces a quantity that estimates localization quality (an IoU score or centerness score); during NMS, the classification score is multiplied by this box-quality score for ranking. Current one-stage detectors therefore usually add a separate branch to predict localization quality; this prediction assists classification and improves detection performance.
This paper identifies three basic elements:
- Quality estimation of the detection box (e.g. IoU score or FCOS's centerness score)
- Classification
- Localization
Current implementations have two main problems:
1. The classification score and the box-quality estimate are inconsistent between training and testing
- Inconsistent usage: classification and quality estimation are trained separately, but at test time they are multiplied together and used as the NMS ranking score, so there is a gap between the two phases.
- Inconsistent objects: thanks to Focal Loss, the classification branch can be trained on a few positive samples together with a large number of negatives, but the box-quality estimate is trained only on positive samples.
For a one-stage detector, every sample's NMS ranking score is the classification score multiplied by the box-quality score, so there are inevitably negative samples with low classification scores whose quality predictions received no supervision signal during training; the quality of a large number of negatives is never calibrated. As a result, a negative sample with a low classification score may predict a very high box-quality score and end up ranked as a positive.
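A toy numeric sketch of this failure mode (all numbers are invented for illustration): a negative sample whose quality branch was never supervised can output a spuriously high quality score and outrank a genuine positive in NMS ranking.

```python
# Toy illustration of the train/test inconsistency (all numbers invented).
# At test time, the NMS ranking score is cls_score * quality_score.
def nms_score(sample):
    return sample["cls"] * sample["quality"]

true_positive = {"cls": 0.60, "quality": 0.50}  # quality was supervised
negative = {"cls": 0.35, "quality": 0.95}       # quality never supervised, spuriously high

# The unsupervised negative (0.35 * 0.95 = 0.3325) outranks
# the true positive (0.60 * 0.50 = 0.30) during NMS.
```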
2. The bbox regression representation is inflexible (a Dirac delta distribution) and cannot model the uncertainty of complex scenes
- In complex scenes, the bounding box is highly uncertain, yet existing box regression essentially models a single Dirac distribution, which is very inflexible. The authors therefore propose modeling the box representation with a general distribution. The problem is shown in Figure 3 (e.g. a skateboard blurred by water spray, or a heavily occluded elephant).
Two 、 Method
For the two existing problems :
① Training and testing are inconsistent
② The modeling of frame position distribution is not universal
The author proposes the following solutions .
Solution to problem 1: build a joint classification-IoU representation
To keep training and testing consistent, and to let both classification and box-quality prediction be trained on all positive and negative samples, the authors merge the box-quality score into the classification score.
Method: when the predicted category is the ground-truth category, use the localization quality score as the confidence; in this paper, the localization quality is measured by the IoU score.
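A minimal sketch of how such joint classification-IoU targets could be built (the label encoding with -1 for negatives is a hypothetical choice for illustration, not mmdetection's actual API):

```python
def joint_targets(labels, ious, num_classes):
    """Build soft classification-IoU targets.

    labels: per-sample GT class index, or -1 for a negative sample
            (hypothetical encoding for this sketch)
    ious:   per-sample IoU between the predicted box and its matched GT box
    Returns one soft one-hot vector per sample: the GT class position holds
    the IoU score, every other position (and every negative) holds 0.
    """
    targets = []
    for label, iou in zip(labels, ious):
        t = [0.0] * num_classes
        if label >= 0:
            t[label] = iou  # soften the hard "1" into the IoU score
        targets.append(t)
    return targets
```

With this encoding, negatives naturally receive an all-zero target, which is exactly the 0-quality supervision described below.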
Solution to problem 2: directly regress an arbitrary distribution to model the box representation
Method: implemented with softmax; the derivation goes from the integral form of the Dirac distribution to the integral form of a general distribution.
This eliminates the inconsistency between training and testing and establishes the strong correlation between classification and localization shown in Figure 2b. In addition, negative samples can now be supervised with a quality score of 0.
Generalized Focal Loss consists of:
- QFL (Quality Focal Loss): learns the joint representation of classification score and localization score
- DFL (Distribution Focal Loss): models the box location as a general distribution and makes the network quickly concentrate probability on values near the target location
How Generalized Focal Loss was derived:
① The original FL:
Today's dense prediction tasks generally optimize the classification branch with Focal Loss, which handles the extreme imbalance between foreground and background samples:
$$\mathrm{FL}(p) = -(1-p_t)^\gamma \log(p_t), \quad p_t = \begin{cases} p & y = 1 \\ 1-p & y = 0 \end{cases}$$
However, it only supports discrete 0/1 category labels.
**② Proposed QFL: Quality Focal Loss**
Standard one-hot encoding puts 1 at the ground-truth class and 0 everywhere else. With the joint classification-IoU representation, the one-hot encoding is softened: the learning target becomes $y \in [0,1]$ rather than the hard target "1". Since the label is now a continuous value in [0, 1], FL is no longer applicable.
- $y = 0$: negative sample, with quality score 0
- $0 < y \le 1$: positive sample, whose localization score label $y$ is given by the IoU score and lies in (0, 1]
To preserve Focal Loss's ability to balance hard/easy and positive/negative samples while supporting continuous-valued supervision, FL is extended in two parts:
- The cross-entropy term $-\log(p_t)$ is extended to its complete form $-\big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$
- The modulating factor $(1-p_t)^\gamma$ is extended to $|y-\sigma|^\beta$ ($\beta \ge 0$)
Quality Focal Loss (QFL) is then:
$$\mathrm{QFL}(\sigma) = -|y-\sigma|^\beta \big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$$
- $\sigma = y$ is the global minimum of QFL
- Figure 5a shows the effect of different $\beta$ values (with $y = 0.5$)
- $|y-\sigma|^\beta$ is the modulating factor: when a sample's quality estimate is inaccurate, the factor is large and the network pays more attention to this hard sample; as the estimate becomes accurate, i.e. $\sigma \to y$, the factor tends to 0 and the sample's weight in the loss shrinks. $\beta$ controls how fast the weight decays; $\beta = 2$ is optimal in this paper.
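A minimal scalar sketch of QFL, combining the extended cross-entropy term with the $|y-\sigma|^\beta$ modulating factor (pure Python for clarity; a real implementation would be vectorized over logits):

```python
import math

def quality_focal_loss(sigma, y, beta=2.0):
    """QFL for a single prediction.

    sigma: predicted joint classification-IoU score, in (0, 1)
    y:     soft target in [0, 1] (0 for negatives, IoU score for positives)
    beta:  modulating exponent (the paper finds beta = 2 optimal)
    """
    # complete cross-entropy term, extended to continuous y
    ce = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    # modulating factor |y - sigma|^beta down-weights easy samples
    return abs(y - sigma) ** beta * ce
```

At $\sigma = y$ the modulating factor is 0, so the loss vanishes, matching the global-minimum property above; a badly mis-estimated sample receives a much larger weight.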
③ Proposed DFL: Distribution Focal Loss
Localization in this paper regresses relative offsets. Previous work generally models the target as a Dirac distribution $\delta(x-y)$, which satisfies $\int_{-\infty}^{+\infty} \delta(x-y)\,dx = 1$ and is usually implemented with a fully connected layer.
Considering the diversity of real distributions, this paper instead represents the location with a more general distribution. Since the true distribution usually does not lie far from the annotated location, an additional loss, DFL, is added:
- DFL makes the network focus faster on values near the target $y$, increasing their probabilities
- Concretely, it optimizes, in cross-entropy form, the probabilities of the two positions closest to the label $y$ on its left and right, so the network concentrates on the neighborhood of the target location faster
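A minimal sketch of DFL over integer bins (assuming the offset target has already been scaled so that the bins are the integers $0..n$; pure Python for illustration):

```python
import math

def distribution_focal_loss(probs, y):
    """DFL for one regression target.

    probs: predicted discrete distribution over integer bins 0..n (sums to 1)
    y:     continuous target, with y_l = floor(y) and y_r = y_l + 1
    Cross-entropy on the two bins adjacent to y, weighted by proximity.
    """
    y_l = int(math.floor(y))
    y_r = y_l + 1
    # the bin closer to y gets the larger weight
    return -((y_r - y) * math.log(probs[y_l]) + (y - y_l) * math.log(probs[y_r]))
```

The loss is minimized when the mass on the two adjacent bins matches their proximity weights, i.e. $S_{y_l} = y_r - y$ and $S_{y_r} = y - y_l$, which is exactly the sharp, near-target distribution the method wants.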
QFL and DFL can be unified as GFL:
- the two variables are $y_l$ and $y_r$
- their predicted probabilities are $p_{y_l}$ and $p_{y_r}$, with $p_{y_l} + p_{y_r} = 1$
- the final prediction is $\hat{y} = y_l p_{y_l} + y_r p_{y_r}$, with $y_l \le \hat{y} \le y_r$
The training loss is then:
$$\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\big|y - (y_l p_{y_l} + y_r p_{y_r})\big|^\beta \big((y_r - y)\log(p_{y_l}) + (y - y_l)\log(p_{y_r})\big)$$
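The discrete general-distribution head can be sketched as a softmax over integer bins whose expectation $\hat{y} = \sum_i P(y_i)\,y_i$ gives the predicted offset (the bin layout $0..n$ is an assumption for illustration):

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(logits):
    """Predicted offset y_hat = sum_i P(y_i) * y_i over integer bins 0..n,
    the discrete form of y_hat = integral of P(x) * x dx."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))
```

With uniform logits over 5 bins, the prediction is the midpoint 2.0; sharpening the logits around one bin moves the expectation toward it, which is what DFL encourages.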
3. Results