[Object Detection] Generalized Focal Loss v1
2022-07-29 06:04:00 【Dull cat】

Paper: https://arxiv.org/pdf/2006.04388.pdf
Code: https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl
Venue: NeurIPS 2020
Key points:
- A new generalized distribution model of bounding box locations is proposed: the clearer a boundary is, the easier it is to learn and the sharper its learned distribution; the blurrier the boundary, the harder it is to learn and the flatter the distribution.
1. Background
One-stage detectors essentially model object detection as a dense classification and localization task.
The classification branch is usually optimized with Focal Loss, while the localization branch typically regresses a Dirac delta distribution over box offsets.
FCOS, for example, adds a quantity that estimates localization quality (an IoU score or centerness score) and, when ranking boxes for NMS, multiplies the classification score by this box quality score.
Current one-stage detectors thus introduce a separate branch to predict localization quality; this quality prediction complements the classification score and improves detection performance.
The paper identifies three basic elements in these detectors:
- quality estimation of the detected box (e.g., an IoU score or FCOS's centerness score)
- classification
- localization
There are two main problems with the current implementation:

1. The classification score and box quality estimate are inconsistent between training and testing.

Inconsistent usage: classification and quality estimation are trained separately, but at test time they are multiplied together as the NMS ranking score, leaving a gap between the two phases.

Inconsistent objects: powered by Focal Loss, the classification branch can be trained on a few positive samples together with a large number of negatives, but the box quality estimate is trained on positive samples only.

For a one-stage detector, NMS ranks every sample by the product of its classification score and box quality score. The quality predictions of low-scoring negatives therefore receive no supervision signal during training, i.e., the quality of a large number of negatives is never calibrated. A negative sample with a low classification score can then be ranked as a positive simply because it predicts a very high box quality score, as in the numeric sketch below.
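A minimal numeric sketch of this failure mode (the scores are invented for illustration): the test-time ranking score multiplies the two heads, so an uncalibrated quality prediction lets a negative outrank a genuine positive.

```python
import torch

# Hypothetical scores for two boxes: the negative's quality branch was
# never supervised, so its quality prediction can be arbitrarily high.
cls_score = torch.tensor([0.10, 0.40])  # [negative, positive]
quality   = torch.tensor([0.95, 0.20])  # uncalibrated vs. calibrated
nms_score = cls_score * quality         # ranking score used at test time
print(nms_score)  # tensor([0.0950, 0.0800]) -> the negative ranks first
```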
2. The bbox regression representation is inflexible (a Dirac delta distribution) and cannot model the uncertainty of complex scenes.

- In complex scenes the bounding box representation carries strong uncertainty. Existing box regression essentially models a single Dirac delta distribution, which is very inflexible, so the authors propose modeling the box representation with a general distribution. The problem is illustrated in Figure 3 (e.g., a skateboard blurred by water spray, and a heavily occluded elephant).
2. Method
For the two problems above:
① training and testing are inconsistent;
② Dirac delta modeling of the box location distribution is not general;
the authors propose the following solutions.
Solution to problem 1: build a joint classification-IoU representation

To keep training and testing consistent, and to let both classification and box quality prediction be trained on all positive and negative samples, the authors merge the box quality estimate into the classification score.

Method:
At the position of the ground-truth class, the target becomes the localization quality score rather than 1; in this paper the quality score is measured by IoU. A sketch of such a target follows.
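A minimal sketch (a hypothetical helper, not mmdetection's actual code) of how such a joint target could be built: the one-hot "1" at the ground-truth class is replaced by the IoU score, and every other entry, including all negatives, is supervised with 0.

```python
import torch

def build_joint_targets(gt_labels, ious, num_classes):
    """gt_labels: (N,) assigned class index per sample, -1 for negatives.
    ious: (N,) IoU of each predicted box with its assigned gt box."""
    targets = torch.zeros(len(gt_labels), num_classes)
    pos = gt_labels >= 0
    targets[pos, gt_labels[pos]] = ious[pos]  # soft label y in (0, 1]
    return targets                            # negatives stay all-zero

targets = build_joint_targets(torch.tensor([2, -1]), torch.tensor([0.83, 0.0]), num_classes=4)
```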
Solution to problem 2: directly regress an arbitrary distribution to model the box representation.

Method: implement it with a softmax over discretized offsets, generalizing the integral form of the Dirac delta distribution to the integral (expectation) of a general distribution, which recovers the box, as sketched below.
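A minimal sketch of this general-distribution box head, assuming the paper's default discretization range reg_max = 16 (the function name is ours): each box side is predicted as n+1 logits, softmax turns them into a discrete distribution over {0, 1, ..., n}, and the offset is recovered as the expectation, i.e., the discretized form of $\hat{y} = \int y\,P(y)\,dy$.

```python
import torch
import torch.nn.functional as F

def distribution_to_offset(logits, reg_max=16):
    """logits: (..., 4, reg_max + 1) raw scores for the four box sides."""
    prob = F.softmax(logits, dim=-1)                    # discrete P(y_i)
    bins = torch.arange(reg_max + 1, dtype=prob.dtype)  # y_i = 0, 1, ..., reg_max
    return (prob * bins).sum(dim=-1)                    # expectation E[y] per side

offsets = distribution_to_offset(torch.randn(8, 4, 17))  # -> shape (8, 4)
```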
This eliminates the inconsistency between training and testing and establishes the strong correlation between classification and localization shown in Figure 2(b). In addition, negative samples can now be supervised with a quality score of 0.
Generalized Focal Loss consists of:
- QFL (Quality Focal Loss): learns the joint representation of classification score and localization quality score
- DFL (Distribution Focal Loss): models the box location as a general distribution and quickly focuses the network on values near the target location
How Generalized Focal Loss is derived:
Original FL:

Today's dense prediction tasks generally optimize the classification branch with Focal Loss, which handles problems such as the extreme foreground/background imbalance:

$FL(p) = -(1-p_t)^\gamma \log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1-p, & y = 0 \end{cases}$

However, it only supports discrete 0/1 labels. For reference, a sketch follows.
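A minimal sketch of the standard binary Focal Loss ($\gamma = 2$ is the common default; the probability clamp is ours for numerical safety):

```python
import torch

def focal_loss(p, y, gamma=2.0):
    """p: predicted probability in (0, 1); y: discrete 0/1 label."""
    p = p.clamp(1e-6, 1 - 1e-6)
    p_t = torch.where(y == 1, p, 1 - p)            # probability of the true class
    return -((1 - p_t) ** gamma) * torch.log(p_t)  # down-weights easy samples
```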
① Proposed QFL (Quality Focal Loss)

Standard one-hot encoding puts 1 at the ground-truth class and 0 everywhere else. The joint classification-IoU representation softens this one-hot encoding: the learning target becomes $y \in [0,1]$ rather than the hard target 1.

With this joint representation the label becomes a continuous value in $[0,1]$, so FL no longer applies:
- $y = 0$: a negative sample, whose quality score is 0
- $0 < y \le 1$: a positive sample, whose quality label $y$ is its IoU score and lies in $(0,1]$
To preserve Focal Loss's ability to balance hard/easy and positive/negative samples while supporting continuous-valued supervision, FL is extended in two parts:
- the cross-entropy $-\log(p_t)$ is extended to its complete form $-\big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$
- the modulating factor $(1-p_t)^\gamma$ is extended to $|y-\sigma|^\beta$ with $\beta \ge 0$

Quality Focal Loss (QFL) is then:

$QFL(\sigma) = -|y-\sigma|^\beta \big((1-y)\log(1-\sigma) + y\log(\sigma)\big)$
- $\sigma = y$ is the global minimum of QFL
- Figure 5(a) shows the effect of different $\beta$ values (with $y = 0.5$)
- $|y-\sigma|^\beta$ is the modulating factor: when a sample's quality estimate is inaccurate, the factor becomes large and the network focuses on this hard sample; as the estimate becomes accurate, i.e., $\sigma \to y$, the factor tends to 0 and the sample's weight in the loss shrinks. $\beta$ controls how fast the weight decays; $\beta = 2$ is optimal in this paper. A sketch of QFL follows.
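A minimal sketch of QFL under these definitions, assuming $\sigma$ is the sigmoid output and $y$ the continuous IoU target ($\beta = 2$ per the paper; the clamp is ours for numerical safety, and mmdetection's actual implementation works on logits instead):

```python
import torch

def quality_focal_loss(sigma, y, beta=2.0):
    """sigma: predicted score in (0, 1); y: continuous target in [0, 1]."""
    sigma = sigma.clamp(1e-6, 1 - 1e-6)
    ce = -((1 - y) * torch.log(1 - sigma) + y * torch.log(sigma))
    return ((y - sigma).abs() ** beta) * ce  # modulator shrinks as sigma -> y
```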
② Proposed DFL (Distribution Focal Loss)
Localization in this paper regresses relative offsets. Previous work generally supervises it with a Dirac delta distribution $\delta(x-y)$ satisfying $\int_{-\infty}^{+\infty} \delta(x-y)\,dx = 1$, usually realized with a fully connected layer. Considering the diversity of real-world distributions, this paper instead represents the location with a more general distribution.
The true distribution usually stays close to the annotated location, so an additional loss is introduced:

$DFL(S_i, S_{i+1}) = -\big((y_{i+1}-y)\log(S_i) + (y-y_i)\log(S_{i+1})\big)$

- DFL lets the network focus faster on values near the target $y$ and increase their probability
- concretely, it is a cross-entropy on the probabilities of the two discretized positions $y_i$ and $y_{i+1}$ closest to the continuous label $y$ (its left and right neighbours), so the network quickly concentrates probability mass around the target location; see the sketch below
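A minimal sketch of DFL under this definition (it assumes $0 \le y < reg\_max$ so that both neighbouring bins exist):

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(logits, y):
    """logits: (N, reg_max + 1) scores for one box side; y: (N,) continuous
    target in [0, reg_max)."""
    y_left = y.long()             # y_i = floor(y)
    y_right = y_left + 1          # y_{i+1}
    w_left = y_right.float() - y  # larger when y is closer to y_i
    w_right = y - y_left.float()  # larger when y is closer to y_{i+1}
    return (w_left * F.cross_entropy(logits, y_left, reduction='none')
            + w_right * F.cross_entropy(logits, y_right, reduction='none'))
```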
QFL and DFL can be unified into a single form, GFL:
- the two variables are $y_l$ and $y_r$ (with $y_l < y_r$)
- their predicted probabilities are $p_{y_l}$ and $p_{y_r}$, with $p_{y_l} + p_{y_r} = 1$
- the final prediction is $\hat{y} = y_l p_{y_l} + y_r p_{y_r}$, with $y_l \le \hat{y} \le y_r$

$GFL(p_{y_l}, p_{y_r}) = -\big|y - (y_l p_{y_l} + y_r p_{y_r})\big|^\beta \big((y_r - y)\log(p_{y_l}) + (y - y_l)\log(p_{y_r})\big)$
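As a concrete example (numbers ours): for a target $y = 4.7$ the bracketing bins are $y_l = 4$ and $y_r = 5$; a perfectly calibrated prediction $p_{y_l} = 0.3$, $p_{y_r} = 0.7$ gives $\hat{y} = 4 \times 0.3 + 5 \times 0.7 = 4.7 = y$, so the modulating factor, and hence the GFL, is 0.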
The training loss is:

$\mathcal{L} = \frac{1}{N_{pos}} \sum_z \mathcal{L}_Q + \frac{1}{N_{pos}} \sum_z \mathbf{1}_{\{c_z^* > 0\}} \big(\lambda_0 \mathcal{L}_B + \lambda_1 \mathcal{L}_D\big)$

where $\mathcal{L}_Q$ is QFL, $\mathcal{L}_D$ is DFL, $\mathcal{L}_B$ is the GIoU loss, $N_{pos}$ is the number of positive samples, $z$ ranges over all pyramid locations, and $\lambda_0$, $\lambda_1$ balance the terms.
3. Results