[Object Detection] KL Loss: Bounding Box Regression with Uncertainty for Accurate Object Detection
2022-07-29 06:04:00 【Dull cat】

Paper: Bounding Box Regression with Uncertainty for Accurate Object Detection
Code: https://github.com/yihui-he/KL-Loss
Venue: CVPR 2019
Key points:
- Changes the regression targets: the network learns the four box boundaries (x1, y1, x2, y2) directly, whereas previous methods learn (x1, y1, w, h).
- Proposes KL Loss to replace smooth L1 loss.
- Adds variance voting after Soft-NMS to refine both box coordinates and scores.
Results:
- On the MS-COCO dataset, VGG-16 Faster R-CNN improves AP from 23.6% to 29.1%.
- ResNet-50-FPN Mask R-CNN improves AP by 1.8%.
1. Background
In large detection datasets (such as COCO), the annotated boxes are mostly clear and accurate, but some are ambiguous, which makes labeling difficult.
As shown in Figures 1a and 1c, when the target is partially occluded or its boundary is unclear, the ground-truth box is hard to determine.
Bounding box regression generally uses smooth L1 loss, which does not take annotation ambiguity into account. It is also generally assumed that a high classification score implies accurate localization, but cases like Figure 2 show this is not always true.


2. Method
To address ambiguous annotations (box positions that are not well defined), the author proposes KL Loss, which learns two things jointly: box regression and localization uncertainty.
Details:
- To capture the uncertainty of bbox regression, the ground-truth box position is modeled as a Dirac delta (impulse) function, and the predicted box position is modeled as a Gaussian distribution.
- The bbox regression loss is defined as the KL divergence between the predicted distribution and the ground-truth distribution (replacing smooth L1 loss, etc.).
Three advantages of KL Loss:
- It captures the ambiguity in the dataset: the regressor can obtain a smaller loss on ambiguous boxes by predicting a larger variance.
- The learned variance is useful in post-processing: the author proposes variance voting, which, during NMS, weights the positions of neighboring boxes by their predicted variances to vote for the candidate box.
- The learned probability distribution is interpretable: it reflects the uncertainty of the predicted box, which is valuable for downstream tasks such as autonomous driving and robotics.

2.1 Modeling the bbox distribution
Based on Faster R-CNN or Mask R-CNN (Figure 3), the author regresses each bbox boundary separately; a bbox is represented as $(x_1, y_1, x_2, y_2) \in \mathbb{R}^4$.
For convenience, a single coordinate is denoted $x$, since each coordinate is optimized independently.
To estimate localization confidence, the network predicts a probability distribution for each coordinate rather than only a point estimate of the bbox position.
The predicted distribution is simplified to a Gaussian:

$$P_{\Theta}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - x_e)^2}{2\sigma^2}}$$
where:
- $\Theta$ is the set of learnable network parameters
- $x_e$ is the estimated bbox position
- $\sigma$ is the standard deviation, which measures uncertainty: as $\sigma \to 0$, the network is more confident that the predicted position is accurate.
The ground-truth position can be modeled as the limiting case $\sigma \to 0$ of a Gaussian, i.e. a Dirac delta distribution:

$$P_D(x) = \delta(x - x_g)$$
where:
- $x_g$ is the ground-truth bbox position
The Dirac delta distribution is infinite at 0 and 0 everywhere else, with the property $\int \delta(x)\,dx = 1$.
2.2 KL Loss: computing the loss between the predicted and ground-truth distributions

- Orange: the ground-truth (Dirac delta) distribution
- Gray: an accurate prediction, with small variance and a position close to the ground truth
- Blue: a poor prediction, with large variance and a position far from the ground truth
After modeling the prediction and the ground truth as above, the parameters $\hat{\Theta}$ can be estimated over $N$ samples by minimizing the KL divergence between the predicted and the ground-truth distribution:

$$\hat{\Theta} = \arg\min_{\Theta} \frac{1}{N} \sum D_{KL}\left(P_D(x) \,\|\, P_{\Theta}(x)\right)$$

The KL divergence serves as the regression loss for the box position:

$$L_{reg} = D_{KL}\left(P_D(x) \,\|\, P_{\Theta}(x)\right) = \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log \sigma^2 + \frac{1}{2}\log 2\pi - H\left(P_D(x)\right)$$

When the network estimates an inaccurate position $x_e$, it is expected to predict a larger variance $\sigma^2$ so that $L_{reg}$ is lower, as shown in Figure 4.
Since the last two terms do not depend on the estimated parameters, the loss is proportional to:

$$L_{reg} \propto \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log \sigma^2$$
When $\sigma = 1$, KL loss degenerates into the standard Euclidean loss:

$$L_{reg} \propto \frac{(x_g - x_e)^2}{2}$$
The loss is differentiable with respect to the estimated position and the standard deviation:

$$\frac{\partial L_{reg}}{\partial x_e} = \frac{x_e - x_g}{\sigma^2}, \qquad \frac{\partial L_{reg}}{\partial \sigma} = -\frac{(x_g - x_e)^2}{\sigma^3} + \frac{1}{\sigma}$$
Since $\sigma$ appears in the denominator, gradients can explode at the start of training. To avoid this, the network is trained to predict $\alpha = \log(\sigma^2)$ instead of $\sigma$:

$$L_{reg} \propto \frac{e^{-\alpha}}{2}(x_g - x_e)^2 + \frac{1}{2}\alpha$$
When $|x_g - x_e| > 1$, a smooth-L1-style form is used instead:

$$L_{reg} \propto e^{-\alpha}\left(|x_g - x_e| - \frac{1}{2}\right) + \frac{1}{2}\alpha$$
At the start of training, the FC layer that predicts $\alpha$ is initialized with a random Gaussian (mean 0, standard deviation 0.0001), so that KL loss initially behaves similarly to smooth L1 loss.
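To make the final form concrete, here is a minimal sketch (not the authors' released code) of the per-coordinate loss with $\alpha = \log \sigma^2$. The quadratic/linear switch at $|x_g - x_e| = 1$ follows the formulas above, and constant terms that do not affect optimization are dropped:

```python
import math

def kl_loss(x_g, x_e, alpha):
    """Per-coordinate KL loss, with alpha = log(sigma^2) predicted by the network.

    For small errors (|x_g - x_e| <= 1) the quadratic form is used;
    for large errors, a smooth-L1-style linear form avoids huge gradients.
    """
    diff = abs(x_g - x_e)
    if diff <= 1.0:
        return 0.5 * math.exp(-alpha) * diff ** 2 + 0.5 * alpha
    return math.exp(-alpha) * (diff - 0.5) + 0.5 * alpha
```

Predicting a larger $\alpha$ (i.e. a larger variance) shrinks the error term but pays the $\alpha/2$ penalty, which is exactly how an ambiguous box can receive a smaller loss.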
2.3 Variance voting: correcting box positions
NMS: boxes whose IoU with the selected box exceeds the threshold are deleted outright.
Non-maximum suppression is used at inference time, or to generate proposals in two-stage methods; it is not used during training, and is generally applied per class.
As the name suggests, the box with the highest score is not suppressed; every other box whose IoU with it exceeds the threshold is suppressed (its score is set to 0), and boxes below the threshold are kept.

Problems:
- If the threshold is too small, many boxes are suppressed, which easily causes missed detections (especially for boxes close to the highest-scoring box).
- If the threshold is too large, few boxes are suppressed, which easily causes false positives.
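For reference, hard NMS as described above can be sketched as follows (an illustrative implementation, not from the paper; boxes are `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Classic hard NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # suppress every remaining box whose IoU with the kept box is too high
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Lowering `iou_thresh` suppresses more neighbors (risking missed detections); raising it keeps more (risking false positives), which is exactly the trade-off listed above.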
Soft-NMS: boxes above the IoU threshold have their scores decayed; boxes below the threshold keep their scores.
It was proposed to address the problems of NMS, especially in crowded scenes.
NMS sets a suppressed box's score to 0, while Soft-NMS assumes that the closer a box is to the candidate box, the more likely it is to be a false positive, so its score should decay more. Decay modes:
Linear: multiply the score by $1 - IoU$:

$$s_i = \begin{cases} s_i, & IoU(M, b_i) < N_t \\ s_i\left(1 - IoU(M, b_i)\right), & IoU(M, b_i) \ge N_t \end{cases}$$
When the overlap between a neighboring box and the candidate box exceeds the threshold $N_t$, the score decays linearly. However, this function is discontinuous at the threshold, so scores change abruptly. What is needed is a continuous function: boxes with no overlap are not decayed, highly overlapping boxes are decayed heavily, high IoU receives a high penalty, low IoU a low penalty, with a gradual transition in between. Hence the second, Gaussian penalty function:

$$s_i = s_i\, e^{-\frac{IoU(M,\, b_i)^2}{\sigma}}$$
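A minimal sketch of Soft-NMS with the Gaussian penalty (illustrative only; the `sigma` and `score_thresh` defaults are arbitrary example values, not the paper's settings):

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft-NMS: instead of deleting overlapping boxes, decay their scores
    with the Gaussian penalty s_i *= exp(-IoU^2 / sigma)."""
    scores = list(scores)
    keep = []
    idx = list(range(len(boxes)))
    while idx:
        # pick the remaining box with the highest (possibly decayed) score
        best = max(idx, key=lambda i: scores[i])
        keep.append(best)
        idx.remove(best)
        # decay every remaining box by its overlap with the kept box
        for i in idx:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        # drop boxes whose score has decayed below the threshold
        idx = [i for i in idx if scores[i] >= score_thresh]
    return keep, scores
```

Note that an overlapping box is rescored rather than removed, so in crowded scenes a genuinely distinct object next to the kept box can survive with a reduced score.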
Var voting: score correction (Soft-NMS) + position correction (boxes that are close to the candidate box and have lower uncertainty receive higher weights)
After the position variances are predicted, neighboring bboxes vote on the position according to these variances.
For each box, the network predicts:
$(x_1, y_1, x_2, y_2, s, \sigma_{x_1}, \sigma_{y_1}, \sigma_{x_2}, \sigma_{y_2})$
Variance voting is a modification of the Soft-NMS pipeline: after Soft-NMS, the coordinates of the resulting box $b_m$ are corrected using the variances learned by the network.
The new coordinates are computed as follows, where $x_i$ is a coordinate of the $i$-th box:
First, select the box $b$ with the highest classification score.
Then, for every box $b_i$ that overlaps $b$ (i.e. $IoU(b_i, b) > 0$), compute a weight:

$$p_i = e^{-\frac{(1 - IoU(b_i, b))^2}{\sigma_t}}$$

The larger $IoU(b_i, b)$ is, the larger $p_i$ is; that is, boxes closer to $b$ receive larger weights.
Finally, update each of the four coordinates of $b$ independently according to the weights and the predicted variances:

$$x = \frac{\sum_i p_i\, x_i / \sigma_{x,i}^2}{\sum_i p_i / \sigma_{x,i}^2}, \qquad IoU(b_i, b) > 0$$

In this way, a box that is close to the true position but has a low classification score can still receive a high weight in the vote.
where:
- $\sigma_t$ is a tunable parameter
Two kinds of neighboring boxes receive lower weights:
- boxes with high variance
- boxes with small IoU with the candidate box
Classification scores do not participate in the voting, since a box with a low classification score may still have high localization confidence.
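The voting steps above can be sketched per coordinate as follows (illustrative; `sigma_t = 0.05` is an arbitrary example value for the tunable parameter, and `variances` stands for the per-coordinate $\sigma^2$ values predicted by the network):

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def variance_vote(b, neighbors, variances, sigma_t=0.05):
    """Update the coordinates of the selected box `b` by a variance-weighted vote.

    b          -- (x1, y1, x2, y2) of the selected (highest-score) box
    neighbors  -- list of (x1, y1, x2, y2) boxes, including b itself
    variances  -- per-box tuples (var_x1, var_y1, var_x2, var_y2)
    """
    new_coords = []
    for k in range(4):  # each of the four coordinates is voted on independently
        num, den = 0.0, 0.0
        for box, var in zip(neighbors, variances):
            overlap = iou(b, box)
            if overlap <= 0.0:
                continue  # only boxes overlapping b vote
            # closer boxes (higher IoU) get exponentially larger weights
            p = math.exp(-((1.0 - overlap) ** 2) / sigma_t)
            # each vote is additionally weighted by 1/variance (confidence)
            num += p * box[k] / var[k]
            den += p / var[k]
        new_coords.append(num / den)
    return tuple(new_coords)
```

Classification scores never enter the computation: only overlap and predicted variance decide how much a neighbor pulls the final coordinates.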

3. Results




