[Object Detection] KL-Loss: Bounding Box Regression with Uncertainty for Accurate Object Detection
2022-07-29 06:04:00 【Dull cat】

Paper: Bounding Box Regression with Uncertainty for Accurate Object Detection
Code: https://github.com/yihui-he/KL-Loss
Source: CVPR 2019

Main points:

- Modifies the regression target: the four box boundaries (x1, y1, x2, y2) are learned directly, whereas previous methods learn (x, y, w, h).
- Proposes the KL loss to replace the smooth L1 loss.
- Adds variance voting after soft-NMS, correcting both box coordinates and scores.

Results:

- On MS-COCO, the AP of the VGG-16 based Faster R-CNN rises from 23.6% to 29.1%.
- For the ResNet-50-FPN based Mask R-CNN, AP improves by 1.8%.
1. Background

The annotation boxes of large object detection datasets (such as COCO) are mostly clear and accurate, but some ambiguity remains, which increases the difficulty of labeling.
As shown in Figures 1a and 1c, when the object is partially occluded or its boundary is unclear, the annotation box is hard to determine.
Bounding box regression generally uses the smooth L1 loss, which does not take this annotation ambiguity into account. It is also generally assumed that a high classification score implies an equally good regression, but Figure 2 shows cases where this assumption fails.
2. Method

To address the ambiguous-annotation problem above (box positions that are not well defined), the authors propose the KL loss, which learns two things at the same time: box regression and localization uncertainty.

Details:

- To capture the uncertainty of bbox regression, the authors model the labeled box coordinates as a Dirac delta function (impulse function) and the predicted box coordinates as a Gaussian distribution.
- The bbox regression loss is defined as the KL divergence between the predicted distribution and the true distribution (replacing the smooth L1 loss).

Three advantages of the KL loss:

- It captures the ambiguity in the dataset well: the bbox regressor gets a smaller loss from ambiguous bounding boxes.
- The learned variance is useful in post-processing: the authors propose variance voting, which, during NMS, weights the locations of a candidate box's neighbors by their predicted variances when voting for its position.
- The learned probability distribution is interpretable: it reflects the uncertainty of the predicted box, which is necessary for many downstream tasks (autonomous driving, robotics).
2.1 Modeling the bbox distribution

Building on Faster R-CNN or Mask R-CNN (Figure 3), the authors regress each boundary of the bbox separately; a bbox is represented as $(x_1, y_1, x_2, y_2) \in \mathbb{R}^4$.
For convenience, a single coordinate is written as $x$, since each coordinate is optimized independently.
To estimate a localization confidence along with the location, the network predicts a probability distribution rather than only a bbox coordinate.
The predicted distribution is simplified to a Gaussian:

$$P_\Theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - x_e)^2}{2\sigma^2}}$$

where:

- $\Theta$ is the set of learnable parameters
- $x_e$ is the estimated bbox location
- $\sigma$ is the standard deviation, which also indicates the uncertainty; $\sigma \to 0$ means the network is more confident that the predicted location is accurate

The true location can likewise be modeled as a special Gaussian with $\sigma \to 0$, namely a Dirac delta distribution:

$$P_D(x) = \delta(x - x_g)$$

where:

- $x_g$ is the true bbox location

The Dirac delta distribution is infinite at 0 and zero at every other location, with the property $\int \delta(x)\,dx = 1$.
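
To make this concrete, below is a minimal PyTorch sketch (not the authors' released Detectron code) of a Fast R-CNN-style box head that predicts the class scores, the four boundary estimates $x_e$, and $\alpha = \log(\sigma^2)$ for each boundary; the feature dimension, class count, and layer names are illustrative assumptions.

```python
import torch.nn as nn

class UncertainBoxHead(nn.Module):
    """Sketch of a box head that models each predicted boundary as a Gaussian:
    it outputs the mean x_e (bbox_pred) and alpha = log(sigma^2) (bbox_alpha)
    per boundary, alongside the classification scores."""

    def __init__(self, in_dim=1024, num_classes=81):  # illustrative sizes
        super().__init__()
        self.cls_score = nn.Linear(in_dim, num_classes)
        self.bbox_pred = nn.Linear(in_dim, num_classes * 4)   # boundary means x_e
        self.bbox_alpha = nn.Linear(in_dim, num_classes * 4)  # alpha = log(sigma^2)
        # Per the paper, the uncertainty layer is initialized with tiny random
        # Gaussian weights (mean 0, std 0.0001) so training starts stably.
        nn.init.normal_(self.bbox_alpha.weight, mean=0.0, std=0.0001)
        nn.init.constant_(self.bbox_alpha.bias, 0.0)

    def forward(self, feats):
        return self.cls_score(feats), self.bbox_pred(feats), self.bbox_alpha(feats)
```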
2.2 KL-Loss: the loss between the predicted and ground-truth distributions

In Figure 4:

- Orange: the ground truth, a Dirac delta distribution
- Gray: an accurate prediction, with small variance and a location close to the ground truth
- Blue: a poor prediction, with large variance and a location far from the ground truth

With the prediction and the ground-truth label modeled as above, the parameters $\hat{\Theta}$ can be estimated over $N$ samples by minimizing the KL divergence between the predicted and true distributions:

$$\hat{\Theta} = \arg\min_{\Theta} \frac{1}{N} \sum D_{KL}\big(P_D(x)\,\|\,P_\Theta(x)\big)$$

The KL divergence is used as the regression loss for the box location:

$$L_{reg} = D_{KL}\big(P_D(x)\,\|\,P_\Theta(x)\big) = \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(2\pi) - H\big(P_D(x)\big)$$

When the location $x_e$ is estimated inaccurately, the network can predict a larger variance $\sigma^2$ so that $L_{reg}$ becomes lower, as shown in Figure 4.
Since the last two terms do not depend on the estimated parameters, they can be dropped:

$$L_{reg} \propto \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log(\sigma^2)$$

When $\sigma = 1$, the KL loss degenerates into the standard Euclidean loss:

$$L_{reg} \propto \frac{(x_g - x_e)^2}{2}$$

The loss is differentiable with respect to the estimated location $x_e$ and the standard deviation $\sigma$:

$$\frac{\partial L_{reg}}{\partial x_e} = \frac{x_e - x_g}{\sigma^2}, \qquad \frac{\partial L_{reg}}{\partial \sigma} = -\frac{(x_e - x_g)^2}{\sigma^3} + \frac{1}{\sigma}$$

Since $\sigma$ appears in the denominator, gradients may explode at the beginning of training. To avoid this, the network predicts $\alpha = \log(\sigma^2)$ instead of $\sigma$ directly:

$$L_{reg} \propto \frac{e^{-\alpha}}{2}(x_g - x_e)^2 + \frac{1}{2}\alpha$$

For $\|x_g - x_e\| > 1$, a form similar to the smooth L1 loss is used:

$$L_{reg} \propto e^{-\alpha}\Big(\|x_g - x_e\| - \frac{1}{2}\Big) + \frac{1}{2}\alpha$$

At the beginning of training, the FC layer that predicts $\alpha$ is initialized with random Gaussian weights, with the standard deviation set to 0.0001 and the mean set to 0, so that the KL loss initially behaves much like the smooth L1 loss.
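
A minimal sketch of the resulting loss in PyTorch, assuming the network outputs the boundary estimate and $\alpha = \log(\sigma^2)$ per coordinate (tensor shapes and the mean reduction are assumptions; the official implementation is in the linked repo):

```python
import torch

def kl_loss(x_e, x_g, alpha):
    """KL-loss sketch, applied element-wise per boundary coordinate.
    x_e: predicted location, x_g: ground truth, alpha: predicted log(sigma^2).
    Terms that are constant w.r.t. the network parameters are dropped."""
    diff = torch.abs(x_g - x_e)
    # |x_g - x_e| <= 1: L ~ e^{-alpha} * (x_g - x_e)^2 / 2 + alpha / 2
    quad = torch.exp(-alpha) * diff ** 2 / 2.0
    # |x_g - x_e| > 1:  L ~ e^{-alpha} * (|x_g - x_e| - 1/2) + alpha / 2
    lin = torch.exp(-alpha) * (diff - 0.5)
    return (torch.where(diff > 1.0, lin, quad) + alpha / 2.0).mean()
```

The two branches coincide at $\|x_g - x_e\| = 1$, so the loss stays continuous; a call like `kl_loss(pred, gt, pred_alpha)` on three `(N, 4)` tensors returns the mean loss over all boundaries.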
2.3 Variance voting: correcting box locations

NMS: boxes whose IoU exceeds the threshold are deleted outright

Non-maximum suppression is applied at inference time, or when generating proposals in two-stage methods; it is not used during training. It is usually performed per class.
As the name suggests, the box with the highest score is not suppressed; every other box is then compared with it by IoU: if the IoU exceeds the threshold, the box is suppressed (its score is set to 0), and if the IoU is below the threshold, the box is kept (the greedy procedure is sketched in code after the list below).

Problems:

- If the threshold is too small, many boxes are suppressed, which easily causes missed detections (especially of objects close to the top-scoring box).
- If the threshold is too large, few boxes are suppressed, which easily causes false positives.
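
For reference, a minimal NumPy sketch of the greedy procedure (per-class handling and coordinate conventions are simplified assumptions):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the top-scoring box, drop every remaining box whose
    IoU with it exceeds the threshold, and repeat on what is left."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep
```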
Soft-NMS: scores of boxes above the IoU threshold are decayed; scores of boxes below the threshold stay unchanged

Given the problems of NMS, especially in crowded scenes, soft-NMS makes some improvements.
NMS sets the score of a suppressed box to 0, whereas soft-NMS holds that the more a box overlaps the candidate box, the more likely it is a "false positive", so its score should decay more strongly. The score is therefore attenuated rather than zeroed. The first decay rule multiplies the score by $(1 - IoU)$:

$$s_i = \begin{cases} s_i, & IoU(\mathcal{M}, b_i) < N_t \\ s_i\,\big(1 - IoU(\mathcal{M}, b_i)\big), & IoU(\mathcal{M}, b_i) \ge N_t \end{cases}$$

When the overlap between a neighboring box and the candidate exceeds the threshold $N_t$, the score decays linearly, but this function is not continuous in IoU, so the score changes abruptly at the threshold. What is wanted is a continuous function: no decay for non-overlapping boxes, strong decay for highly overlapping ones, a high penalty for high IoU and a low penalty for low IoU, with a gradual transition in between. Hence the second, Gaussian penalty function (both rules are sketched in code below):

$$s_i = s_i\, e^{-\frac{IoU(\mathcal{M}, b_i)^2}{\sigma}}, \quad \forall b_i \notin \mathcal{D}$$
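
The two decay rules as standalone functions, applied to the neighbors of the current top-scoring box (a sketch; in practice they run inside the NMS loop, and the defaults for `n_t` and `sigma` are only illustrative):

```python
import numpy as np

def linear_decay(scores, ious, n_t=0.3):
    """Linear soft-NMS penalty: only boxes with IoU >= N_t are decayed."""
    out = scores.copy()
    mask = ious >= n_t
    out[mask] = scores[mask] * (1.0 - ious[mask])
    return out

def gaussian_decay(scores, ious, sigma=0.5):
    """Gaussian soft-NMS penalty: continuous in IoU, heavier for higher IoU."""
    return scores * np.exp(-(ious ** 2) / sigma)
```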
Var voting: score correction (soft-NMS) + location correction (boxes that are close to the candidate box and have low uncertainty are given higher weight)

After the location variances are predicted, the neighboring bounding boxes vote on the location according to these variances.
The prediction for each box is:

$$(x_1, y_1, x_2, y_2, s, \sigma_{x_1}, \sigma_{y_1}, \sigma_{x_2}, \sigma_{y_2})$$

Variance voting modifies the soft-NMS procedure: while soft-NMS runs, the coordinates of each selected box $b$ are corrected using the variances learned by the network.
The new coordinates are computed as follows, where $x_i$ is the corresponding coordinate of the $i$-th box:

First, select the box $b$ with the highest classification score.
Then, for every box $b_i$ that overlaps $b$ ($IoU(b_i, b) > 0$), compute the weight

$$p_i = e^{-\big(1 - IoU(b_i, b)\big)^2 / \sigma_t}$$

where $\sigma_t$ is a tunable parameter. The larger $IoU(b_i, b)$ is, the larger $p_i$ is; in other words, the closer a box is to $b$, the higher its weight.
Finally, update the coordinates of $b$ as a weighted average, with each of the four coordinates updated independently (see the sketch after this subsection):

$$x = \frac{\sum_i p_i\, x_i / \sigma^2_{x,i}}{\sum_i p_i / \sigma^2_{x,i}}, \quad \text{subject to } IoU(b_i, b) > 0$$

In this way, a box that is close to the true location but has a low classification score can still carry a high weight in the vote.

Two kinds of neighboring boxes receive lower weight:

- boxes with large variance
- boxes with small IoU with the selected box

Classification scores do not participate in the voting, since a box with a low classification score may still have high localization confidence.
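
Putting the voting rule into code, a sketch for refining one selected box (it reuses the `iou` helper from the NMS sketch above; `sigma_t` is the tunable $\sigma_t$, and its default here is only illustrative):

```python
import numpy as np

def variance_vote(b, boxes, variances, sigma_t=0.05):
    """Refine the coordinates of the selected box b (shape (4,)) by a
    variance-weighted vote of its overlapping neighbors.
    boxes: (N, 4) candidate boxes (b itself may be included),
    variances: (N, 4) predicted sigma^2 for each coordinate.
    Note: classification scores deliberately take no part in the vote."""
    ious = iou(b, boxes)
    mask = ious > 0                                  # only overlapping boxes vote
    p = np.exp(-(1.0 - ious[mask]) ** 2 / sigma_t)   # closeness weight p_i
    w = p[:, None] / variances[mask]                 # down-weight high variance
    # Weighted average, computed independently for each of the 4 coordinates.
    return (w * boxes[mask]).sum(axis=0) / w.sum(axis=0)
```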
3. Results