当前位置:网站首页>Learn CV two loss function from scratch (2)
Learn CV two loss function from scratch (2)
2022-07-08 02:19:00 【pogg_】
notes : Most of the content of this blog is not original , But I sort out the data I collected before , And integrate them with their own stupid solutions , Convenient for review , All references have been cited , And has been praised and collected ~
Preface : Continue to learn from scratch CV Part II loss function (1)
2.2.3 IoU Loss(Intersection-Over-Union, Intersection union ratio function )
This method was proposed by Kuangshi , Published in 2016 ACM, Thesis link :
https://arxiv.org/pdf/1608.01471.pdf
adopt 4 Independent regression of coordinate points Building boxes The shortcomings of :
(1) The way of detection and evaluation is to use IoU, The actual regression coordinate box is to use 4 Coordinates , As shown in the figure below , It's not equivalent ;L1 perhaps L2 Loss Same box , Its IoU It's not the only one
(2) adopt 4 The way a point returns to the coordinate box is to assume 4 The two coordinate points are independent of each other , Without considering its relevance , actual 4 Coordinate points have certain correlation
(3) be based on L1 and L2 The distance of loss It is not invariant to scale
Based on this, this paper puts forward IoU Loss, It will be 4 Composed of points box Return as a whole :
The red dot in the figure above indicates the target detection network structure Head Point on part (i,j), The green box indicates Ground truth box ,
The blue box indicates Prediction Box of ,IoU
loss As defined above , Find out first 2 A box IoU, Then ask for another -ln(IoU), In fact, many are directly defined as IoU Loss = 1-IoU
Image explanation part :
IoU = Green area /( Blue area + Green area + Orange area )
and IOU Loss It can be simply expressed as :
L I O U = 1 − I o U L_{I O U}=1-I o U LIOU=1−IoU or L I O U = − l n ( I O U ) L_{I O U}=-ln(I O U) LIOU=−ln(IOU)
2.2.4 GIoU Loss(Generalized Intersection over Union)
This method was proposed by Stanford scholars , Published in CVPR2019, Thesis link :
https://arxiv.org/pdf/1902.09630.pdf
IoU Loss The shortcomings of :
(1) When the prediction box and the target box do not intersect ,IoU(A,B)=0 when , Can't reflect A,B The distance , In this case, the loss function is not differentiable ,IoU Loss Can't optimize when two boxes don't intersect .
(2) Suppose that the size of the prediction box and the target box are determined , As long as the intersection value of the two boxes is certain , Its IoU When the values are the same ,IoU The value does not reflect how the two boxes intersect .
As shown in the figure above , Three boxes with different relative positions have the same IoU=0.33 value , But have different GIoU=0.33,0.24,-0.1. When the alignment direction of the box is better GIoU The value of will be higher .
The red box is A、B Outside rectangle
GIoU The implementation method is as follows , among C by A and B The outer rectangle of . use C subtract A and B The union of divided by C I get a number , Then use the box A and B Of IoU Subtract this value to get GIoU Value .
Image explanation part :
stay IoU Find one on the basis of “ It has a rectangular frame ”C, This global box can just put two b-box Put in . So for a little more area C_.
According to the diagram above : G I O U = I O U − C − / C GIOU = IOU - C_{-}/C GIOU=IOU−C−/C
GIOU loss It can be simply expressed as :
namely :
In two b-box Without intersection :
You can see GIOU Meeting ** Changes with the distance between the two boxes ,** So we can see that loss On , Guide the direction of the prediction box .
GIoU The nature of :
- GIoU and IoU equally , It can be used as a measure of distance , L G I o U = 1 − G L o U L_{GIoU}=1-GLoU LGIoU=1−GLoU
- GIoU Scale invariance
- about ∀ A , B \forall A,B ∀A,B , Yes G I o U ( A , B ) ≤ I o U ( A , B ) GIoU\left( A,B \right)\leq IoU\left( A,B \right) GIoU(A,B)≤IoU(A,B) And 0 ≤ I o U ( A , B ) ≤ 1 0\leq IoU\left( A,B \right)\leq1 0≤IoU(A,B)≤1 , therefore − 1 ≤ G I o U ( A , B ) ≤ 1 -1\leq GIoU\left( A,B \right)\leq1 −1≤GIoU(A,B)≤1 . When A → B A\rightarrow B A→B when , Both are equal to 1, here GIoU be equal to 1, When
A and B When they don't intersect , G I o U ( A , B ) = − 1 GIoU\left( A,B \right) = -1 GIoU(A,B)=−1
GIoU Loss Insufficient
When the target box completely wraps the prediction box ,IoU and GIoU The values are the same , here GIoU Degenerate to IoU, It is impossible to distinguish their relative position ; At this time, the author puts forward DIoU Because the normalized distance of the center point is added , Therefore, such problems can be better optimized .
Inspiration :
be based on IoU and GIoU The problem is , The author raises two questions :
- First of all : Whether it is feasible to directly minimize the normalized distance between the prediction frame and the target frame , In order to achieve faster convergence speed .
- second : How to make regression more accurate when it overlaps or even contains the target box 、 faster .
There are three important geometric factors that should be considered in a good target box regression loss : Overlap area , Distance from the center , Aspect ratio . Based on question one , The author puts forward DIoU Loss, be relative to GIoU Loss Faster convergence , The Loss Considering The overlap area and the distance from the center point , But the aspect ratio is not taken into account ; For question two , The author puts forward CIoU Loss, Its convergence accuracy is higher , All three factors have been taken into account .
2.2.5 DIoU Loss(Distance-IoU Loss)
This article is published in AAAI 2020, Thesis link :
https://arxiv.org/pdf/1911.08287.pdf
Usually based on IoU-based Of loss Can be defined as L = 1 − I o U + R ( B , B g t ) L = 1- IoU + R\left( B,B^{gt} \right) L=1−IoU+R(B,Bgt), among R ( B , B g t ) R\left( B,B^{gt} \right) R(B,Bgt) Defined as the prediction box B B B And target box B g t B^{gt} Bgt The penalty for .
DIoU The penalty item in is expressed as R D I o U = ρ 2 ( b , b g t ) c 2 R_{DIoU} =\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}} RDIoU=c2ρ2(b,bgt) , among b and b g t b and b^{gt} b and bgt respectively B B B and B g t B^{gt} Bgt The center of , ρ ( ⋅ ) \rho\left( \cdot \right) ρ(⋅) It means Euclidean distance , c c c Express B B B and B g t B^{gt} Bgt The diagonal distance of the smallest outer rectangle , As shown in the figure below . Can be DIoU Replace IoU be used for NMS In the algorithm , That is, the paper puts forward DIoU-NMS, The experimental results show that there is a certain improvement .
DIoU Loss function Defined as : L D I o U = 1 − I o U + ρ 2 ( b , b g t ) c 2 L_{DIoU} = 1- IoU +\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}} LDIoU=1−IoU+c2ρ2(b,bgt)
The green box in the above figure is the target box , The black box is the prediction box , The gray box is the smallest outer rectangle of both ,d Represents the distance between the center point of the target frame and the real frame ,c Represents the distance of the smallest outer rectangle .
DIoU The nature of :
- Scale invariance When the two boxes completely coincide , L I o U = L G I o U = L D I o U = 0 L_{IoU}=L_{GIoU}=L_{DIoU}=0 LIoU=LGIoU=LDIoU=0 , When 2 When two boxes do not intersect L D I o U → 2 L_{DIoU}\rightarrow 2 LDIoU→2
- DIoU Loss Can directly optimize 2 The direct distance between two boxes , Than GIoU Loss Faster convergence
- For the case of the target box and the package prediction box ,DIoU Loss Can converge quickly , and GIoU Loss At this time, it degenerates into IoU Loss Slow convergence
2.2.6 CIoU Loss(Complete-IoU Loss)
CIoU The penalty is in DIoU An influence factor is added to the penalty term of α υ \alpha\upsilon αυ , This factor takes into account the aspect ratio of the prediction frame and the aspect ratio of the fitting target frame ,CIoU Loss function For the definition of :
among α \alpha α It's used to do trade-off Parameters of
from α \alpha α It can be seen that , When IOU Less than 0.5 When ,CIOU It becomes DIOU.IOU The bigger it is , α \alpha α The closer the 1.
υ \upsilon υ Is a parameter used to measure the consistency of aspect ratio , Defined as :
that , stay IOU In a big way , ρ 2 ( p , p g t ) c 2 \frac{\rho^{2}\left(\boldsymbol{p}, \boldsymbol{p}^{g t}\right)}{c^{2}} c2ρ2(p,pgt) Turn into 0( The center points coincide ), It's time to adjust the aspect ratio .DIOU At this time ,loss And the gradient becomes smaller ( Only by IOU loss It's part of the transfer gradient ), and CIOU You can rely on the last one to keep loss Gradient of , So that the detector can quickly adjust itself and GT Frames have the same aspect ratio .
With a comparison chart to illustrate :
The first row is GIOU, The second row is CIOU, The green box at the origin is GT box , The black box is anchor box , The red box is the prediction box . You can see , In the prediction box and GT Boxes don't intersect ( namely IOU=0) Under the circumstances ,GIOU and CIOU They all have the ability to guide the movement of the detection box . here ,GIOU From the position 、 Aspect ratio 、size Adjust the prediction box from the same angle , and CIOU It's a quick pull back ( Don't move the shape of the prediction box much ), therefore CIOU Comparable GIOU Pull back the prediction box faster to make it IOU>0. etc. IOU>0 in the future ,CIOU Rapid adjustment size scale . etc. IOU>0.5 in the future ,CIOU The width to length ratio of ( It's also called aspect ratio ) The part begins to be the main part of gradient propagation , Make the prediction box and GT The frame has the same aspect ratio .
This is a simulation comparison made by the author in the paper , It can be seen that ,CIoU Loss The effect is the best .
Reference resources
[1] https://mp.weixin.qq.com/s/ZbryNlV3EnODofKs2d01RA
[2] https://blog.csdn.net/leviopku/article/details/114655338?spm=1001.2014.3001.5501
[3] https://blog.csdn.net/u013069552/article/details/113804323?utm_source=app&app_version=4.9.0&code=app_1562916241&uLinkId=usr1mkqgl919blen
边栏推荐
- [knowledge map] interpretable recommendation based on knowledge map through deep reinforcement learning
- Semantic segmentation | learning record (5) FCN network structure officially implemented by pytoch
- 力扣5_876. 链表的中间结点
- EMQX 5.0 发布:单集群支持 1 亿 MQTT 连接的开源物联网消息服务器
- image enhancement
- Flutter 3.0框架下的小程序运行
- "Hands on learning in depth" Chapter 2 - preparatory knowledge_ 2.2 data preprocessing_ Learning thinking and exercise answers
- Mqtt x newsletter 2022-06 | v1.8.0 release, new mqtt CLI and mqtt websocket tools
- JVM memory and garbage collection-3-direct memory
- 阿南的判断
猜你喜欢
JVM memory and garbage collection -4-string
Wechat applet uniapp page cannot jump: "navigateto:fail can not navigateto a tabbar page“
喜欢测特曼的阿洛
Semantic segmentation | learning record (1) semantic segmentation Preface
CBAM for in-depth understanding of the attention mechanism in CV
Exit of processes and threads
很多小夥伴不太了解ORM框架的底層原理,這不,冰河帶你10分鐘手擼一個極簡版ORM框架(趕快收藏吧)
A comprehensive and detailed explanation of static routing configuration, a quick start guide to static routing
The bank needs to build the middle office capability of the intelligent customer service module to drive the upgrade of the whole scene intelligent customer service
OpenGL/WebGL着色器开发入门指南
随机推荐
Semantic segmentation | learning record (1) semantic segmentation Preface
生命的高度
Clickhouse principle analysis and application practice "reading notes (8)
The way fish and shrimp go
th:include的使用
Talk about the cloud deployment of local projects created by SAP IRPA studio
Emqx 5.0 release: open source Internet of things message server with single cluster supporting 100million mqtt connections
Force buckle 5_ 876. Intermediate node of linked list
Introduction to grpc for cloud native application development
Ml self realization /knn/ classification / weightlessness
JVM memory and garbage collection-3-object instantiation and memory layout
2022年5月互联网医疗领域月度观察
Industrial Development and technological realization of vr/ar
Disk rust -- add a log to the program
Learn face detection from scratch: retinaface (including magic modified ghostnet+mbv2)
Why did MySQL query not go to the index? This article will give you a comprehensive analysis
Kwai applet guaranteed payment PHP source code packaging
力争做到国内赛事应办尽办,国家体育总局明确安全有序恢复线下体育赛事
Leetcode question brushing record | 485_ Maximum number of consecutive ones
Many friends don't know the underlying principle of ORM framework very well. No, glacier will take you 10 minutes to hand roll a minimalist ORM framework (collect it quickly)