
Learning CV from Scratch, Part 2: Loss Functions (2)

2022-07-08 02:19:00 【pogg_】

Note: Most of this post is not original. I have organized material I collected earlier and combined it with my own rough understanding so that it is easier to review. All references are cited (and have been liked and bookmarked).

Preface: This post continues from Learning CV from Scratch, Part 2: Loss Functions (1).

2.2.3 IoU Loss (Intersection over Union)

This method was proposed by Megvii (Kuangshi) and published at ACM Multimedia 2016. Paper link:

https://arxiv.org/pdf/1608.01471.pdf

Shortcomings of building a box by regressing its 4 coordinates independently:

(1) Detection is evaluated with IoU, but the box is actually regressed through 4 coordinates, and the two are not equivalent. As the figure below shows, boxes with the same L1 or L2 loss do not have a unique IoU.

(2) Regressing the box through 4 coordinates assumes that the 4 coordinates are independent of each other and ignores their correlation, whereas in practice the 4 coordinates are correlated.

(3) Losses based on L1 and L2 distance are not scale invariant.
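Points (1) and (3) can be checked with a few numbers. The following is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corners with positive area. The helper names `iou`, `l2_sq`, and `scale` and the example boxes are made up for illustration and do not come from the paper.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def l2_sq(a, b):
    """Squared L2 distance between the 4 coordinates of two boxes."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def scale(box, s):
    """Scale every coordinate of a box by the factor s."""
    return tuple(s * v for v in box)


gt = (0.0, 0.0, 10.0, 10.0)
pred_shifted = (1.0, 1.0, 11.0, 11.0)  # every coordinate is off by 1
pred_widened = (0.0, 0.0, 12.0, 10.0)  # a single coordinate is off by 2

# Both predictions have the same squared L2 loss, but their IoU differs.
print(l2_sq(gt, pred_shifted), iou(gt, pred_shifted))
print(l2_sq(gt, pred_widened), iou(gt, pred_widened))

# Scaling the whole scene by 10 multiplies the L2 loss by 100 and leaves IoU unchanged.
big_gt, big_pred = scale(gt, 10), scale(pred_shifted, 10)
print(l2_sq(big_gt, big_pred), iou(big_gt, big_pred))
```

The two predictions are indistinguishable to an L2 coordinate loss, yet the evaluation metric ranks them differently, which is the mismatch described in (1).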
[Figure]
Based on this, the paper proposes IoU Loss, which regresses the box formed by the 4 coordinates as a whole:

In the figure above, the red dot is the point (i, j) on the head of the detection network, the green box is the ground-truth box, and the blue box is the predicted box. The IoU loss is defined as shown in the figure: first compute the IoU of the two boxes, then take -ln(IoU). In practice, many implementations simply define IoU Loss = 1 - IoU.

[Figure]
Illustrated explanation:

IoU = green area / (blue area + green area + orange area)

The IoU loss can then be written simply as:

$L_{IoU}=1-IoU$ or $L_{IoU}=-\ln(IoU)$
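Both forms of the loss can be sketched in a few lines. This is an illustrative sketch rather than the paper's implementation: boxes are assumed to be (x1, y1, x2, y2) corners with positive area, and the `mode` switch and the `eps` clamp (which keeps the logarithm finite when IoU is 0) are assumptions added here.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def iou_loss(pred, target, mode="linear", eps=1e-7):
    """IoU loss: 1 - IoU ("linear") or -ln(IoU) ("log")."""
    value = iou(pred, target)
    if mode == "linear":
        return 1.0 - value
    if mode == "log":
        # eps is an assumption of this sketch: it avoids log(0) for disjoint boxes.
        return -math.log(max(value, eps))
    raise ValueError("mode must be 'linear' or 'log'")


pred = (0.0, 0.0, 2.0, 2.0)
target = (1.0, 0.0, 3.0, 2.0)  # overlaps half of pred, so IoU = 2 / 6
print(iou_loss(pred, target, mode="linear"))
print(iou_loss(pred, target, mode="log"))
```

Both variants are 0 for a perfect match. The log form penalizes low overlap much more strongly than the linear form.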

2.2.4 GIoU Loss (Generalized Intersection over Union)

This method was proposed by researchers at Stanford and published at CVPR 2019. Paper link:

https://arxiv.org/pdf/1902.09630.pdf

Shortcomings of IoU Loss:

(1) When the predicted box and the target box do not intersect, IoU(A,B)=0 no matter how far apart they are, so IoU cannot reflect the distance between A and B. In this case the loss gives no useful gradient, and IoU Loss cannot optimize boxes that do not intersect.

(2) For a predicted box and a target box of fixed size, the IoU is the same whenever the intersection area is the same, so the IoU value does not reflect how the two boxes intersect.

As shown in the figure, three pairs of boxes with different relative positions all have IoU=0.33, but their GIoU values differ: 0.33, 0.24, and -0.1. The better the boxes are aligned, the higher the GIoU.
[Figure]
The red box is the smallest enclosing rectangle of A and B.

GIoU is computed as follows, where C is the smallest enclosing rectangle of A and B: subtract the area of the union of A and B from the area of C and divide by the area of C, then subtract this ratio from the IoU of A and B to obtain GIoU.

Illustrated explanation:
[Figure]
On top of IoU, find the smallest enclosing rectangle C that just contains both bounding boxes. The part of C that is not covered by either box is the extra area C_.

According to the diagram above: $GIoU = IoU - C_{-}/C$

The GIoU loss can be written simply as:

$L_{GIoU} = 1 - GIoU$

that is:

$L_{GIoU} = 1 - IoU + C_{-}/C$

When the two bounding boxes do not intersect, IoU = 0, so:

$L_{GIoU} = 1 + C_{-}/C$

So GIoU **changes with the distance between the two boxes**, and this shows up in the loss, which guides the direction in which the predicted box should move.
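The effect of distance on disjoint boxes can be seen numerically. The sketch below assumes (x1, y1, x2, y2) boxes with positive area. The function name `giou` and the helper `shifted`, which places a unit square at a given horizontal gap from the target, are hypothetical and exist only for this illustration.

```python
def giou(a, b):
    """GIoU = IoU - (area of C not covered by A or B) / (area of C)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # C is the smallest axis-aligned rectangle enclosing both boxes.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


def shifted(gap):
    """Unit square placed `gap` units to the right of the target's right edge."""
    return (1.0 + gap, 0.0, 2.0 + gap, 1.0)


target = (0.0, 0.0, 1.0, 1.0)

# IoU is 0 for every gap, but the GIoU loss keeps growing with the distance.
for gap in (0.0, 1.0, 8.0):
    print(gap, 1.0 - giou(shifted(gap), target))
```

Because the loss still increases as the boxes separate, an optimizer gets a signal that moving the prediction toward the target reduces the loss, which plain IoU cannot provide.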

Properties of GIoU:

Like IoU, GIoU can be used as a distance measure: $L_{GIoU}=1-GIoU$
GIoU is scale invariant
For $\forall A,B$, $GIoU\left( A,B \right)\leq IoU\left( A,B \right)$ and $0\leq IoU\left( A,B \right)\leq1$, therefore $-1\leq GIoU\left( A,B \right)\leq1$. When $A\rightarrow B$, IoU and GIoU both equal 1. When A and B do not intersect and move far apart, $GIoU\left( A,B \right)\rightarrow -1$

Shortcomings of GIoU Loss
[Figure]
When the target box completely contains the predicted box, IoU and GIoU have the same value. GIoU degenerates to IoU and cannot distinguish the relative position of the two boxes. To address this, DIoU was proposed: it adds the normalized center-point distance, so it handles such cases better.
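A small numeric check makes this degeneration concrete. This is a sketch under the assumption of (x1, y1, x2, y2) boxes with positive area. The function `box_metrics` and the example boxes are illustrative only. The third value it returns is the normalized center-point distance: the squared center distance divided by the squared diagonal of the enclosing rectangle.

```python
def box_metrics(a, b):
    """Return (IoU, GIoU, normalized squared center distance) for two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    c_w = max(a[2], b[2]) - min(a[0], b[0])
    c_h = max(a[3], b[3]) - min(a[1], b[1])
    iou = inter / union
    giou = iou - (c_w * c_h - union) / (c_w * c_h)
    # Squared distance between the two box centers.
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    return iou, giou, rho2 / (c_w ** 2 + c_h ** 2)


target = (0.0, 0.0, 10.0, 10.0)
pred_center = (4.0, 4.0, 6.0, 6.0)  # 2x2 box in the middle of the target
pred_corner = (0.0, 0.0, 2.0, 2.0)  # same 2x2 box pushed into a corner

print(box_metrics(pred_center, target))
print(box_metrics(pred_corner, target))
```

IoU and GIoU are identical for both placements because the enclosing rectangle is the target itself, while the center-distance term separates the two cases.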

Motivation:

Based on the problems with IoU and GIoU, the authors raise two questions:

First: is it feasible to directly minimize the normalized distance between the predicted box and the target box, in order to converge faster?
Second: how can regression be made more accurate and faster when the predicted box overlaps with, or even contains, the target box?

A good bounding-box regression loss should consider three important geometric factors: overlap area, center-point distance, and aspect ratio. For the first question, the authors propose DIoU Loss, which converges faster than GIoU Loss. It considers overlap area and center-point distance but not aspect ratio. For the second question, the authors propose CIoU Loss, which converges to higher accuracy and takes all three factors into account.

2.2.5 DIoU Loss (Distance-IoU Loss)

This paper was published at AAAI 2020. Paper link:

https://arxiv.org/pdf/1911.08287.pdf

An IoU-based loss can generally be defined as $L = 1 - IoU + R\left( B,B^{gt} \right)$, where $R\left( B,B^{gt} \right)$ is the penalty term for the predicted box $B$ and the target box $B^{gt}$.

The penalty term in DIoU is $R_{DIoU} =\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}}$, where $b$ and $b^{gt}$ are the center points of $B$ and $B^{gt}$ respectively, $\rho\left( \cdot \right)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing rectangle of $B$ and $B^{gt}$, as shown in the figure below. DIoU can also replace IoU in the NMS algorithm, which gives the DIoU-NMS proposed in the paper. The experiments show a certain improvement.

The DIoU loss function is defined as: $L_{DIoU} = 1- IoU +\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}}$

[Figure]
In the figure above, the green box is the target box, the black box is the predicted box, and the gray box is the smallest enclosing rectangle of the two. d is the distance between the center points of the predicted box and the target box, that is $\rho\left( b,b^{gt} \right)$, and c is the diagonal length of the smallest enclosing rectangle.
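The definition translates directly into code. The following is a minimal sketch, assuming boxes are (x1, y1, x2, y2) corners with positive area. The name `diou_loss` is chosen here for illustration and is not taken from the paper's code.

```python
def diou_loss(pred, target):
    """DIoU loss: 1 - IoU + rho^2(b, b_gt) / c^2."""
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (target[2] - target[0]) * (target[3] - target[1]) - inter)
    iou = inter / union
    # rho^2: squared Euclidean distance between the two center points.
    rho2 = (((pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2) ** 2
            + ((pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2) ** 2)
    # c^2: squared diagonal of the smallest enclosing rectangle.
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2
          + (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2)
    return 1.0 - iou + rho2 / c2


unit = (0.0, 0.0, 1.0, 1.0)
far_away = (1000.0, 1000.0, 1001.0, 1001.0)

print(diou_loss(unit, unit))      # coinciding boxes
print(diou_loss(unit, far_away))  # distant boxes: the loss approaches its upper limit of 2
```

For distant boxes the IoU term contributes 1 and the distance term approaches 1, so the loss tends toward 2 without reaching it.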

Properties of DIoU:

Scale invariance
When the two boxes coincide completely, $L_{IoU}=L_{GIoU}=L_{DIoU}=0$. When the two boxes do not intersect and are far apart, $L_{DIoU}\rightarrow 2$
DIoU Loss directly minimizes the distance between the two boxes, so it converges faster than GIoU Loss
When the target box contains the predicted box, DIoU Loss still converges quickly, whereas GIoU Loss degenerates to IoU Loss and converges slowly
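The DIoU-NMS idea mentioned earlier in this section, using DIoU in place of IoU as the suppression criterion, can be sketched as a greedy loop. This is an illustrative sketch and not the paper's implementation: the function names `diou` and `diou_nms`, the (x1, y1, x2, y2) box format, and the default threshold of 0.5 are all assumptions made here.

```python
def diou(a, b):
    """DIoU = IoU - rho^2 / c^2 for two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return inter / union - rho2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box is >= threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Keep only boxes that are sufficiently different from the selected one.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 0.0, 11.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))
```

Compared with IoU-based NMS, two boxes with high overlap but distant centers get a lower DIoU, so they are less likely to suppress each other.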

2.2.6 CIoU Loss (Complete-IoU Loss)

The CIoU penalty adds an influence factor $\alpha\upsilon$ to the DIoU penalty term. This factor accounts for how well the aspect ratio of the predicted box fits the aspect ratio of the target box. The CIoU loss function is defined as:

$L_{CIoU} = 1- IoU +\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}}+\alpha\upsilon$

where $\alpha$ is a trade-off parameter:

$\alpha = 0$ if $IoU<0.5$, otherwise $\alpha=\frac{\upsilon}{\left( 1-IoU \right)+\upsilon}$

From $\alpha$ it can be seen that when IoU is less than 0.5, CIoU reduces to DIoU. The larger the IoU, the closer $\alpha$ is to 1.

$\upsilon$ measures the consistency of the aspect ratios and is defined as:

$\upsilon=\frac{4}{\pi^{2}}\left( \arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h} \right)^{2}$

where $w$, $h$ and $w^{gt}$, $h^{gt}$ are the width and height of the predicted box and the target box.

So when IoU is large and $\frac{\rho^{2}\left( b,b^{gt} \right)}{c^{2}}$ becomes 0 (the center points coincide), what remains is to adjust the aspect ratio. For DIoU, the loss and its gradient become small at this point (only the IoU part of the loss passes gradient), while CIoU can rely on its last term to keep a loss gradient, so the detector can quickly adjust the predicted box to the same aspect ratio as the GT box.
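The behavior described above can be sketched in code: CIoU equals DIoU while IoU is below 0.5, and the aspect-ratio term adds extra loss once IoU is large. This is a minimal sketch under stated assumptions, not a reference implementation. Boxes are (x1, y1, x2, y2) corners, the trade-off weight is switched off below IoU 0.5 as the text describes, `eps` is a small constant added here to avoid division by zero, and `ciou_loss` is a name chosen for illustration.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return (CIoU loss, DIoU loss) for two (x1, y1, x2, y2) boxes."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)
    # DIoU part: normalized squared distance between the center points.
    rho2 = (((pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2) ** 2
            + ((pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2) ** 2)
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2
          + (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2 + eps)
    diou = 1.0 - iou + rho2 / c2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1.0 - iou) + v + eps) if iou >= 0.5 else 0.0
    return diou + alpha * v, diou


# Same center, IoU = 0.75, different aspect ratio: the extra term is active.
print(ciou_loss((0.0, 0.5, 4.0, 3.5), (0.0, 0.0, 4.0, 4.0)))
# Same center, IoU = 1/3: the weight is 0, so CIoU equals DIoU.
print(ciou_loss((1.0, -1.0, 3.0, 3.0), (0.0, 0.0, 4.0, 2.0)))
```

In the first case the centers already coincide, so the distance term is 0 and only the aspect-ratio term separates the CIoU loss from the DIoU loss.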
A comparison figure illustrates this:
[Figure]
The first row is GIoU and the second row is CIoU. The green box at the origin is the GT box, the black box is the anchor box, and the red box is the predicted box. When the predicted box and the GT box do not intersect (IoU=0), both GIoU and CIoU can guide the predicted box to move. Here GIoU adjusts the position, aspect ratio, and size of the predicted box all at once, whereas CIoU pulls the box back quickly (without changing its shape much), so CIoU brings the predicted box to IoU>0 faster than GIoU. Once IoU>0, CIoU quickly adjusts the size. Once IoU>0.5, the aspect-ratio term of CIoU starts to dominate gradient propagation, pushing the predicted box toward the same aspect ratio as the GT box.
[Figure]
This is a simulation comparison from the paper. It shows that CIoU Loss performs best.