
Loss function and positive and negative sample allocation: Yolo series

2022-07-02 15:29:00 【cartes1us】

The hardest parts of an object detection algorithm to understand, and the most intricate to design, are positive/negative sample assignment and the loss function. Together they largely determine how well the network trains. This post summarizes the YOLO series with a focus on these two topics, drawing on and organizing material from other excellent blog posts.


YOLO v1：

CVPR 2016 had two stars: one was ResNet, the other was YOLO.
[Figure: YOLO v1 network output]
The figure above shows the network output. Note that each grid cell predicts two boxes (B is set to 2 in the paper) but only one set of class probabilities, so each grid cell can predict only one object; the two boxes are competing candidates for that same object.

The loss in the paper:
[Figure: YOLO v1 loss function]
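For reference, here is the loss as I recall it from the YOLO v1 paper, written out in LaTeX. Treat it as a transcription to check against the paper rather than an authoritative copy. $S=7$ and $B=2$; $\mathbb{1}_{ij}^{obj}$ is 1 when box $j$ of cell $i$ is a positive sample, $\mathbb{1}_{ij}^{noobj}$ is 1 for negative samples, and hatted symbols denote the targets, as in the notes that follow.

$$
\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 \\
&+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$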
Notes from Zihao of Tongji:

Positive and negative sample assignment: the grid cell that contains the center of a gt object is responsible for that object. Within that cell, the IoU between each predicted box and the gt is computed, and the box with the highest IoU becomes the positive sample; all other boxes are negative samples. Each grid cell is therefore responsible for predicting only one object, and because there are only 7*7 grid cells, YOLO performs poorly on densely packed targets.
Loss balance coefficients: $\lambda_{coord}$ weights the BBox loss of positive samples and is set to 5 in the paper. Negative samples contribute no BBox loss, only the objectness (obj) loss, whose coefficient $\lambda_{noobj}$ is set to 0.5. The obj loss of positive samples has weight 1. The obj target $\hat{C}_i$ of a positive sample is not 1 but the IoU between the current prediction and its assigned gt box; for negative samples the target is 0.
x, y, w, h: w and h are normalized by the input image size to lie between 0 and 1; x and y are offsets relative to the position of the grid cell and also lie in the range 0~1.
Why the square root is taken of w and h: BBox predictions are evaluated by IoU, but at the same IoU a large box yields a larger w, h loss than a small box. Taking the square root makes the sum-squared error (L2 loss) less sensitive to box scale.
Classification also uses an L2 loss: in the several third-party implementations I looked at, $p_i (c)$ and $C_i$ are not passed through softmax (at least I did not see it) or sigmoid; the last layer of the network is a linear layer, and the labels $\hat{p}_i (c)$ are 1 for positive samples and 0 for negative samples. With a poor initialization, gradients may therefore explode at the start of training. I am not sure my understanding here is correct.
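The effect of the square root can be checked with a small numeric sketch. The widths below are hypothetical normalized values chosen for illustration: a large box and a small box are each predicted 10% too wide, and the loss on w is compared with and without the root.

```python
import math

def wh_loss(w_pred, w_true, use_sqrt):
    """Squared error on a width (or height), optionally on square roots as in YOLO v1."""
    if use_sqrt:
        return (math.sqrt(w_pred) - math.sqrt(w_true)) ** 2
    return (w_pred - w_true) ** 2

# Hypothetical normalized widths: each box is predicted 10% too wide.
big_true, small_true = 0.8, 0.1
big_pred, small_pred = big_true * 1.1, small_true * 1.1

# How many times larger the big box's loss is than the small box's loss.
ratio_raw = wh_loss(big_pred, big_true, False) / wh_loss(small_pred, small_true, False)
ratio_sqrt = wh_loss(big_pred, big_true, True) / wh_loss(small_pred, small_true, True)
print(ratio_raw, ratio_sqrt)
```

Without the root the large box is penalized about 64 times more than the small one; with the root the factor drops to about 8. The root reduces the scale dependence but does not remove it.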

YOLO v2：

Compared with v1, v2 updates the backbone and adds BN layers. A passthrough layer adds a shortcut connection ahead of the detection head, but there is still only one detection head. v2 also adopts the anchor box mechanism, because predicting offsets is easier to learn than predicting boxes directly. The anchor shapes and sizes come from k-means clustering, which yields 5 anchors. The 9000 in the paper title refers to using the method in the paper to make YOLO v2 detect 9000 kinds of targets.
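The anchor clustering can be sketched in a few lines of plain Python. The paper uses the distance d = 1 - IoU between box shapes. The rest of this sketch is an assumption made for illustration: the deterministic initialization, the mean update, and the names `shape_iou` and `kmeans_anchors`.

```python
def shape_iou(a, b):
    """IoU of two (w, h) boxes whose centers coincide."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50):
    """Cluster (w, h) pairs with distance d = 1 - IoU and return k anchor shapes."""
    # Assumed deterministic init: pick k boxes spread over the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Smallest 1 - IoU is the same as largest IoU.
            best = max(range(k), key=lambda c: shape_iou(b, centers[c]))
            groups[best].append(b)
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append((sum(b[0] for b in g) / len(g),
                                    sum(b[1] for b in g) / len(g)))
            else:
                new_centers.append(c)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers

# Hypothetical (w, h) labels: three small squarish boxes, three tall boxes.
boxes = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (4.0, 8.0), (4.5, 7.5), (3.8, 8.2)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which is the motivation given in the paper.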
BN layer:
[Figure: BN layer]
Each grid cell can predict at most 5 objects; the 5 corresponds to the five different anchors.

At prediction time, the sigmoid function constrains the center of every predicted box to fall inside its own cell.
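The decoding step can be sketched as follows, using the YOLO v2 paper's formulas $b_x=\sigma(t_x)+c_x$, $b_y=\sigma(t_y)+c_y$, $b_w=p_w e^{t_w}$, $b_h=p_h e^{t_h}$. The function name `decode_box` and the example values are assumptions for illustration; all quantities are in grid-cell units.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t, cell, prior):
    """Decode raw outputs t = (tx, ty, tw, th) for grid cell (cx, cy) and anchor (pw, ph)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = cx + sigmoid(tx)    # sigmoid keeps the center inside the cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)   # width and height scale the anchor
    bh = ph * math.exp(th)
    return bx, by, bw, bh

# Zero outputs give the cell center and the unchanged anchor shape.
box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(1.5, 2.0))
print(box)  # (3.5, 4.5, 1.5, 2.0)
```

However large or small $t_x$ and $t_y$ become, the decoded center stays within the cell, which is why offsets are easier to learn than unconstrained coordinates.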
Loss function ：
[Figure: YOLO v2 loss function, reconstructed from the code]
The figure above shows the YOLO v2 loss function as reconstructed from the code. In the formula, $b_{ijk}^{o}$ is the obj confidence and $b_{ijk}^{r}$ is the position of the predicted BBox, where $r\in (x,y,w,h)$; $b_{ijk}^{c}$ is the class of the prediction box. $prior_{k}^{r}$ is the position of the anchor; because the five anchors are the same in every cell, the subscript only needs to say which of the five anchors is meant. $truth^{r}$ is the position of the ground-truth box and $truth^{c}$ is its class. $IOU_{truth}^{k}$ is the IoU between the anchor and the ground-truth box.

The loss consists of three parts. Each term marked in yellow is an indicator that is either 0 or 1.

The first yellow term penalizes negative samples. If the largest IoU between a prediction box and the ground-truth boxes is below the threshold 0.6, the prediction box is treated as a negative sample and the indicator is 1; otherwise it is 0. The factor after it is the L2 loss between the label 0 and the confidence $b_{ijk}^{o}$. A separate, shape-only IoU also appears in YOLO v2: it depends only on size and shape, not on location, and is computed by first aligning the centers of a grid cell's anchor and the gt (ground truth). As far as I can tell from the Darknet code, that shape-only IoU is used for matching anchors to gt boxes in the third part rather than for this threshold test.
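A minimal sketch of this first term for a single prediction box. It assumes the IoU is taken between the decoded prediction box and each gt box at their actual positions, which is my reading of the Darknet code. The function names and the default `lambda_noobj=1.0` are assumptions for illustration.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def noobj_loss(pred_box, pred_conf, gt_boxes, thresh=0.6, lambda_noobj=1.0):
    """No-object term: L2 loss toward label 0, active only when max IoU < thresh."""
    max_iou = max((iou_xywh(pred_box, g) for g in gt_boxes), default=0.0)
    indicator = 1.0 if max_iou < thresh else 0.0
    return lambda_noobj * indicator * (0.0 - pred_conf) ** 2

gt_boxes = [(5.0, 5.0, 2.0, 2.0)]                      # hypothetical gt box
far = noobj_loss((1.0, 1.0, 2.0, 2.0), 0.3, gt_boxes)  # no overlap: penalized
near = noobj_loss((5.0, 5.0, 2.0, 2.0), 0.3, gt_boxes) # IoU = 1: not penalized
print(far, near)
```

A box that overlaps a gt box well but is not the assigned positive therefore receives no confidence penalty from this term.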

The second yellow term means that during the first 12800 iterations, the position error between the anchor and the prediction box is added to the loss. This lets the model learn early on to predict the anchor position, which makes the outputs $t^{x},t^{y},t^{w},t^{h}$ more stable. (I do not fully understand this term.)

The third yellow term indicates whether a prediction box is responsible for predicting an object. The prediction box whose anchor has the largest IoU with the ground-truth box is responsible for that object (prediction boxes with IoU > 0.6 that are not the maximum are ignored). Of the three terms that follow, the first is the localization error between the prediction box and the ground-truth box, the second is the error between the confidence of the prediction box and $IOU_{truth}^{k}$, and the third is the error between the predicted class and the class of the ground-truth box. In all the code I have seen, the class prediction goes through softmax.
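The positive assignment can be sketched as below. It assumes the gt box is given in grid-cell units and that anchors are matched by a shape-only IoU, with centers aligned so that only width and height matter. The names `shape_iou` and `assign_positive` and the anchor values are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two (w, h) boxes after aligning their centers."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)

def assign_positive(gt_box, anchors):
    """Return (col, row, anchor index) of the prediction responsible for gt_box.

    gt_box is (cx, cy, w, h) in grid-cell units.
    """
    cx, cy, w, h = gt_box
    col, row = int(cx), int(cy)  # the cell containing the gt center
    k = max(range(len(anchors)), key=lambda i: shape_iou((w, h), anchors[i]))
    return col, row, k

anchors = [(1.0, 1.0), (2.0, 4.0), (5.0, 3.0)]  # hypothetical anchor shapes
print(assign_positive((6.3, 2.7, 2.2, 3.6), anchors))  # (6, 2, 1)
```

Only the prediction at that cell and anchor index receives the localization, confidence, and classification terms for this gt box.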

The figure also contains five $\lambda$ coefficients, which balance the individual loss terms.