Loss Functions and Positive/Negative Sample Assignment: the YOLO Series
2022-07-02 15:29:00, by cartes1us
In object detection algorithms, the hardest parts to understand and the most intricately designed are positive/negative sample assignment and the loss function. These two largely determine how well the network trains, so this post summarizes the YOLO series with a focus on exactly these two aspects, organizing excellent content from other people's blogs.

YOLO v1:
CVPR 2016 had two stars: one was ResNet, the other was YOLO.
The figure above shows the network output. Note that each grid cell predicts two boxes (B = 2 in the paper) but only one set of class probabilities, so each grid cell can predict only one object; its two boxes are both candidates for that single object.
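With the paper's PASCAL VOC settings (an S = 7 grid, B = 2 boxes per cell, C = 20 classes), the output size follows directly from this layout: each cell carries B × 5 box values (x, y, w, h, confidence) plus one shared set of C class probabilities. A tiny check in Python; `yolo_v1_output_shape` is a hypothetical helper, not code from the paper:

```python
def yolo_v1_output_shape(S: int = 7, B: int = 2, C: int = 20) -> tuple:
    """Shape of the YOLO v1 output tensor.

    Each of the S x S cells predicts B boxes (x, y, w, h, confidence)
    and only ONE set of C class probabilities shared by those boxes.
    """
    per_cell = B * 5 + C
    return (S, S, per_cell)


print(yolo_v1_output_shape())  # (7, 7, 30)
```

Because the class vector is shared, two objects whose centers fall into the same cell cannot both be classified, which is the limitation described above.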
The loss in the paper:
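For reference, here is the loss as given in the YOLO v1 paper, written in LaTeX. $\mathbb{1}_{ij}^{obj}$ is 1 when the j-th box predictor of cell i is responsible for an object, $\mathbb{1}_{ij}^{noobj}$ is 1 for predictors that are not, and $\mathbb{1}_{i}^{obj}$ is 1 when an object appears in cell i; hatted symbols are labels:

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big] \\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\big(C_i-\hat{C}_i\big)^2
+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\big(C_i-\hat{C}_i\big)^2 \\
&+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\big(p_i(c)-\hat{p}_i(c)\big)^2
\end{aligned}
$$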
Notes from Tongji Zihao (同济子豪兄):
- Positive/negative sample assignment: the grid cell that contains an object's center is responsible for that object; among the cell's B predicted boxes, the one with the highest IoU with that ground-truth (gt) box is the positive sample, and the remaining boxes are negative samples. In other words, each YOLO grid cell is responsible for predicting only one object, and because there are only 7×7 grid cells, YOLO performs poorly on densely packed objects.
- Loss balance coefficients: $\lambda_{coord}$ is the weight of the BBox loss of positive samples and is set to 5 in the paper. Negative samples incur no BBox loss, only the objectness (obj) loss, whose coefficient $\lambda_{noobj}$ is set to 0.5; the obj loss of positive samples has weight 1. The obj label $\hat{C}_i$ of a positive sample is not 1 but the IoU between the current prediction and its assigned ground-truth box; for negative samples it is 0.
- x, y, w, h: w and h are normalized to [0, 1] relative to the input image size; x and y are offsets relative to the position of the grid cell, also in [0, 1].
- Why take the square root of w and h: BBox predictions are evaluated by IoU, but at the same IoU, a large box has larger absolute w and h errors, and therefore a larger loss, than a small box. Taking the square root partially compensates for this and makes the sum-squared error (L2 loss) closer to scale-invariant.
- Classification also uses L2 loss: in the several third-party implementations I examined, neither $p_i(c)$ nor $C_i$ goes through softmax (at least I did not see one) or sigmoid; the last layer of the network is linear, and the labels $\hat{p}_i(c)$ are 1 for the ground-truth class and 0 for the others. With poor initialization, gradients may therefore explode early in training; I am not sure this understanding is correct.
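The assignment rule and the loss terms above can be put together for a single grid cell. The sketch below is a minimal, hypothetical Python version (plain floats, no autograd), assuming predictions in the x, y, w, h format described above and the paper's λ values; `yolo_v1_cell_loss` and `iou_cxcywh` are illustrative names, not taken from any of the third-party implementations mentioned:

```python
import math
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def iou_cxcywh(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes given as (cx, cy, w, h) in the same frame."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def yolo_v1_cell_loss(
    preds: Sequence[Tuple[float, float, float, float, float]],
    class_probs: Sequence[float],
    gt: Optional[Box] = None,
    gt_class: Optional[int] = None,
    S: int = 7,
    lambda_coord: float = 5.0,
    lambda_noobj: float = 0.5,
) -> float:
    """Sum-squared YOLO v1 loss for ONE grid cell (hypothetical helper).

    preds: B tuples (x, y, w, h, conf); x, y are offsets inside the cell,
           w, h are relative to the image size.
    gt:    (x, y, w, h) in the same format if an object's center falls in
           this cell, otherwise None.
    """
    def to_image(box):
        # Both boxes sit in the same cell, so dividing the offsets by S gives
        # image units up to a shared cell origin, which does not change IoU.
        return (box[0] / S, box[1] / S, box[2], box[3])

    if gt is None:
        # Negative cell: only the obj loss with target 0, weighted by lambda_noobj.
        return lambda_noobj * sum(p[4] ** 2 for p in preds)

    ious = [iou_cxcywh(to_image(p[:4]), to_image(gt)) for p in preds]
    resp = max(range(len(preds)), key=lambda j: ious[j])  # highest IoU is responsible

    loss = 0.0
    gx, gy, gw, gh = gt
    for j, (x, y, w, h, conf) in enumerate(preds):
        if j != resp:
            loss += lambda_noobj * conf ** 2  # non-responsible box is a negative sample
            continue
        loss += lambda_coord * ((x - gx) ** 2 + (y - gy) ** 2)
        # Square roots damp the error of large boxes; the clamp guards against
        # negative linear outputs (an assumption, implementations differ).
        loss += lambda_coord * ((math.sqrt(max(w, 0.0)) - math.sqrt(gw)) ** 2
                                + (math.sqrt(max(h, 0.0)) - math.sqrt(gh)) ** 2)
        loss += (conf - ious[j]) ** 2  # obj target is the current IoU, not 1

    # One set of class probabilities per cell, L2 against a one-hot label.
    loss += sum((p - (1.0 if c == gt_class else 0.0)) ** 2 for c, p in enumerate(class_probs))
    return loss


preds = [(0.5, 0.5, 0.2, 0.2, 0.8), (0.5, 0.5, 0.6, 0.6, 0.3)]
probs = [0.0] * 20
probs[3] = 1.0
print(yolo_v1_cell_loss(preds, probs, gt=(0.5, 0.5, 0.2, 0.2), gt_class=3))  # ≈ 0.085
```

In the usage example the first box matches the gt exactly, so it is responsible and only its confidence error (0.8 vs IoU 1.0) counts, while the second box contributes $\lambda_{noobj} \cdot 0.3^2$. As for the square root: the same 0.1 width error costs $(\sqrt{0.9}-\sqrt{0.8})^2 \approx 0.0029$ on a large box but $(\sqrt{0.2}-\sqrt{0.1})^2 \approx 0.0172$ on a small one.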
YOLO v2:
Compared with v1, the backbone is updated and BN layers are added; a passthrough layer adds a shortcut from an earlier feature map into the detection head, but there is still only one detection head. YOLOv2 also adopts the anchor box mechanism, because predicting offsets is easier to learn than predicting coordinates directly. The anchor shapes and sizes come from k-means clustering, which yields 5 anchors. The "9000" in the paper's title (YOLO9000) means that the method in the paper lets YOLOv2 detect over 9000 object categories.
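The YOLOv2 paper clusters the ground-truth box sizes of the training set with k-means using the distance d(box, centroid) = 1 − IoU(box, centroid) instead of Euclidean distance, so that large boxes do not dominate; k = 5 was chosen as a trade-off between recall and model complexity. A minimal sketch of that clustering in Python follows; the function names, the random initialization, the mean-based centroid update and the stopping rule are assumptions of this sketch, not the paper's code:

```python
import random
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes placed at the same center, so only (w, h) matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: Sequence[WH], k: int = 5, iters: int = 100, seed: int = 0) -> List[WH]:
    """k-means over box sizes with distance d = 1 - IoU(box, centroid)."""
    rng = random.Random(seed)
    centroids: List[WH] = rng.sample(list(boxes), k)  # assumption: random init from the data
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for box in boxes:
            # Nearest centroid under d = 1 - IoU, i.e. the centroid with the largest IoU.
            nearest = min(range(k), key=lambda i: 1.0 - wh_iou(box, centroids[i]))
            clusters[nearest].append(box)
        updated: List[WH] = []
        for i, members in enumerate(clusters):
            if not members:
                updated.append(centroids[i])  # keep an empty cluster's centroid as is
                continue
            # Assumption: centroid = mean width and mean height of its members.
            updated.append((sum(w for w, _ in members) / len(members),
                            sum(h for _, h in members) / len(members)))
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    # Sort anchors from small to large area for readability.
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


boxes = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (5.0, 5.0), (5.2, 5.0), (5.0, 5.1)]
print(kmeans_anchors(boxes, k=2))
```

Because `wh_iou` ignores position, the clustering only learns typical shapes and sizes, which is exactly what an anchor prior needs.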
BN layers:
Each grid cell can predict at most 5 objects, one for each of its 5 different anchors:
How predictions are computed:
Here the sigmoid function ensures that the center of every box predicted by a cell falls inside that cell.
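The decoding formulas in the YOLOv2 paper are $b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$, $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$, where $(c_x, c_y)$ is the cell's offset from the top-left corner of the image and $(p_w, p_h)$ is the anchor (prior) size. A minimal Python sketch, assuming everything is measured in grid-cell units; `decode_yolov2` is a hypothetical name:

```python
import math
from typing import Tuple


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def decode_yolov2(t: Tuple[float, float, float, float],
                  cell: Tuple[int, int],
                  anchor: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Turn raw outputs (t_x, t_y, t_w, t_h) into a box (b_x, b_y, b_w, b_h).

    cell   = (c_x, c_y): column and row index of the grid cell.
    anchor = (p_w, p_h): anchor size, in grid-cell units in this sketch.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx   # sigmoid keeps the center between c_x and c_x + 1
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # exp keeps the size positive and rescales the anchor
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_yolov2((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(1.5, 2.0)))  # (3.5, 4.5, 1.5, 2.0)
```

All-zero outputs decode to the anchor itself, centered in its cell, which is why regressing offsets from an anchor is easier to learn than regressing raw coordinates.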
Loss function:
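For reference, the code-derived YOLOv2 loss is commonly written as follows, with W × H grid cells, A anchors per cell and C classes (the remaining symbols are explained in the next paragraph):

$$
\begin{aligned}
loss_t = \sum_{i=0}^{W-1}\sum_{j=0}^{H-1}\sum_{k=0}^{A-1}\Big[\, & \mathbb{1}_{MaxIOU<Thresh}\,\lambda_{noobj}\big(-b_{ijk}^{o}\big)^2 \\
& + \mathbb{1}_{t<12800}\,\lambda_{prior}\sum_{r\in(x,y,w,h)}\big(prior_{k}^{r}-b_{ijk}^{r}\big)^2 \\
& + \mathbb{1}_{k}^{truth}\Big(\lambda_{coord}\sum_{r\in(x,y,w,h)}\big(truth^{r}-b_{ijk}^{r}\big)^2 + \lambda_{obj}\big(IOU_{truth}^{k}-b_{ijk}^{o}\big)^2 + \lambda_{class}\sum_{c=1}^{C}\big(truth^{c}-b_{ijk}^{c}\big)^2\Big)\Big]
\end{aligned}
$$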
The figure above is the YOLOv2 loss as organized from the code. In the formula, $b_{ijk}^{o}$ is the obj confidence, $b_{ijk}^{r}$ is the location of the predicted BBox with $r \in (x, y, w, h)$, and $b_{ijk}^{c}$ is the predicted class of the box. $prior_{k}^{r}$ is the location of the anchor; since the five anchors are the same in every cell, its subscript only needs to say which of the five anchors it is. $truth^{r}$ is the location of the annotated box and $truth^{c}$ is its class. $IOU_{truth}^{k}$ is the IoU between the box predicted by anchor k and the annotated box.
The loss consists of three parts; each part highlighted in yellow is an indicator that is either 0 or 1.
The first yellow-highlighted part penalizes negative samples: if the IoU between a predicted box and every annotated box is below the threshold 0.6, the predicted box is treated as a negative sample and the indicator is 1, otherwise it is 0. The rest of the term is the L2 loss between the label 0 and the confidence $b_{ijk}^{o}$. This IoU is computed with the actual position of the predicted box.
The second yellow-highlighted part means that before 12800 (in the darknet code this counter counts training images seen, not iterations), the position error between the anchor and the predicted box is added to the loss. This lets the model learn early on to predict the anchor's own location, centered in the cell, which pushes the outputs $t^{x}, t^{y}, t^{w}, t^{h}$ toward 0 and makes them more stable. (My understanding of this term is not very clear.)
The third yellow-highlighted part indicates whether a predicted box is responsible for an object: the box predicted by the anchor whose IoU with the annotated box is largest is responsible (boxes with IoU > 0.6 that are not the maximum are ignored). This anchor-matching IoU depends only on size and shape, not on location: the anchor of the grid cell containing the object and the gt (ground truth) box are first moved to the same center, and then the IoU is computed. Within this part, the first item is the localization error between the predicted box and the annotated box, the second is the error between the predicted box's confidence and $IOU_{truth}^{k}$, and the third is the error between the predicted box's class and the class of the annotated box; in all the code I have seen, the class prediction goes through softmax.
There are also five $\lambda$ in the figure; they are the balance coefficients of the individual loss terms.
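The first and third parts boil down to a target-assignment step. The sketch below is a minimal Python version under stated assumptions: boxes are already decoded to normalized (cx, cy, w, h), anchors are normalized (w, h), and the logic follows how the darknet region layer reads to me. `yolov2_assign`, `box_iou` and `shape_iou` are hypothetical names, and the prior term (part 2) and the loss arithmetic itself are left out:

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def box_iou(a: Box, b: Box) -> float:
    """IoU of two (cx, cy, w, h) boxes, taking their positions into account."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def shape_iou(w1: float, h1: float, w2: float, h2: float) -> float:
    """IoU after moving both boxes to a common center: only size and shape matter."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def yolov2_assign(pred_boxes, anchors, gts, thresh: float = 0.6):
    """pred_boxes[row][col][k] is the decoded box of anchor k in that cell.

    Returns (noobj, responsible): noobj[row][col][k] is True if that prediction
    gets the no-object loss; responsible maps (row, col, k) -> gt index.
    """
    grid_h, grid_w, num_anchors = len(pred_boxes), len(pred_boxes[0]), len(anchors)
    noobj = [[[True] * num_anchors for _ in range(grid_w)] for _ in range(grid_h)]

    # Part 1: a prediction whose best IoU with ANY gt exceeds thresh is ignored.
    for row in range(grid_h):
        for col in range(grid_w):
            for k in range(num_anchors):
                best = max((box_iou(pred_boxes[row][col][k], g) for g in gts), default=0.0)
                if best > thresh:
                    noobj[row][col][k] = False

    # Part 3: in the cell containing the gt center, the anchor with the highest
    # shape-only IoU is responsible for that gt.
    responsible: Dict[Tuple[int, int, int], int] = {}
    for t, (cx, cy, w, h) in enumerate(gts):
        col = min(int(cx * grid_w), grid_w - 1)
        row = min(int(cy * grid_h), grid_h - 1)
        k = max(range(num_anchors), key=lambda a: shape_iou(anchors[a][0], anchors[a][1], w, h))
        responsible[(row, col, k)] = t
        noobj[row][col][k] = False  # a positive sample never gets the no-object loss
    return noobj, responsible
```

Predictions marked `False` in `noobj` that are not in `responsible` are the ignored boxes: they overlap a gt well but are not the matched anchor, so they receive neither the no-object loss nor the positive-sample terms.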
YOLO v3: