
Yolov1 learning notes

2022-07-03 06:25:00 【Happy breeder】

Preface: I have recently read many articles about YOLO. Many of them are genuinely good, and even a beginner can follow them. But after reading enough of them, you realize that to understand some points in depth you still have to read the original English paper. Writers inevitably mix their own subjective understanding into what they write, so if you have time it is worth going back to the original. Understanding how the algorithm works makes you more confident in engineering projects and more efficient in development.

The YOLOv1 paper was published in 2016 (at CVPR 2016; the arXiv preprint first appeared in 2015). The inventor and first author was a graduate student at the time, and I can only marvel at his talent. He also open-sourced all of the code, and I admire that generosity. At first I thought the developer of Darknet and the designer of YOLO were different people; I have since found out that the framework and YOLO come from the same author. Not only is the algorithm excellent, the software design is of a high standard too: the Darknet framework is written in pure C. Impressive.

Catalog

1. Whole graph regression logic

2. Loss function

3. Some operating instructions

3.1 Pre training model ：

3.2 leaky Operation function

In the YOLOv1 paper, the author explains that YOLO improves on other object detection methods in three main ways:

(1) Object detection is framed as a single regression problem.

This is the main reason YOLO is fast.

(2) YOLO takes the whole image as input when making predictions, unlike sliding-window detectors and R-CNN.

This is the main reason for the improvement in detection accuracy.

(3) It generalizes well.

I don't think this counts as a distinct advantage; current detectors such as Faster R-CNN and SSD should have it too.

1. Whole graph regression logic

The figure above is taken from the original YOLO paper. The author divides the whole image into an S×S grid. Each grid cell predicts B bounding boxes, and each grid cell also predicts C conditional class probabilities. The confidence of a box, which reflects both how likely the box is to contain an object and how well it fits that object, is expressed in the paper as $\text{confidence} = \Pr(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$.

IOU (intersection over union) is the area of the intersection of the predicted box and the ground-truth box divided by the area of their union. For a fuller picture, refer to the explanation of whole-image regression in the original paper.
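The IOU definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from Darknet: the function name `iou` and the corner-based box format `(x_min, y_min, x_max, y_max)` are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max). This corner format is an
    assumption for the example; YOLO itself predicts center/width/height.
    """
    # Corners of the overlapping rectangle (if there is one).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

During training, the confidence target for a box that is responsible for an object is this IOU value; for a cell with no object the target confidence is 0.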

In other words, regressing over the whole image means predicting one fixed-size tensor. Suppose the image is divided into a 7×7 grid and each grid cell predicts two bounding boxes. Each box needs 5 values: x, y, w, h and confidence. (x, y) is the center of the box, expressed relative to its grid cell rather than to the whole image, while the width and height are predicted relative to the whole image. With the 20 classes of PASCAL VOC, each cell outputs 2×5 + 20 = 30 values, so the final output tensor of the network is 7×7×30. The fully connected layers then output the final class probabilities and box coordinates. The whole YOLO network, shown in the architecture figure of the paper, has 24 convolutional layers followed by two fully connected layers.
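To make the output size and the coordinate convention concrete, here is a small sketch. The helper names `output_depth` and `decode_box`, the `row`/`col` cell indexing, and the image size in the usage note are assumptions for illustration; only S = 7, B = 2, the 5 values per box and the 7×7×30 result come from the text above, with C = 20 being the PASCAL VOC class count.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (C = 20 for PASCAL VOC)


def output_depth(num_boxes, num_classes):
    """Values predicted per grid cell: 5 per box (x, y, w, h, confidence)
    plus one probability per class."""
    return num_boxes * 5 + num_classes


def decode_box(row, col, x, y, w, h, img_w, img_h, grid=S):
    """Convert one predicted box to absolute pixel corners.

    x, y are offsets inside the grid cell (0..1);
    w, h are fractions of the whole image (0..1).
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    center_x = (col + x) / grid * img_w
    center_y = (row + y) / grid * img_h
    box_w = w * img_w
    box_h = h * img_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)
```

For example, `output_depth(B, C)` gives the 30 channels of the 7×7×30 tensor, and `decode_box(3, 3, 0.5, 0.5, 1.0, 1.0, 448, 448)` describes a box centered in the middle cell that covers a whole 448×448 image.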

The paper also mentions a way to speed up inference further, called Fast YOLO, which is what later appears in the Darknet framework as the yolo-tiny network. It has only 9 convolutional layers.

2. Loss function

In designing the loss function, the author assigns different weights depending on whether or not an object is present in a grid cell. For boxes in cells that contain an object, the coordinate loss gets a large weight of 5; for boxes in cells without an object, the confidence loss gets a small weight of 0.5. This helps keep training stable. Most grid cells in an image contain no object, and their target confidence is 0. If both cases were weighted equally, the loss from these empty cells would dominate and overpower the gradient from the cells that do contain objects, which can easily make the model unstable.
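For reference, the sum-squared-error loss in the original paper can be written as follows. Here $\mathbb{1}_{ij}^{obj}$ is 1 when box $j$ of cell $i$ is responsible for an object and 0 otherwise, $\mathbb{1}_{ij}^{noobj}$ is its complement, $\mathbb{1}_{i}^{obj}$ is 1 when an object appears in cell $i$, $C_i$ is the box confidence, $p_i(c)$ is the class probability, and each hatted symbol is paired with its unhatted counterpart (prediction versus ground truth).

$$
\begin{aligned}
\text{loss} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$

The square roots on width and height make the same absolute size error count for more on a small box than on a large one.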

In the loss function, $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$.
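A minimal sketch of how these two weights act on a single predicted box is given below. It is a simplification for illustration only: the function name `box_loss` and the tuple layout are assumptions, and the class-probability term of the full loss is deliberately left out.

```python
import math

LAMBDA_COORD = 5.0   # weight on coordinate loss for boxes responsible for an object
LAMBDA_NOOBJ = 0.5   # weight on confidence loss for boxes with no object


def box_loss(pred, target, has_object):
    """Loss contribution of one box; pred and target are (x, y, w, h, confidence).

    Simplified sketch: the class-probability term is omitted.
    """
    px, py, pw, ph, pc = pred
    tx, ty, tw, th, tc = target

    if not has_object:
        # Only the confidence error counts, and it is down-weighted.
        return LAMBDA_NOOBJ * (pc - tc) ** 2

    center_err = (px - tx) ** 2 + (py - ty) ** 2
    # Square roots soften the penalty for size errors on large boxes.
    size_err = ((math.sqrt(pw) - math.sqrt(tw)) ** 2
                + (math.sqrt(ph) - math.sqrt(th)) ** 2)
    conf_err = (pc - tc) ** 2
    return LAMBDA_COORD * (center_err + size_err) + conf_err
```

With these weights, a confidence error of 0.4 on an empty cell costs 0.5 × 0.16 = 0.08, whereas the same squared error on a box center is multiplied by 5.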

3. Some operating instructions

3.1 Pre training model ：

Prior work has shown that adding convolutional layers and fully connected layers to a pretrained network can improve the model's performance. The author pretrains the first 20 convolutional layers of the network in Figure 3 of the paper; the four convolutional layers and two fully connected layers that follow are added afterwards with randomly initialized weights.