
YOLOv2 Learning and Summary

2022-07-03 06:25:00, by Happy breeder

Preface: In my previous article, "YOLOv1 Study and Summary", I worked through YOLOv1 and learned a lot from it, in particular why its recall is low and why its accuracy is low. YOLOv2 addresses several of YOLOv1's obvious problems, improving both accuracy and speed. In this article I walk through the paper with you.

Contents

1. Accuracy improvement

1.1 BN (Batch Normalization)

1.2 High resolution training

1.3 Introducing the anchor mechanism

1.4 Scale clustering

1.5 Location prediction

1.6 Multiscale training

2. Speed up

2.1 The Darknet-19 feature extraction network

1. Accuracy improvement

1.1 BN (Batch Normalization)

BN has several advantages. It speeds up convergence and acts as a regularizer, which reduces the need for other forms of regularization. The author adds BN to every convolutional layer, normalizing each convolution's output before the activation. This raises YOLO's mAP by more than 2%. It also helps prevent overfitting, so dropout can be removed from the model.
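The following minimal NumPy sketch shows where BN sits: one conv → BN → leaky ReLU step in training mode. The function names are hypothetical, and the convolution is assumed to be precomputed without a bias, since BN's shift β takes over that role. The 0.1 leaky slope is an assumption that follows the Darknet convention. This is an illustration, not the Darknet implementation.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm on an NCHW tensor.

    Mean and variance are computed per channel over the batch and spatial
    axes (N, H, W); gamma and beta are the learned per-channel scale/shift.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def leaky_relu(x, slope=0.1):
    # Assumption: 0.1 negative slope, as in Darknet's "leaky" activation.
    return np.where(x > 0, x, slope * x)


def conv_bn_leaky(conv_out, gamma, beta):
    """Conv -> BN -> leaky ReLU; conv_out is a precomputed, bias-free conv output."""
    return leaky_relu(batch_norm_train(conv_out, gamma, beta))


# Usage: a fake batch of 8 conv outputs with 4 channels on a 13x13 grid.
rng = np.random.default_rng(0)
conv_out = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 13, 13))
normed = batch_norm_train(conv_out, np.ones(4), np.zeros(4))
activated = conv_bn_leaky(conv_out, np.ones(4), np.zeros(4))
print(normed.mean(axis=(0, 2, 3)), normed.var(axis=(0, 2, 3)))
```

At inference time, BN uses running averages of the mean and variance collected during training instead of per-batch statistics. That part is omitted here.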

1.2 High resolution training

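Raising the input resolution generally improves detection accuracy, but it also slows the network down. YOLOv1 trained its classifier at 224×224 and then switched to 448×448 for detection, so the network had to adapt to the new resolution while also learning detection. YOLOv2 first fine-tunes the classification network at 448×448 on ImageNet for 10 epochs, and only then fine-tunes it for detection. This high-resolution classifier adds almost 4% mAP.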

1.3 Introducing the anchor mechanism

In YOLOv1, fully connected layers after the feature extraction network directly predict the coordinates of every bounding box. In Faster R-CNN, the RPN (Region Proposal Network) instead predicts offsets relative to hand-picked anchor boxes, using only convolutional layers. YOLOv2 follows this idea: it removes the fully connected layers and uses anchor boxes to predict bounding boxes.
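To make "anchor boxes instead of fully connected layers" concrete: for every cell of the S×S grid, the final convolution outputs one block of numbers per anchor. The sketch below splits such a map into per-anchor predictions. It assumes 5 anchors, 20 classes (PASCAL VOC), and a per-anchor channel layout of [t_x, t_y, t_w, t_h, t_o, class scores]. The function name and that channel ordering are illustrative assumptions, not Darknet's exact memory layout. Note that YOLOv2 predicts class scores and objectness for every anchor, rather than one class vector per cell as in YOLOv1.

```python
import numpy as np


def split_head(output, num_anchors=5, num_classes=20):
    """Split an (S, S, num_anchors * (5 + num_classes)) map into per-anchor parts.

    Assumed per-anchor channel order: [tx, ty, tw, th, to, class scores...].
    """
    s_h, s_w, channels = output.shape
    per_anchor = 5 + num_classes
    if channels != num_anchors * per_anchor:
        raise ValueError("channel count does not match num_anchors * (5 + num_classes)")
    preds = output.reshape(s_h, s_w, num_anchors, per_anchor)
    box_offsets = preds[..., 0:4]    # tx, ty, tw, th for every anchor
    objectness = preds[..., 4]       # to, one value per anchor
    class_scores = preds[..., 5:]    # a class vector per anchor, not per cell
    return box_offsets, objectness, class_scores


# 13x13 grid, 5 anchors x (4 box + 1 objectness + 20 classes) = 125 channels.
head = np.zeros((13, 13, 5 * (5 + 20)))
box_offsets, objectness, class_scores = split_head(head)
print(box_offsets.shape, objectness.shape, class_scores.shape)
```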

In 1.2, the network input was set to 448×448. To get a feature map with an odd number of cells per side, YOLOv2 shrinks the detection input to 416×416. An odd grid has a single center cell, which helps because large objects tend to occupy the center of the image. Why 416 specifically? The feature extraction network downsamples by a total factor of 32 (five stride-2 pooling layers, 2^5 = 32), so the input must be a multiple of 32. An input of 416 gives a 13×13 output feature map (416 / 32 = 13), whereas 448 would give an even 14×14 grid.
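The arithmetic behind 416 can be checked in a few lines. These hypothetical helpers assume square inputs and a total stride of 32 (five stride-2 poolings):

```python
def output_grid(input_size, stride=32):
    """Side length of the output feature map for a square input of input_size pixels."""
    if input_size % stride != 0:
        raise ValueError(f"{input_size} is not a multiple of the stride {stride}")
    return input_size // stride


def has_center_cell(grid):
    # Only an odd grid has a single cell at the exact center of the image.
    return grid % 2 == 1


for size in (448, 416):
    g = output_grid(size)
    print(size, "->", f"{g}x{g}", "single center cell:", has_center_cell(g))
```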

With anchor boxes, the model predicts more than a thousand boxes per image, whereas YOLOv1 predicts only 98. Where does 98 come from? As covered in the previous article, "YOLOv1 Study and Summary", YOLOv1 divides the image into a 7×7 grid and each grid cell predicts two boxes, giving 7×7×2 = 98 boxes. Anchor boxes raise recall from 81% to 88%, a large jump, while mAP drops only slightly (from 69.5 to 69.2).
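The box counts follow from grid cells × boxes per cell. The YOLOv1 figure comes from the text. For YOLOv2, the total depends on how many anchors each cell uses, so the 13×13 grid is shown with 5 anchors per cell (what YOLOv2 finally uses) and with 9 (the number Faster R-CNN uses). Which setting the "more than a thousand" figure refers to is not stated here.

```python
def boxes_per_image(grid, boxes_per_cell):
    """Total predicted boxes for a grid x grid feature map."""
    return grid * grid * boxes_per_cell


yolov1 = boxes_per_image(7, 2)            # 7 x 7 grid, 2 boxes per cell
yolov2_k5 = boxes_per_image(13, 5)        # 13 x 13 grid, 5 anchors per cell
yolov2_k9 = boxes_per_image(13, 9)        # 13 x 13 grid, 9 anchors per cell
print(yolov1, yolov2_k5, yolov2_k9)       # prints: 98 845 1521
```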

1.4 Scale clustering

Using anchor boxes raises two questions: how should the anchor sizes be chosen, and how many anchors should there be? In Faster R-CNN the anchors are picked by hand. The network can learn to adjust the boxes over many iterations, so default anchors can still meet the needs of the task. Better starting anchors (priors), however, make training easier and more accurate. To choose them automatically, YOLOv2 runs K-means clustering on the dimensions of the ground-truth boxes in the training set. It does not use Euclidean distance, which makes large boxes produce more error than small ones. Instead it uses d(box, centroid) = 1 − IOU(box, centroid). After testing several values of K, the authors found that K = 5 gives a good balance between model complexity and recall.

The authors also compared the average IOU between each box and its closest prior for different numbers of clusters. With 9 clusters, the average IOU is clearly higher than with 5 (67.2 vs. 61.0 on VOC 2007). However, 5 clustered priors already perform about as well as Faster R-CNN's 9 hand-picked anchors (61.0 vs. 60.9), so YOLOv2 uses 5 anchors.
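Below is a minimal NumPy sketch of this dimension clustering. Boxes are compared by width and height only, as if they shared a center. The distance is 1 − IOU, and the average IOU to the closest centroid is the metric for comparing values of K. The function names, the random initialization, and updating each centroid with the mean of its members are assumptions of this sketch. In practice the inputs would be the training set's ground-truth widths and heights; the synthetic boxes below are only for illustration.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IOU of (w, h) pairs as if boxes and centroids shared a center.

    boxes: (N, 2), centroids: (K, 2); returns an (N, K) matrix.
    """
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """K-means on (w, h) with distance d = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        # Assumption: update with the mean; keep the old centroid if a cluster is empty.
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def avg_iou(boxes, centroids):
    """Mean over boxes of the IOU with the closest centroid."""
    return iou_wh(boxes, centroids).max(axis=1).mean()


rng = np.random.default_rng(42)
# Synthetic (w, h) pairs around three shapes: small square, wide, tall.
shapes = np.array([[10.0, 10.0], [50.0, 20.0], [20.0, 80.0]])
boxes = np.concatenate([s * rng.uniform(0.8, 1.2, size=(100, 2)) for s in shapes])
for k in (1, 3, 5):
    print(k, round(avg_iou(boxes, kmeans_anchors(boxes, k=k)), 3))
```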

1.5 Location prediction

The author first explains why box prediction in the RPN is unstable. In the RPN, the center of a predicted box is computed as:

$$x = t_x \cdot w_a + x_a, \qquad y = t_y \cdot h_a + y_a$$

Here $(x, y)$ is the center of the predicted box, $t_x$ and $t_y$ are the predicted horizontal and vertical offsets, $w_a$ and $h_a$ are the width and height of the anchor box, and $(x_a, y_a)$ is the center of the anchor. Nothing bounds $t_x$ and $t_y$. A value of $t_x = 1$ shifts the box right by one anchor width, and $t_x = -1$ shifts it left by the same amount. A box can therefore end up anywhere in the image, regardless of which location predicted it. With random initialization this makes the model slow to stabilize, so the offsets $t_x, t_y$ need to be constrained.
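To see why unbounded offsets are a problem, here is the RPN center decoding x = t_x·w_a + x_a, y = t_y·h_a + y_a as a tiny Python function (the name is hypothetical). The example assumes an anchor centered at (100, 100) with width and height 50, in pixels:

```python
def rpn_center(tx, ty, xa, ya, wa, ha):
    """Faster R-CNN style center decoding: x = tx * wa + xa, y = ty * ha + ya.

    tx and ty are unbounded, so the decoded center can land anywhere.
    """
    return tx * wa + xa, ty * ha + ya


anchor = dict(xa=100.0, ya=100.0, wa=50.0, ha=50.0)
print(rpn_center(1.0, 0.0, **anchor))    # shifted right by one anchor width
print(rpn_center(-1.0, 0.0, **anchor))   # shifted left by one anchor width
print(rpn_center(10.0, 0.0, **anchor))   # far from the anchor that predicted it
```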

YOLOv2 instead passes the offsets through a logistic (sigmoid) activation. This limits them to the range 0 to 1, so the predicted center always stays inside the grid cell that made the prediction, and training becomes more stable. Together with dimension clusters, this direct location prediction improves YOLO by almost 5% mAP over the version with plain anchor boxes. The YOLOv2 bounding box is computed as:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$

Here $t_x$, $t_y$, $t_w$, $t_h$ are the network's raw outputs and $\sigma$ is the sigmoid function. $c_x$, $c_y$ are the offsets of the grid cell from the top-left corner of the image, measured in grid cells. $p_w$, $p_h$ are the width and height of the anchor (prior) box. $b_w$, $b_h$ are the width and height of the predicted box, and $b_x$, $b_y$ are its center coordinates. Each parameter is illustrated in the figure below:
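The YOLOv2 decoding b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h) can be written as a minimal Python sketch (hypothetical function names). Outputs are in grid-cell units; multiplying by the stride (32 for a 416×416 input) would convert them to pixels. Whatever t_x and t_y the network outputs, the center stays inside cell (c_x, c_y):

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def yolov2_decode(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs into (bx, by, bw, bh), all in grid-cell units.

    bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy,
    bw = pw * exp(tw),     bh = ph * exp(th).
    """
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Cell (6, 6) of a 13x13 grid with a hypothetical 1.5 x 2.0 prior.
print(yolov2_decode(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=1.5, ph=2.0))
# Even extreme offsets keep the center between 6 and 7 on each axis.
print(yolov2_decode(30.0, -30.0, 0.0, 0.0, cx=6, cy=6, pw=1.5, ph=2.0))
```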

1.6 Multiscale training

Multi-scale training: rather than keeping the input size fixed, YOLOv2 changes it during training. Every 10 batches it randomly picks a new input size. Because the network downsamples by a factor of 32, the size must be a multiple of 32, chosen from {320, 352, 384, ..., 608}. Low-resolution inputs give lower accuracy but faster training and inference, while high-resolution inputs give better accuracy. Because the same network learns to predict well at all of these sizes, YOLOv2 can trade speed against accuracy at test time simply by changing the input resolution.
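Here is a small Python sketch of this schedule: every 10 batches, a new size is drawn from the multiples of 32 between 320 and 608. The function name, the seeding, and the uniform random choice are assumptions of this sketch. A real training loop would also resize the input images to the drawn size.

```python
import random

# Multiples of 32 from 320 to 608: 320, 352, ..., 608 (10 sizes).
SCALES = list(range(320, 608 + 1, 32))


def size_schedule(num_batches, every=10, seed=0):
    """Input size used for each batch; a new size is drawn every `every` batches."""
    rng = random.Random(seed)
    sizes = []
    current = None  # set on batch 0
    for batch in range(num_batches):
        if batch % every == 0:
            current = rng.choice(SCALES)
        sizes.append(current)
    return sizes


schedule = size_schedule(35)
print(schedule[::10])                       # the size used in each 10-batch block
print([s // 32 for s in SCALES])            # output grid: 10x10 up to 19x19
```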

2. Speed up

2.1 The Darknet-19 feature extraction network

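Darknet-19 mainly uses 3×3 convolutions to extract features and doubles the number of channels after every pooling step. It inserts 1×1 convolutions between the 3×3 ones to compress the channels and ends with global average pooling. In total it has 19 convolutional layers and 5 max-pooling layers. The network architecture is as follows: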

Compared with VGG-16, a single forward pass of Darknet-19 needs only 5.58 billion floating-point operations, whereas VGG-16 needs 30.69 billion for one 224×224 image.
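The architecture table is not reproduced here, so the sketch below writes Darknet-19's layer list (the 224×224 classification version from the YOLO9000 paper) as Python data and counts its convolution FLOPs. It assumes 2 FLOPs per multiply-add, "same" padding, and stride-1 convolutions, and it ignores BN, activations, and pooling. Under those assumptions the total comes out at about 5.58 billion FLOPs, the figure the paper reports for Darknet-19.

```python
# (kind, out_channels, kernel); ("M",) is a 2x2 max-pool with stride 2.
DARKNET19 = [
    ("C", 32, 3), ("M",),
    ("C", 64, 3), ("M",),
    ("C", 128, 3), ("C", 64, 1), ("C", 128, 3), ("M",),
    ("C", 256, 3), ("C", 128, 1), ("C", 256, 3), ("M",),
    ("C", 512, 3), ("C", 256, 1), ("C", 512, 3), ("C", 256, 1), ("C", 512, 3), ("M",),
    ("C", 1024, 3), ("C", 512, 1), ("C", 1024, 3), ("C", 512, 1), ("C", 1024, 3),
    ("C", 1000, 1),  # classifier conv, followed by global average pooling + softmax
]


def count_flops(layers, size=224, in_ch=3):
    """Conv FLOPs counted as 2 * multiply-adds; returns (flops, final spatial size)."""
    flops = 0
    for layer in layers:
        if layer[0] == "M":
            size //= 2          # each max-pool halves height and width
            continue
        _, out_ch, k = layer
        flops += 2 * size * size * k * k * in_ch * out_ch
        in_ch = out_ch
    return flops, size


num_conv = sum(1 for layer in DARKNET19 if layer[0] == "C")
num_pool = sum(1 for layer in DARKNET19 if layer[0] == "M")
flops, final_size = count_flops(DARKNET19)
print(num_conv, num_pool, final_size, f"{flops / 1e9:.2f} billion FLOPs")
# prints: 19 5 7 5.58 billion FLOPs
```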


Copyright notice
This article was written by [Happy breeder]. Please include a link to the original article when reposting. Thank you.
https://yzsam.com/2022/184/202207030605338721.html