Yolov1 learning notes
2022-07-03 06:25:00 【Happy breeder】
Preface: I have recently read many articles about YOLO. Many of them are genuinely good, and most are easy for beginners to follow. After reading enough of them, though, you realize that to understand some parts in depth you still have to read the original English paper, because every author adds some subjective interpretation to what they write. So if you have time, read the original paper. Understanding how the algorithm works will make you more confident in engineering projects and more efficient in R&D.
The YOLOv1 paper was published at CVPR 2016. Its inventor and first author was a graduate student at the time, which says a lot about his talent, and he open-sourced all of the code, which I also admire. I originally assumed that the developer of the Darknet framework and the designer of YOLO were different people, but they turn out to be the same person. Not only is the algorithm excellent, the software engineering is impressive too: the entire Darknet framework is written in pure C.
Contents
1. Whole-image regression logic
2. Loss function
3. Some implementation notes
In the YOLOv1 paper, the author describes three main advantages of YOLO over other object detectors:
(1) Object detection is framed as a single regression problem;
This is the main reason YOLO is fast.
(2) YOLO looks at the entire image when making predictions, unlike sliding-window detectors and R-CNN;
In my view this is the main reason for its accuracy gains: with global context it makes fewer background errors.
(3) It generalizes well;
I don't think this really counts as a distinctive advantage, since current detectors such as Faster R-CNN and SSD should have this property as well.
1. Whole-image regression logic

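The figure above is taken from the original YOLO paper. The author divides the input image into an S×S grid; if the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each grid cell predicts B bounding boxes with a confidence score for each box, plus C conditional class probabilities (one set per cell, shared by its B boxes). The confidence score expresses both how likely the box is to contain an object and how accurate the box is, and is defined as:
$$\text{Confidence} = \Pr(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$$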

IOU (intersection over union) is the area of the intersection of the predicted box and the ground-truth box divided by the area of their union. To aid understanding, the paper's own explanation of the whole-image regression is also posted here.
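The IOU term in the confidence definition can be computed directly. Below is a minimal Python sketch, assuming boxes are given as (x_center, y_center, width, height) in the same coordinate units; the helper names `iou` and `confidence_target` are hypothetical, chosen for illustration, and are not Darknet functions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_center, y_center, width, height) in the same units
    (an assumed convention for this sketch).
    """
    # Convert center/size form to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap width/height are clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, pred_box, truth_box):
    """Confidence target Pr(Object) * IOU(pred, truth): 0 when there is no object."""
    return iou(pred_box, truth_box) if has_object else 0.0


# Two unit squares shifted by half a width: overlap 0.5, union 1.5.
print(iou((0.5, 0.5, 1.0, 1.0), (1.0, 0.5, 1.0, 1.0)))  # 0.3333333333333333
```

Because Pr(Object) is either 0 or 1 for a training target, the confidence target reduces to 0 for boxes without an object and to the IOU for boxes with one.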

In other words, the whole-image regression determines how many values the output tensor must hold. Suppose the image is divided into a 7×7 grid (S = 7), each cell predicts two boxes (B = 2), and each box is described by 5 values: x, y, w, h and confidence. (x, y) is the center of the box, expressed as an offset relative to the bounds of its grid cell rather than to the whole image, while the width and height are predicted relative to the whole image. Each cell also predicts 20 class probabilities (C = 20 for PASCAL VOC). The output tensor therefore has size S × S × (B × 5 + C) = 7 × 7 × (2 × 5 + 20) = 7 × 7 × 30, and it is produced by the final fully connected layer, which outputs both the class probabilities and the box coordinates. The whole YOLO network is shown in the figure below; it has 24 convolutional layers followed by two fully connected layers.

The paper also introduces a faster variant, Fast YOLO, which speeds up inference further by using only 9 convolutional layers instead of 24, with fewer filters in those layers. This is the network that later appears in the Darknet framework as yolo-tiny.
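To make the 7 × 7 × 30 arithmetic and the coordinate conventions concrete, here is a minimal Python sketch. The per-cell ordering of the 30 values (B groups of x, y, w, h, confidence, then C class probabilities) is an assumption for illustration; real implementations may arrange the values differently. The 448 × 448 default input size matches the paper's detection resolution, and `output_shape` and `decode_cell` are hypothetical helper names.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the YOLOv1 prediction tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def decode_cell(cell_values, row, col, S=7, B=2, C=20, img_w=448, img_h=448):
    """Turn one grid cell's raw predictions into absolute-pixel boxes.

    Assumed layout (illustrative only): B groups of (x, y, w, h, confidence)
    followed by C class probabilities.
    """
    if len(cell_values) != B * 5 + C:
        raise ValueError("expected B * 5 + C values per cell")
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell_values[b * 5:(b + 1) * 5]
        # (x, y) are offsets inside the cell, so add the cell index before
        # dividing by S to get a fraction of the whole image.
        cx = (col + x) / S * img_w
        cy = (row + y) / S * img_h
        # (w, h) are already fractions of the whole image.
        boxes.append((cx, cy, w * img_w, h * img_h, conf))
    class_probs = cell_values[B * 5:]
    return boxes, class_probs


shape = output_shape()
print(shape, shape[0] * shape[1] * shape[2])  # (7, 7, 30) 1470
```

For example, a box predicted at offset (0.5, 0.5) in the center cell (row 3, column 3) of the 7 × 7 grid decodes to the image center, (224, 224) in pixels.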
2. Loss function
In designing the loss function, the author weights different terms differently. Coordinate errors of predictors that are responsible for an object are weighted up with a weight of 5, while confidence errors of boxes that contain no object are weighted down with a weight of 0.5. This keeps training stable. In every image most grid cells contain no object, so their confidence target is 0; if all terms had the same weight, the gradient from these many empty cells would overpower the gradient from the few cells that do contain objects, which can make the model unstable and cause training to diverge early.

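$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$
where $\mathbb{1}_i^{\text{obj}}$ indicates that an object appears in cell $i$, $\mathbb{1}_{ij}^{\text{obj}}$ indicates that the $j$-th box predictor in cell $i$ is responsible for that object, $\mathbb{1}_{ij}^{\text{noobj}}$ marks predictors with no object, $C_i$ is the box confidence, and
$\lambda_{\text{coord}} = 5$,
$\lambda_{\text{noobj}} = 0.5$.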
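The weighting above can be sketched as code. The following is a minimal pure-Python sketch of a YOLOv1-style sum-squared loss for a single image, with λ_coord = 5 and λ_noobj = 0.5 as in the paper. It assumes the targets have already been assigned: `responsible[i][j]` marks whether predictor j in cell i is responsible for an object (in the paper, the predictor whose box currently has the highest IOU with the ground truth), and `cell_has_obj[i]` marks whether an object's center falls in cell i. These argument names and the nested-list layout are hypothetical, chosen for illustration. For a sense of scale: with S = 7 and B = 2 there are 98 box predictors, so in an image with, say, two objects only two predictors are responsible and the other 96 are pushed toward confidence 0, which is why their term is down-weighted.

```python
import math

LAMBDA_COORD = 5.0  # weight for coordinate errors (value from the paper)
LAMBDA_NOOBJ = 0.5  # weight for confidence errors of object-free predictors


def yolo_v1_loss(pred_boxes, target_boxes, responsible, pred_cls, target_cls, cell_has_obj):
    """Sum-squared YOLOv1-style loss for one image (illustrative sketch).

    pred_boxes[i][j] / target_boxes[i][j]: (x, y, w, h, confidence) for
        predictor j of cell i, with w, h >= 0. The confidence target of a
        responsible predictor is its IOU with the ground truth.
    responsible[i][j]: True if predictor j of cell i is responsible for an object.
    pred_cls[i] / target_cls[i]: the C class probabilities of cell i.
    cell_has_obj[i]: True if an object's center falls in cell i.
    """
    total = 0.0
    for i, cell in enumerate(pred_boxes):  # the S*S grid cells
        for j, (px, py, pw, ph, pc) in enumerate(cell):  # the B predictors
            if responsible[i][j]:
                tx, ty, tw, th, tc = target_boxes[i][j]
                # Center error, weighted up by lambda_coord.
                total += LAMBDA_COORD * ((px - tx) ** 2 + (py - ty) ** 2)
                # Square roots make the same absolute error cost more for
                # small boxes than for large ones.
                total += LAMBDA_COORD * ((math.sqrt(pw) - math.sqrt(tw)) ** 2
                                         + (math.sqrt(ph) - math.sqrt(th)) ** 2)
                total += (pc - tc) ** 2
            else:
                # No object: confidence target is 0, weighted down by lambda_noobj.
                total += LAMBDA_NOOBJ * pc ** 2
        if cell_has_obj[i]:
            # Class probabilities are penalized only in cells containing an object.
            total += sum((p - t) ** 2 for p, t in zip(pred_cls[i], target_cls[i]))
    return total
```

A production implementation would vectorize this over a batch, but the loop form keeps each term of the formula visible.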
3. Some implementation notes
3.1 Pretrained model:
Prior work has shown that adding both convolutional and fully connected layers to a pretrained network can improve performance. Following this, the author pretrains the first 20 convolutional layers on the ImageNet 1000-class classification dataset, then adds four more convolutional layers and two fully connected layers with randomly initialized weights to form the detection network.
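3.2 Leaky activation function
The final layer uses a linear activation, and all other layers use the leaky rectified linear activation:
$$\phi(x) = \begin{cases} x, & \text{if } x > 0 \\ 0.1x, & \text{otherwise} \end{cases}$$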
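In YOLOv1 the hidden layers use a leaky ReLU that passes positive inputs through unchanged and scales negative inputs by 0.1. A minimal Python sketch of that activation follows; `leaky_relu` is a hypothetical helper name, and the slope parameter defaults to the paper's 0.1.

```python
def leaky_relu(x, slope=0.1):
    """Leaky ReLU as used in YOLOv1's hidden layers: x if x > 0, else slope * x."""
    return x if x > 0 else slope * x


print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.2, 0.0, 3.0]
```

Unlike a plain ReLU, negative inputs still produce a small non-zero gradient (0.1), so units are less likely to stop learning entirely.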

That's all for today's study; I'll update this post as I learn more.
A closing thought: if you can't explain something in simple language, you don't really understand it.