YOLOv1 learning notes
2022-07-03 06:25:00 【Happy breeder】
Preface: I have recently read many articles about YOLO. Many of them are genuinely good, and even a beginner can follow what they say. After reading enough of them, though, you notice that to understand some points in depth you still have to read the original English paper, because every writer adds some of their own subjective interpretation. So if you have time, read the original paper. Understanding how the algorithm works makes you more confident on engineering projects and improves R&D efficiency.
The YOLOv1 paper was published in 2016 (at CVPR; the first arXiv version dates from June 2015). The author was a graduate student at the time, and I can only marvel at his talent. He also open-sourced all of the code, which I admire. At first I assumed the developer of the Darknet framework and the designer of YOLO were different people; it turns out they are the same person. Not only is the algorithm excellent, the software design is too: the Darknet framework is written in pure C. Impressive.
Contents
1. Whole-image regression logic
2. Loss function
3. Some implementation notes
In the YOLOv1 paper, the author says YOLO improves on other object detectors in three main ways:
(1) Object detection is framed as a single regression problem;
This is the main reason YOLO is fast;
(2) YOLO takes the whole image as input when predicting, unlike sliding-window detectors and R-CNN;
This is the main reason detection accuracy improves: the network sees global context, so it makes fewer background errors;
(3) Good generalization;
I am not sure this counts as an advantage, since current detectors such as Faster R-CNN and SSD should have it too;
1. Whole-image regression logic

The figure in the original YOLO paper illustrates the idea. The author divides the whole image into an S × S grid. Each grid cell predicts B bounding boxes, each with a confidence score, and the cell also predicts C conditional class probabilities (one set per cell, shared by its boxes, not one set per box). The confidence of a box, which reflects both how likely the box is to contain an object and how well it fits that object, is defined as:
confidence = Pr(Object) × IOU(pred, truth)
IOU (intersection over union) is the ratio of the intersection to the union of the predicted box and the ground-truth box. For ease of understanding, the original paper's own explanation of whole-image regression was posted here.
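As an aside on the IOU term, here is a minimal sketch of how it can be computed. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`, which is a choice made for this example and not the format YOLO predicts; the function name `iou` is also just illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, so their IOU is 1 / (4 + 4 - 1) = 1/7. During training, a box's confidence target is this IOU when the box is responsible for an object and 0 otherwise.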

That is, the whole-image regression produces a tensor of size S × S × (B × 5 + C). Suppose the image is divided into a 7 × 7 grid and each grid cell predicts two boxes. Each box needs 5 parameters: x, y, w, h, and confidence. (x, y) is the center of the box, expressed relative to the bounds of its grid cell rather than relative to the whole image, whereas the width and height are predicted relative to the whole image. With C = 20 classes (PASCAL VOC, as in the paper), the final output tensor of the network is 7 × 7 × 30, and the fully connected layers output the final class probabilities and box coordinates. The whole YOLO network, shown in a figure in the paper, consists of 24 convolutional layers followed by two fully connected layers.
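The arithmetic above can be checked with a short sketch. The values S = 7, B = 2, C = 20 are the paper's PASCAL VOC settings; the helper names `output_shape` and `decode_box`, and the 448 × 448 image size used in the example, are assumptions made for illustration only.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (PASCAL VOC settings)


def output_shape(s=S, b=B, c=C):
    """Shape of the prediction tensor: 5 numbers per box plus C class scores per cell."""
    return (s, s, b * 5 + c)


def decode_box(row, col, x, y, w, h, img_w, img_h, s=S):
    """Convert one prediction to pixel units: (center_x, center_y, width, height).

    x, y are offsets inside grid cell (row, col), each in [0, 1];
    w, h are fractions of the whole image.
    """
    center_x = (col + x) / s * img_w
    center_y = (row + y) / s * img_h
    return center_x, center_y, w * img_w, h * img_h


print(output_shape())  # (7, 7, 30)
print(decode_box(3, 3, 0.5, 0.5, 0.5, 0.5, 448, 448))  # (224.0, 224.0, 224.0, 224.0)
```

A prediction in the middle of the center cell (row 3, column 3) maps to the middle of the image, which shows why the same (x, y) values mean different positions in different cells.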

In the paper, the author also mentions a variant that speeds up inference further, called Fast YOLO. This is what later appears in the Darknet framework as the yolo-tiny network. It has only 9 convolutional layers.
2. Loss function
In designing the loss function, the author weights the terms differently depending on whether a grid cell contains an object. The coordinate loss of boxes that are responsible for an object gets a large weight, λ_coord = 5, while the confidence loss of boxes that contain no object gets a small weight, λ_noobj = 0.5. This helps keep training stable. Most grid cells in an image contain no object, and their target confidence is 0. If both kinds of terms were weighted equally, the many empty cells would dominate the loss and overpower the gradient from the few cells that do contain objects, which can easily make the model unstable.

where λ_coord = 5 and λ_noobj = 0.5.
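The formula image did not survive in this post. For reference, the sum-squared-error loss in the YOLOv1 paper has the following form. Here the indicator 1_{ij}^{obj} is 1 when box j in cell i is responsible for an object, 1_{ij}^{noobj} is its complement, 1_{i}^{obj} is 1 when cell i contains an object, C_i is the box confidence, p_i(c) is the class probability, and hatted and unhatted symbols pair predictions with their targets.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
       + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
  \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make a given absolute error count for more in a small box than in a large one.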
3. Some implementation notes
3.1 Pretrained model:
Earlier work has shown that adding convolutional layers and fully connected layers to a pretrained network can improve performance. The author first pretrains the first 20 convolutional layers, then appends four convolutional layers and two fully connected layers whose weights are initialized randomly.
3.2 Leaky activation function
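The formula image is missing here. As a minimal sketch: the paper uses a leaky rectified linear activation that passes positive inputs through unchanged and scales negative inputs by 0.1. The function name `leaky_relu` and the `slope` parameter are naming choices for this example.

```python
def leaky_relu(x, slope=0.1):
    """Leaky ReLU: x for positive inputs, slope * x otherwise (0.1 in the YOLOv1 paper)."""
    return x if x > 0 else slope * x


print(leaky_relu(2.0))   # 2.0
print(leaky_relu(-2.0))  # -0.2
```

Unlike a plain ReLU, negative inputs keep a small nonzero gradient, so units do not go completely silent during training.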

That's all for today's study. I will update this post as I learn more.
Muttering to myself: if you can't explain something in simple language, you don't really understand it.