Reading notes: you only look once:unified, real time object detection
2022-07-24 19:17:00 【to be__】
1. Abstract
Object detection is framed as a regression problem: a single network predicts bounding boxes and class probabilities directly from an image. YOLO makes more localization errors than other detectors, but it is very fast.
2. Introduction
Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like DPM use a sliding-window approach, in which the classifier is run at evenly spaced locations over the entire image.
More recent approaches like R-CNN use region proposal methods: they first generate potential bounding boxes and then run a classifier on these proposed boxes. Post-processing then refines the bounding boxes and removes duplicate detections. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.
YOLO reframes object detection as a single regression problem that goes straight from image pixels to bounding box coordinates and class probabilities.
A single convolutional neural network predicts multiple bounding boxes and the class probabilities of those boxes. YOLO trains on full images and directly optimizes detection performance.
Unlike sliding-window and region-proposal-based techniques, YOLO sees the entire image during training and testing, so it encodes global context about the classes as well as their appearance. YOLO also learns a more generalizable representation of objects.
YOLO still lags behind state-of-the-art detection systems in accuracy.
3. Unified Detection
The input image is divided into an S*S grid. If the center of an object falls into a grid cell, that cell is responsible for detecting the object.
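The cell assignment can be sketched in a few lines. This is only an illustration: the function name `responsible_cell` and the pixel-coordinate inputs are assumptions made for the example, not something taken from the paper's code.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the object center.

    cx, cy are the center coordinates in pixels (an assumption of this sketch).
    """
    # Scale the center into grid units and truncate to get the cell index.
    # min() keeps a center lying exactly on the right/bottom edge in the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# An object centered at (224, 100) in a 448x448 image falls into row 1, column 3.
print(responsible_cell(224, 100, 448, 448))  # (1, 3)
```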
Each grid cell predicts B bounding boxes and a confidence score for each of those boxes. The confidence score reflects how confident the model is that the box contains an object and how accurate it thinks the box is.
Each bbox consists of 5 predicted values: x, y, w, h, c (x, y, w, h are normalized; c is the confidence).
Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).
These probabilities are conditioned on the grid cell containing an object. Each cell predicts only one set of C class probabilities, regardless of the number of boxes B. The network output is a 7*7*30 tensor: the 30 values per cell hold x, y, w, h, c for two bboxes, and the remaining 20 values are one set of class probabilities (C = 20).
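To make the 30 values per cell concrete, here is a minimal sketch that splits one cell's vector into its boxes and class probabilities. The ordering (all boxes first, then the class probabilities) and the name `split_cell` are assumptions for illustration; the notes do not say how the values are ordered.

```python
B, C = 2, 20  # boxes per cell and number of classes, as in the notes


def split_cell(vec, B=B, C=C):
    """Split one cell's 5*B + C values into B boxes and C class probabilities.

    Assumed layout: [x, y, w, h, c] repeated B times, followed by C probabilities.
    """
    assert len(vec) == 5 * B + C
    boxes = [tuple(vec[5 * b:5 * b + 5]) for b in range(B)]
    class_probs = list(vec[5 * B:5 * B + C])
    return boxes, class_probs


# Use the position of each value as its content so the slicing is easy to follow.
boxes, class_probs = split_cell(list(range(30)))
print(boxes)             # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
print(len(class_probs))  # 20
```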
At test time, the conditional class probability of each grid cell, Pr(Class_i | Object), is multiplied by the confidence prediction of each bbox, Pr(Object) * IOU(pred, truth). The product, Pr(Class_i) * IOU(pred, truth), is the class-specific confidence score of each box. These scores encode both the probability that the object in the bbox belongs to each class and how well the bbox fits the object.
Confidence: confidence = Pr(Object) * IOU(pred, truth)
Pr(Object) is 1 if the cell contains an object and 0 if it does not.
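The test-time multiplication can be sketched as follows. The function names and the example numbers are made up for illustration; only the product itself comes from the notes.

```python
def class_scores(box_confidence, class_probs):
    """Class-specific confidence scores for one box.

    box_confidence stands for the predicted Pr(Object) * IOU of the box,
    class_probs for the cell's predicted Pr(Class_i | Object).
    """
    return [box_confidence * p for p in class_probs]


def best_class(scores):
    """Index of the class with the highest class-specific score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values: a box with confidence 0.8 in a cell with three classes.
scores = class_scores(0.8, [0.1, 0.7, 0.2])
print(best_class(scores))  # 1
```

Because the class probabilities belong to the cell, both boxes of a cell share them; only the box confidence differs between the two.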
Each grid cell predicts B bboxes, together with a confidence score for each bbox.
(Confidence combines two factors: the probability that the bbox contains an object, Pr(Object), and the accuracy of the bbox, IOU(pred, truth). The confidence is the product of the two.)
(One grid cell can predict B bboxes. The IOU of each of the B bboxes with the ground truth of the object is computed, and the bbox with the largest IOU is selected.)
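A minimal sketch of the IOU computation and of picking the box with the largest IOU. Boxes are assumed to be given as corner coordinates (x_min, y_min, x_max, y_max), which is a choice made for this example rather than the (x, y, w, h) encoding the network predicts.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_box(pred_boxes, truth):
    """Return (index, iou) of the predicted box that overlaps the truth most."""
    ious = [iou(p, truth) for p in pred_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious[best]


# Two hypothetical predictions from one cell; the second matches the truth exactly.
preds = [(0, 0, 2, 2), (1, 1, 3, 3)]
print(responsible_box(preds, (1, 1, 3, 3)))  # (1, 1.0)
```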
4. Loss function

Every bbox that is responsible for an object contributes a localization error (terms 1 and 2) and a confidence error (term 3). Every grid cell that contains an object contributes a class probability error (term 5). A bbox that is not responsible for any object contributes only a confidence error (term 4).
The first and second terms of the loss function cover the coordinate predictions of each bbox, the third term covers the confidence prediction of bboxes that contain an object, the fourth term covers the confidence prediction of bboxes that contain no object, and the fifth term covers the class prediction of each grid cell.
The input image is divided into S*S grid cells (S = 7 in the paper), and each cell predicts B bboxes (B = 2 in the paper). The whole image is therefore divided into 7*7 = 49 cells and produces 7*7*2 = 98 bboxes. Each cell predicts (5*B+C) values, so one image yields S*S*(5*B+C) values.
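The counts above can be checked with a short calculation. The helper name `output_sizes` is only for illustration.

```python
def output_sizes(S=7, B=2, C=20):
    """Return (cells, boxes, values per cell, values per image)."""
    cells = S * S
    boxes = cells * B
    per_cell = 5 * B + C
    return cells, boxes, per_cell, cells * per_cell


print(output_sizes())  # (49, 98, 30, 1470)
```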
In the indicator 1_ij^obj, i denotes the i-th grid cell and j denotes the j-th bbox of the i-th grid cell.
In the indicator 1_i^obj, i denotes the i-th grid cell.
λ_coord: the penalty weight for boxes that contain an object (large contribution, so the weight is large: 5).
λ_noobj: the penalty weight for bboxes that contain no object (small contribution, so the weight is small: 0.5).
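The five terms can be put together in a small sketch. It follows the sum-squared-error form of the loss in the YOLO paper, including the square roots on width and height in term 2. The dictionary layout of a cell (`pred_boxes`, `pred_probs`, `truth`, `responsible`, `conf`) is a hypothetical structure chosen for this example, and the sketch assumes non-negative widths and heights.

```python
import math

LAMBDA_COORD = 5.0   # weight for boxes that contain an object
LAMBDA_NOOBJ = 0.5   # weight for boxes that contain no object


def yolo_loss(cells):
    """Sum the five loss terms over a list of grid cells.

    Each cell is a dict with:
      pred_boxes: list of (x, y, w, h, c) predictions
      pred_probs: list of predicted class probabilities
      truth: None for an empty cell, otherwise a dict with
             box (x, y, w, h), conf (target confidence),
             responsible (index of the responsible box), probs (target classes)
    """
    loss = 0.0
    for cell in cells:
        truth = cell["truth"]
        for j, (x, y, w, h, c) in enumerate(cell["pred_boxes"]):
            if truth is not None and j == truth["responsible"]:
                tx, ty, tw, th = truth["box"]
                # Term 1: center coordinates.
                loss += LAMBDA_COORD * ((x - tx) ** 2 + (y - ty) ** 2)
                # Term 2: square roots of width and height.
                loss += LAMBDA_COORD * ((math.sqrt(w) - math.sqrt(tw)) ** 2
                                        + (math.sqrt(h) - math.sqrt(th)) ** 2)
                # Term 3: confidence of the responsible box.
                loss += (c - truth["conf"]) ** 2
            else:
                # Term 4: confidence of a box with no object, target 0.
                loss += LAMBDA_NOOBJ * c ** 2
        if truth is not None:
            # Term 5: class probabilities of a cell that contains an object.
            loss += sum((p - t) ** 2
                        for p, t in zip(cell["pred_probs"], truth["probs"]))
    return loss
```

With these weights, a confidence of 0.2 predicted by a single box in an empty cell costs 0.5 * 0.2^2 = 0.02, while an error of 0.1 in the x coordinate of a responsible box costs 5 * 0.1^2 = 0.05.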
Open questions:
How is each bbox in a grid cell predicted?
How are the final object boxes generated?
How is it determined whether a cell contains an object, i.e. Pr(Object) = 1?