当前位置:网站首页>YOLOV3
YOLOV3
2022-07-26 00:35:00 【I want to send SCI】
YOLOv3 performance The author directly spends performance on Retinanei On the data graph
The left figure shows different thresholds On the right is the threshold 0.5

map It's all the calculated thresholds ap Do average
Why is the first figure lower than the second figure ? Because the first figure is that the threshold reaches 0.95 That has to coincide with the label How amazing that is so Definitely not that strong It's a little low . And the second threshold 0.5 coincidence 0.5 Just go That must be better than 0.95 High
Anyway, the author said High threshold performance is unscientific hahahaha



YOLOV2 yes Darknet-19 Yes 19 layer
53 Namely 52 Convolution and a full connection layer And the residual connection is added inside

Backbone network is the most important !!!!!!!!!!!!!!!! All fields are obtained by post-processing based on the features extracted from the backbone network !! He is the provider of food Target detection head Or the key point is that the detection head is the cook
52 A convolution It's about putting all the convolutional add not residual Oh Is the total 52 Then add the final full connection layer =53
Train this IMagnet After a thousand categories of backbone networks Remove the global average pooling layer behind It acts as a feature extractor
Note that the step size inside is 2 Oh May be 2 Resulting in down sampling



Three scale features are obtained from the input image In the subsequent multi-scale target detection .

These three scales are sampled separately 32 16 8 times
If input 416*416 picture Down sampling :416/32=13*13 26*26 416/8= 52* 52
Because take off the classification head at the back It becomes a full convolution network There is no full connection layer So it can be compatible with images of any scale
256 608 416 As long as it is 32 Multiple Because our next sampling is 32 Multiple of

The second coordinate starts : Good performance Small amount of computation More efficient operation GPU fps More bloated It's a little slow But it is also higher than v1 With 19
Floating point computation More efficient use of GPU

v1 gridcell=7 24 Layer convolution 2 Fully connected layer boundingbox

v2 gridcell=13 Darknet-19 18 Convolution +1 Fully connected layer anchor( A priori box It's the kind that has almost tested tall and thin objects anchorbox They are all tall and thin anchorbox)

Zhihu river da White drawing
Input 416*416*3 Output It's three sizes featuremap13*13*255、 26*26255、52*52*255
255---------------3* 85 3: Every gc Generate 3 individual anchor Every anchor Corresponding to a prediction box Each prediction box corresponds to 5+80 dimension 5:xywhc coco Data sets 80 Conditional probability of categories
13*13*255 The receptive field corresponding to the original image is 32*32、 That means 13*13 Responsible for predicting large objects
because 416/13=32 that 13 It's a grid gc La
26*26255 16*16、 secondary
52*52*255 8*8 、 Small objects


On the sampling 2 times (3*2 =26) In and backbone network 26*26 Scale features for splicing After processing, we get 26*26*255
concat : The operation of stacking two exercise books The thickness of the two books is different Just pile it up along the thickness direction
26*26 This is also sampled 2 times (26*2=52) And backbone network 52*52 Feature stitching of scale Get processed 52*52*255 Characteristics of
in other words : Actually, the last one 52*52*255 Characteristics of Integrate the front 26*26 features It's also a fusion of 13*13 Characteristics of

It gives play to the semantic specialization and abstraction of deep network It also makes full use of the bottom features of the fine-grained pixel level edge corner structure information of the shallow network
Multi scale feature fusion Object detection of different scales

Conditional probability : Suppose the box already exists The probability that he is a cat Dog probability





The backbone neck head
Backnone extract Neck Fusion features fpn head The final prediction
The backbone Full convolution network No full connection Compatible 32 Different scales of multiples



share 9 individual anchor

No longer look at the center of the object gridcell In the Look whose anchor Of iou With objects iou Maximum By the big one anchor( Prediction box ) Prediction object
Not the largest is not a positive sample

Confidence of posterior probability Visualization can see that each box can be seen as a number
YOLOV3 The process
The yellow box marked manually by the dog The central point is the red box
The red one gc There will be three anchor Find and dimension box iou The biggest one anchor Use it to predict objects


YOLOV1 It's the most 98 individual



The larger the input image Got gridcell The number of big prediction boxes is gridsize The number of *3 , The number of prediction frames of the three scales is also large 



Selection of positive and negative samples !!!!!!!!!!!!!!!!!!!!!!!!!!!iou
Greater than threshold iou Maximum Positive sample
Greater than the threshold but not iou Maximum Ignore
Less than the threshold is Negative sample Blue and green !!

Different codes may implement different loss functions
Training

test

conf-score That's it. Posterior probability
Code


Evaluation indicators


Add dense modular Space Pyramid pooling spp



Intensive reading video




256/8=32 32*32 The receptive field corresponding to the original image is 8*8
416/8=52 52*52 The receptive field corresponding to the original image is also 8*8
416/32=13 The receptive field is 32 Large receptive field predicts large objects



residual batchnomalzation (BN) Are commonly used configurations
You can use it to quote literature
Performance indicators !!!!!! Brother Zihao's paper
In the paper IOUthresh=0.5
Pthresh Is the confidence threshold hypothesis 0.2 Well
Both are manually assigned
According to the prediction box and gt Of iou We can know which section he is in Here are the four possibilities 

FP It was originally the background But he also predicted There was no cat But predict a kitten's box
FN Good positioning But the confidence prediction is too small

TP Divided by vertical 
TP Divide by horizontal

map0.5 Find an average 0.5:0.95 Find the average of individual categories We have to find the average of each category


4.
Focal loss Value the ambiguous person Also give him high weight effect nonono

loss
Green green yellow Three numbers are non-zero, that is 1
These three items traverse all prediction boxes

Suppose the predicted value of the cat Pc gt It's a cat.
The first one is -log(pc): -log The predicted value of the cat The closer the prediction is to 1 loss The smaller it is
The second item BCE It's a cat. label Cihat yes 1


Do not understand





Conveyor belt : The article of the big guys

边栏推荐
- MySQL - master-slave replication
- Use localdate class to complete calendar design
- 基于数据要素流通视角的数据溯源研究进展
- 一个List到底能存多大的数据呢?
- 软件测试同行评审到底是什么?
- Preparation of bovine serum albumin modified by low molecular weight protamine lmwp/peg-1900 on the surface of albumin nanoparticles
- 【计算一个字符串和另一个字符串相等的次数】
- The way to understand JS: the principle of object.call and object.create() inheritance
- 8个小妙招-数据库性能优化,yyds~
- Understanding of "dof: a demand oriented framework for imagedenoising"
猜你喜欢

基于数据要素流通视角的数据溯源研究进展

8个小妙招调整数据库性能优化,yyds

Tarjan 求强连通分量 O(n+m) ,缩点

Are you still counting the time complexity?

HOOPS Exchange助力混合计算流体动力学软件搭建3D格式导入读取功能 | 客户案例

The way of understanding JS: write a perfect combination inheritance (Es5)

【Redis】② Redis通用命令;Redis 为什么这么快?;Redis 的数据类型

D3D计算着色器入门

Sorting out the encapsulation classes of control elements in appium

MWEC:一种基于多语义词向量的中文新词发现方法
随机推荐
@RequestParam,@PathVariable两个注解的区别
Preparation of bovine erythrocyte superoxide dismutase sod/ folic acid coupled 2-ME albumin nanoparticles modified by bovine serum albumin
Redis(八) - Redis企业实战之优惠券秒杀
HCIP第十三天
8 tips to adjust database performance optimization, yyds
Four characteristics and isolation level of MySQL transactions
The way of understanding JS: write a perfect combination inheritance (Es5)
什么是 Web3 游戏?
Verilog语法基础HDL Bits训练 06
mysql事务的引入
试除法--3的幂
基于网络分析和文本挖掘的意见领袖影响力研究
JVM Tri Color marking and read-write barrier
HCIP第十二天
Super super super realistic digital people! Keep you on the air 24 hours a day
Research on visualization method of technology topic map based on clustering information
Mwec: a new Chinese word discovery method based on multi semantic word vector
Redis夺命十二问,你能扛到第几问?
Pikachu靶机通关和源码分析
【计算一个字符串和另一个字符串相等的次数】

