Detailed explanation of the RetinaNet network structure
2022-07-28 01:12:00 【@BangBang】
1. Overview
The RetinaNet paper: Focal Loss for Dense Object Detection
The paper was published in 2017 at ICCV (the IEEE International Conference on Computer Vision). After it was published, a one-stage network surpassed two-stage networks for the first time.
The difference between one-stage and two-stage networks
- two-stage: represented by Faster RCNN. A two-stage network first generates proposals through an RPN, and then makes the final predictions on those proposals through Fast RCNN, so detection is split into two steps.
- one-stage: represented by SSD and the YOLO series. It predicts the final results directly in a single step. Before this paper was published, one-stage networks were less accurate than two-stage ones; with this paper, a one-stage network surpassed two-stage networks for the first time.
RetinaNet performance indicators
[Figure: performance comparison table from the RetinaNet paper]
From the performance figures reported for RetinaNet, its AP (averaged over IoU thresholds from 0.5 to 0.95) reaches 40.8%. Contemporary one-stage networks such as YOLOv2 and SSD513 achieved APs between roughly 21 and 33, while Faster R-CNN, the mainstream two-stage network at the time, reached 36.8, which is clearly much lower than RetinaNet's 40.8.
2. The detailed structure of the RetinaNet network
[Figure: RetinaNet network architecture]
Differences between the RetinaNet structure and FPN
The RetinaNet network structure is similar to the FPN structure, but it differs from FPN in three places.
- First difference: FPN uses C2 to generate P2, but RetinaNet does not use C2 to generate P2. The reason given by the paper's authors is that P2 would occupy too many computing resources (the P2 feature map is much larger than P3~P6), so to save resources they skip P2 and start from C3 to generate P3; the backbone part is otherwise similar to FPN. Reference: object detection FPN (Feature Pyramid Networks) usage.
- Second difference: at P6, FPN downsamples with max pooling, whereas here P6 is obtained through a 3x3 convolution with stride 2.
- Third difference: FPN goes from P2~P6, but the RetinaNet network goes from P3~P7. P7 is obtained from P6 by applying a ReLU activation followed by a 3x3 convolution with stride 2 for downsampling. A code sketch of the P6/P7 construction follows this list.
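To make the second and third differences concrete, here is a minimal PyTorch-style sketch of how P6 and P7 can be built. The module and variable names are my own; following the paper, P6 is computed from the backbone output C5 (2048 channels for a ResNet-50 backbone), though some implementations derive it from P5 instead.

```python
import torch.nn as nn
import torch.nn.functional as F

class RetinaNetExtraLevels(nn.Module):
    def __init__(self, c5_channels=2048, out_channels=256):
        super().__init__()
        # P6: 3x3 conv, stride 2 (instead of FPN's max-pooling downsampling)
        self.p6_conv = nn.Conv2d(c5_channels, out_channels,
                                 kernel_size=3, stride=2, padding=1)
        # P7: ReLU on P6, then another 3x3 stride-2 conv
        self.p7_conv = nn.Conv2d(out_channels, out_channels,
                                 kernel_size=3, stride=2, padding=1)

    def forward(self, c5):
        p6 = self.p6_conv(c5)
        p7 = self.p7_conv(F.relu(p6))
        return p6, p7
```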
Scales and ratios used on the prediction feature layers
As mentioned in the earlier FPN blog post, each prediction feature layer in FPN uses one scale and 3 ratios, i.e. 3 anchors. In RetinaNet, the authors instead use 3 scales and 3 ratios, giving 9 different kinds of anchors per position.
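As an illustration, here is a minimal sketch of how the 9 anchor shapes per position could be enumerated. The base sizes (32 to 512 for P3 to P7), the 3 scales {2^0, 2^(1/3), 2^(2/3)} and the 3 aspect ratios {0.5, 1, 2} follow the paper; the helper name is my own.

```python
base_sizes = {"P3": 32, "P4": 64, "P5": 128, "P6": 256, "P7": 512}
scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
ratios = [0.5, 1.0, 2.0]  # aspect ratio h/w

def anchor_shapes(level):
    """Return the 9 (width, height) anchor shapes used at every
    position of the given prediction feature layer."""
    shapes = []
    for s in scales:
        area = (base_sizes[level] * s) ** 2
        for r in ratios:
            w = (area / r) ** 0.5   # solve w*h = area with h = r*w
            shapes.append((round(w, 1), round(r * w, 1)))
    return shapes

print(anchor_shapes("P3"))  # 3 scales x 3 ratios = 9 anchors
```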
Predictor section
When the FPN network was introduced in an earlier blog post, it was in fact similar to Faster RCNN: both are two-stage networks that first generate proposals through an RPN and then produce the final prediction parameters through Fast RCNN. RetinaNet, by contrast, is a one-stage network and applies its predictor directly.
For the P3~P7 prediction feature layers, a single predictor is shared. The details of this predictor are as follows: it is divided into two branches, a class subnet and a box subnet, which predict the category of each target and, for each anchor, the target bounding-box regression parameters.
- class subnet: first 4 convolutional layers with 3x3 kernels, each followed by a ReLU activation; then a final 3x3 convolution with stride 1 and no activation function, whose channel count is c = K·A. Here K is the number of detection classes (not including the background class) and A is the number of anchors at each position on the prediction feature layer, which is 9.
- box subnet: likewise 4 convolutional layers with 3x3 kernels, stride 1 and 256 channels each, followed by a final 3x3 convolution with stride 1, no activation function, and channel count 4·A, where A is again the anchor count of 9. This differs from Faster RCNN, which generates one set of bounding-box regression parameters for each class for every anchor on the prediction feature layer, so its channel count is 4·K·A. A sketch of the two subnets follows this list.
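A minimal sketch of the two branches, assuming the 256-channel, 4-layer design described above (the function name and the K = 80 example are my own; the heads are shared across P3~P7):

```python
import torch.nn as nn

def make_subnet(out_channels, in_channels=256, mid_channels=256):
    # Four 3x3 stride-1 convs, each followed by a ReLU ...
    layers = []
    for _ in range(4):
        layers.append(nn.Conv2d(in_channels, mid_channels, 3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))
        in_channels = mid_channels
    # ... then a final 3x3 stride-1 conv with no activation
    layers.append(nn.Conv2d(mid_channels, out_channels, 3, stride=1, padding=1))
    return nn.Sequential(*layers)

K, A = 80, 9  # e.g. 80 object classes (no background), 9 anchors per position
class_subnet = make_subnet(out_channels=K * A)  # per-anchor class logits
box_subnet   = make_subnet(out_channels=4 * A)  # per-anchor box regression
```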
3. Loss function
Positive and negative sample matching
- As described earlier for Faster RCNN, all anchors are first matched and divided into positive and negative samples; a subset of the positives and negatives is then sampled, and the loss is computed on the sampled anchors.
- RetinaNet also matches positive and negative samples, but it differs from Faster RCNN mainly as follows: the IoU between each anchor and the ground-truth (GT) boxes is computed first; if the IoU is greater than 0.5, the anchor is marked as a positive sample, and if an anchor's IoU with all GT boxes is less than 0.4, it is marked as a negative sample. Anchors whose IoU falls in [0.4, 0.5) are discarded. A sketch of this matching rule follows this list.
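A minimal sketch of the matching rule, assuming `iou` is a precomputed (num_anchors, num_gt) IoU matrix (the function name and the 1/0/-1 label encoding are my own):

```python
import torch

def match_anchors(iou, pos_thresh=0.5, neg_thresh=0.4):
    """Label anchors: 1 = positive, 0 = negative, -1 = discarded."""
    max_iou, _ = iou.max(dim=1)              # best IoU over all GT boxes
    labels = torch.full_like(max_iou, -1.0)  # default: discard
    labels[max_iou >= pos_thresh] = 1.0      # IoU > 0.5 -> positive
    labels[max_iou < neg_thresh] = 0.0       # IoU < 0.4 with all GT -> negative
    return labels                            # [0.4, 0.5) stays at -1
```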
Loss function calculation
[Figure: RetinaNet loss function formula]
A core knowledge point of RetinaNet is the Focal Loss; see the blog post: Focal Loss Detailed Explanation.
- The loss function above is divided into two parts: a classification loss and a regression loss.
- The Focal Loss classification loss is computed over all positive and negative samples and then divided by the number of positive samples; the regression loss is computed from the positive samples only, summed up and divided by the number of positive samples.
- The classification loss uses a sigmoid Focal Loss, and the regression loss uses an L1 loss; see the sketch after this list.
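Written out, the loss described above takes roughly the following form (a reconstruction from the description, since the formula image is not shown; FL is the focal loss from the paper, and N_pos is the number of positive samples):

$$
L = \frac{1}{N_{pos}} \sum_{i \in pos \cup neg} \mathrm{FL}(p_i) + \frac{1}{N_{pos}} \sum_{j \in pos} L_1(t_j, t_j^{*}),
\qquad \mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)
$$

The paper's defaults are alpha = 0.25 and gamma = 2. A minimal sketch of the sigmoid focal loss term (the helper name is my own):

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss summed over all samples; divide the result by
    the number of positive anchors, as described above. `logits` and
    `targets` have the same shape, with targets in {0, 1}."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)   # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()
```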