Detailed explanation of retinanet network structure
2022-07-28 01:12:00 【@BangBang】
1. Overview
The RetinaNet paper: Focal Loss for Dense Object Detection
The paper was published in 2017 at CVPR (Computer Vision and Pattern Recognition). After it was proposed, one-stage networks surpassed two-stage networks for the first time.
The difference between one-stage and two-stage networks
- two-stage: represented by Faster R-CNN. An RPN network first generates proposals, which are then passed to Fast R-CNN to make the final predictions on the targets, so detection happens in two steps.
- one-stage: represented by the SSD and YOLO series. The final result is predicted directly in a single step. Before this paper was proposed, one-stage networks were less accurate than two-stage networks; after it, a one-stage network surpassed two-stage networks for the first time.
RetinaNet performance metrics

From the performance figures reported for RetinaNet, its AP (averaged over IoU thresholds 0.5-0.95) reaches 40.8%. Contemporary one-stage networks such as YOLOv2 and SSD513 all had APs between 21 and 33, and Faster R-CNN, the mainstream two-stage network at the time, reached 36.8, which is still noticeably lower than RetinaNet's 40.8.
2. The detailed structure of the RetinaNet network

Differences between the RetinaNet and FPN network structures
The RetinaNet network structure is similar to the FPN network structure, but differs from FPN in three places.
- First difference: FPN uses C2 to generate P2, but RetinaNet does not generate P2 from C2. The reason given by the paper's authors is that P2 would occupy too many computing resources (its feature map is larger than those of P3-P6), so to save resources they do not use P2 and instead start from C3 to generate P3; otherwise this part of the backbone is similar to FPN. Reference: object detection FPN (Feature Pyramid Networks) usage.
- Second difference: at P6, FPN downsamples with max pooling, while here P6 is obtained by downsampling with a 3x3 convolution with stride 2.
- Third difference: FPN uses P2-P6, but the RetinaNet network uses P3-P7. P7 is obtained from P6 by applying a ReLU activation followed by a 3x3 convolution with stride 2 for downsampling.
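The second and third differences above can be sketched in PyTorch. This is a minimal illustration, not the paper's reference code; it assumes 256 feature channels and that P6 is built from the C5-level map, with P7 built from P6 via ReLU plus a stride-2 3x3 convolution as described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetinaNetExtraLevels(nn.Module):
    """Sketch of how RetinaNet derives its extra pyramid levels.

    P6: 3x3 conv with stride 2 (instead of FPN's max pooling).
    P7: ReLU followed by a 3x3 conv with stride 2, applied to P6.
    Channel count 256 is an assumption matching the rest of the head.
    """
    def __init__(self, in_channels=256, out_channels=256):
        super().__init__()
        self.p6 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=2, padding=1)

    def forward(self, c5):
        p6 = self.p6(c5)            # spatial size halves
        p7 = self.p7(F.relu(p6))    # ReLU first, then stride-2 conv
        return p6, p7

# A stand-in 16x16 C5-level feature map: P6 comes out 8x8, P7 comes out 4x4.
p6, p7 = RetinaNetExtraLevels()(torch.randn(1, 256, 16, 16))
print(p6.shape, p7.shape)
```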
Scales and ratios used on the prediction feature layers
As mentioned in the earlier FPN blog post, each prediction feature layer of FPN uses one scale and 3 ratios, i.e. 3 anchors. In RetinaNet, the authors use 3 scales and 3 ratios, giving 9 different anchors per position.
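The 3 scales x 3 ratios = 9 anchors per position can be generated as follows. The specific values (scale multipliers 2^0, 2^(1/3), 2^(2/3) and aspect ratios 0.5, 1, 2) come from the RetinaNet paper, not from this post, so treat them as an assumption; the base size per level is also illustrative:

```python
import itertools
import math

def make_anchors(base_size):
    """Generate the 9 anchor (w, h) pairs for one feature level:
    3 scales x 3 aspect ratios, keeping the area fixed per scale."""
    scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]  # per the RetinaNet paper
    ratios = [0.5, 1.0, 2.0]                        # h / w aspect ratios
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        area = (base_size * scale) ** 2
        w = math.sqrt(area / ratio)   # solve w*h = area with h = ratio*w
        h = w * ratio
        anchors.append((w, h))
    return anchors

print(len(make_anchors(32)))  # → 9
```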
Predictor section
As introduced in the earlier blog post, the FPN network is in fact similar to Faster R-CNN: both are two-stage networks that first generate proposals through an RPN and then produce the final prediction parameters through Fast R-CNN. RetinaNet, however, is a one-stage network that applies a predictor directly.
A single predictor is shared across the prediction feature layers P3-P7. Its details are as follows: it is divided into two branches, a class subnet and a box subnet, which predict the category of each target and the bounding-box regression parameters for each anchor, respectively.
- class subnet: first 4 convolution layers with 3x3 kernels, each followed by a ReLU activation; the final convolution layer has no activation function, likewise with kernel size 3x3 and stride 1, and its channel count is KA. Here K is the number of detection classes (not including a background class), and A is the number of anchors at each position on the prediction feature layer, which is 9 here.
- box subnet: likewise 4 convolution layers with 3x3 kernels, stride 1 and 256 channels; the final convolution layer has no activation function, likewise with kernel size 3x3 and stride 1, and its channel count is 4A, with A corresponding to the anchor count 9. This differs from Faster R-CNN, which generates a set of bounding-box regression parameters for each class of each anchor on the prediction feature layer, so its channel count is 4KA.
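The shared structure of the two subnets described above can be sketched directly. This is a minimal illustration (no weight init, no per-level sharing logic); K=80 is an assumed class count for a COCO-like setup:

```python
import torch
import torch.nn as nn

def subnet(out_channels, in_channels=256):
    """Four 3x3 stride-1 convs, each followed by ReLU, then a final
    3x3 stride-1 conv with no activation -- the structure shared by
    the class subnet and the box subnet."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(in_channels, 256, 3, stride=1, padding=1), nn.ReLU()]
        in_channels = 256
    layers.append(nn.Conv2d(256, out_channels, 3, stride=1, padding=1))
    return nn.Sequential(*layers)

K, A = 80, 9                     # assumed: 80 classes, 9 anchors per position
class_subnet = subnet(K * A)     # channels = KA (no background class)
box_subnet = subnet(4 * A)       # channels = 4A, shared across classes

x = torch.randn(1, 256, 32, 32)  # one prediction feature map
print(class_subnet(x).shape, box_subnet(x).shape)
```

With 4KA instead of 4A in the last line, the same helper would reproduce the Faster R-CNN-style per-class regression head the post contrasts against.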
3. Loss function
Positive and negative sample matching
- As covered in the earlier Faster R-CNN post, all anchors are first matched and divided into positive and negative samples; positives and negatives are then sampled, and the loss is computed on the sampled anchors.
- In RetinaNet we also need to match positive and negative samples. The main differences from Faster R-CNN are: first, the IoU between each anchor and the GT boxes is computed. If the IoU is greater than or equal to 0.5, the anchor is marked as a positive sample; if an anchor's IoU with every GT box is less than 0.4, it is marked as a negative sample. Anchors whose IoU lies in [0.4, 0.5) are discarded.
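The threshold rule above can be sketched as a small labeling function. This is a simplification for illustration (real implementations may also force each GT's best-matching anchor to be positive, which is omitted here), and the -1 "ignore" label is a common convention, not something this post specifies:

```python
import torch

def match_anchors(iou_matrix):
    """Label anchors from a (num_anchors, num_gt) IoU matrix:
    1  = positive (max IoU with any GT >= 0.5),
    0  = negative (max IoU with every GT < 0.4),
    -1 = ignored  (max IoU in [0.4, 0.5))."""
    max_iou, _ = iou_matrix.max(dim=1)
    labels = torch.full((iou_matrix.shape[0],), -1, dtype=torch.long)
    labels[max_iou >= 0.5] = 1
    labels[max_iou < 0.4] = 0
    return labels

# Three anchors vs. one GT box: positive, ignored, negative.
print(match_anchors(torch.tensor([[0.55], [0.42], [0.10]])))  # → tensor([ 1, -1,  0])
```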
Loss function calculation

A core knowledge point of RetinaNet is Focal Loss; see the blog post: Focal Loss explained in detail.
- The loss function above is divided into two parts: classification loss and regression loss.
- The classification loss (Focal Loss) is computed over all positive and negative samples and divided by the number of positive samples. The regression loss is computed over positive samples only, summed and divided by the number of positive samples.
- The classification loss uses sigmoid Focal Loss; the regression loss uses L1 loss.
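The sigmoid Focal Loss mentioned above can be written out in a few lines. A minimal sketch following the paper's formula FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the caller is expected to sum the result and divide by the number of positive anchors, and alpha=0.25, gamma=2 are the paper's defaults:

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Per-element sigmoid Focal Loss. `targets` are 0/1 per class.
    With gamma=0 and alpha=0.5 this reduces to 0.5 * BCE."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return alpha_t * (1 - p_t) ** gamma * ce             # down-weights easy examples

logits = torch.tensor([2.0, -1.0])   # one confident positive, one negative
targets = torch.tensor([1.0, 0.0])
print(sigmoid_focal_loss(logits, targets))
```

The (1 - p_t)^gamma factor is what lets the one-stage detector train on all negatives instead of sampling them: well-classified easy negatives contribute almost nothing to the loss.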