Faster R-CNN: Principle and Code Reproduction
2022-08-04 07:02:00 【hot-blooded chef】
Principle
Faster R-CNN can be divided into four main parts (a short sketch mapping them onto a reference implementation follows this list):
- Conv layers. As a CNN-based object detection method, Faster R-CNN first uses a stack of conv + relu + pooling layers to extract feature maps from the image. These feature maps are shared by the subsequent RPN and the fully connected layers.
- Region Proposal Networks. The RPN generates region proposal boxes. It uses softmax to decide whether each anchor belongs to the foreground or the background, then applies bounding-box regression to refine the anchors into accurate proposals.
- RoI Pooling. This layer takes the shared feature maps together with the proposals, extracts a fixed-size proposal feature map for each proposal, and sends it to the subsequent fully connected layers to determine the object category.
- Classification. The proposal feature maps are used to predict the class of each proposal, and at the same time a second bounding-box regression refines the final position of the detection box.
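This four-part split can also be seen in off-the-shelf reference implementations. The snippet below is illustrative only and assumes PyTorch with torchvision >= 0.13 (for the `weights` argument); it merely shows how torchvision's Faster R-CNN exposes sub-modules corresponding to the parts above, not the exact implementation reproduced in this article.

```python
import torch
import torchvision

# Illustrative sketch: torchvision's reference Faster R-CNN (assumed torchvision >= 0.13)
# exposes sub-modules that roughly map onto the four parts listed above.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=21)
model.eval()

print(type(model.backbone).__name__)    # Conv layers: shared feature extractor (with FPN here)
print(type(model.rpn).__name__)         # Region Proposal Network
print(type(model.roi_heads).__name__)   # RoI pooling + classification/regression head

with torch.no_grad():
    # inference takes a list of 3xHxW tensors and returns boxes, labels and scores
    outputs = model([torch.rand(3, 600, 800)])
```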
Conv layers
The content of the black box in the figure is the Conv layers, which are in fact a classic image-classification network such as VGG or ResNet. In object detection it acts as the backbone for feature extraction: it does not take part in box prediction directly, but only outputs a feature map. Its last layer therefore does not output class scores; instead it outputs a feature map whose width and height vary with the input image.
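As a concrete illustration, the sketch below (assuming PyTorch/torchvision; the VGG16 backbone and the input size are illustrative choices) keeps only the conv + relu + pooling stack of VGG16 and shows that its output is a 512-channel feature map whose spatial size depends on the input.

```python
import torch
import torchvision

# Minimal sketch of the "Conv layers" idea, assuming a torchvision VGG16 backbone.
vgg = torchvision.models.vgg16(weights=None)
backbone = vgg.features[:-1]     # conv + relu + pooling stack up to conv5_3 (stride 16),
                                 # no fully connected / softmax layers

x = torch.rand(1, 3, 600, 800)   # the input size is not fixed
feat = backbone(x)
print(feat.shape)                # torch.Size([1, 512, 37, 50]): 512 channels,
                                 # spatial size roughly input / 16, so it varies with the input
```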
Region Proposal Networks
The specific structure of the RPN is shown in the green box above. The RPN actually splits into two branches. The upper branch classifies the anchors as foreground or background through softmax; its output has 18 channels, i.e. 2 x 9, where 9 is the number of prior (anchor) boxes per location and the factor 2 comes from using two-class (softmax) cross-entropy (with binary cross-entropy a single channel per anchor would suffice). The lower branch computes the bounding-box regression offsets of the anchors to obtain accurate proposals; likewise its output is 4 x 9 channels, where 4 are the regression coordinates of each candidate box predicted by the RPN. A proposal here is a proposal box, and the object inside it counts as foreground. So in the RPN stage the model only separates foreground from background.
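A minimal sketch of the two branches, assuming PyTorch, 9 anchors per location and a 512-channel shared feature map (channel counts and feature-map size are illustrative):

```python
import torch
import torch.nn as nn

num_anchors = 9
rpn_conv = nn.Conv2d(512, 512, kernel_size=3, padding=1)     # shared 3x3 conv
cls_branch = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)  # 18 channels: fg/bg per anchor
reg_branch = nn.Conv2d(512, 4 * num_anchors, kernel_size=1)  # 36 channels: (dx, dy, dw, dh) per anchor

feat = torch.rand(1, 512, 37, 50)          # shared backbone feature map
h = torch.relu(rpn_conv(feat))
cls_logits = cls_branch(h)                 # [1, 18, 37, 50]
box_deltas = reg_branch(h)                 # [1, 36, 37, 50]

# softmax over the two fg/bg channels of each anchor (the exact channel layout
# is a convention that differs between implementations)
scores = torch.softmax(cls_logits.view(1, num_anchors, 2, 37, 50), dim=2)
```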
The final Proposal layer is responsible for combining the positive anchors with their corresponding bounding-box regression offsets to obtain proposals, while discarding proposals that are too small or extend beyond the image boundary. In fact, by the time the network reaches the Proposal layer it has already completed something equivalent to object localization.
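A minimal sketch of such a proposal layer, assuming PyTorch/torchvision; the function name, thresholds and the flattened [N, 4] anchor/offset layout are assumptions for illustration, not the exact implementation:

```python
import torch
from torchvision.ops import nms, clip_boxes_to_image, remove_small_boxes

def proposal_layer(anchors, deltas, scores, image_size, min_size=16,
                   pre_nms_top_n=6000, post_nms_top_n=300, nms_thresh=0.7):
    # 1. apply the regression offsets (dx, dy, dw, dh) to the anchors
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    dx, dy, dw, dh = deltas.unbind(dim=1)
    px, py = cx + dx * w, cy + dy * h
    pw, ph = w * torch.exp(dw), h * torch.exp(dh)
    boxes = torch.stack([px - 0.5 * pw, py - 0.5 * ph,
                         px + 0.5 * pw, py + 0.5 * ph], dim=1)
    # 2. keep the highest-scoring boxes, clip them to the image, drop tiny boxes
    order = scores.argsort(descending=True)[:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    boxes = clip_boxes_to_image(boxes, image_size)       # image_size = (height, width)
    keep = remove_small_boxes(boxes, min_size)
    boxes, scores = boxes[keep], scores[keep]
    # 3. non-maximum suppression, then keep the top proposals
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    return boxes[keep], scores[keep]

# toy usage with two anchors and zero offsets
anchors = torch.tensor([[0., 0., 64., 64.], [32., 32., 128., 128.]])
props, s = proposal_layer(anchors, torch.zeros(2, 4), torch.tensor([0.9, 0.8]), (600, 800))
```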
Roi Pooling
Treat the input feature map as an image, crop the region given by each candidate box generated by the RPN, and then resize it to pool_size x pool_size. After this processing, proposals of different sizes all produce outputs of the same, fixed size, i.e. a fixed-length output is obtained.
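A minimal sketch using torchvision's `roi_pool` operator, assuming a feature map with stride 16 relative to the input image; the proposal coordinates are illustrative:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.rand(1, 512, 37, 50)                         # backbone output for one image
proposals = torch.tensor([[0., 10.,  20., 200., 180.],    # each row: (batch_index, x1, y1, x2, y2)
                          [0., 50., 100., 400., 300.]])   # coordinates in input-image pixels
pooled = roi_pool(feat, proposals, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)   # torch.Size([2, 512, 7, 7]): fixed size regardless of the box size
```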
Classification
After the fixed-size proposal feature maps produced by the RoI Pooling layer are sent into the subsequent network, two things are done:
- Classify the proposals through fully connected layers and softmax, which is in effect recognizing the object category
- Perform bounding-box regression on the proposals again to obtain a higher-precision box
In the actual code, however, the output of the RoI Pooling layer still goes through regular convolutions, average pooling, and a flatten operation before the final per-class classification and box regression are performed. Seen as a whole, Faster R-CNN therefore works in two steps: the first step, backbone -> RPN, outputs proposal boxes marking the approximate locations of objects; the second step classifies the content of each proposal box and refines its position. This is the architecture of a two-stage object detection network.
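A minimal sketch of such a second-stage head, assuming PyTorch; the extra conv layer, channel counts and `num_classes` are illustrative:

```python
import torch
import torch.nn as nn

num_classes = 21                               # e.g. 20 object classes + background
head = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),   # regular convolution
    nn.AdaptiveAvgPool2d(1),                                     # average pooling
    nn.Flatten(),                                                # flatten
)
cls_score = nn.Linear(512, num_classes)        # per-proposal class scores (softmax at loss time)
bbox_pred = nn.Linear(512, num_classes * 4)    # second bounding-box regression

pooled = torch.rand(2, 512, 7, 7)              # RoI Pooling output for 2 proposals
x = head(pooled)
print(cls_score(x).shape, bbox_pred(x).shape)  # torch.Size([2, 21]) torch.Size([2, 84])
```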
How to train Faster R-CNN
The training of Faster R-CNN is also divided into two steps:
- Train backbone -> RPN first. The first backpropagation happens here, and the RPN outputs predictions that serve as the input data for the subsequent classifier. If the RPN is poorly trained, the classifier ends up with no usable input data to train on, so this step is sometimes skipped directly.
- Then train the classifier on the output of the RPN.
It is worth mentioning that during training the data may suffer from an imbalance between positive and negative samples, for example when the number of background samples is much larger than the number of foreground samples, or the opposite. Faster R-CNN therefore limits the number of positive and negative samples to 128 each; if an image contains fewer than 128 positive samples, the mini-batch is filled up with negative samples.
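A minimal sketch of this sampling rule, assuming PyTorch and anchor labels encoded as 1 (foreground), 0 (background) and -1 (ignored); the 256-anchor mini-batch size is an assumption based on the common setting:

```python
import torch

def sample_anchors(labels, batch_size=256, max_pos=128):
    pos_idx = torch.where(labels == 1)[0]
    neg_idx = torch.where(labels == 0)[0]
    # keep at most 128 positive samples
    if pos_idx.numel() > max_pos:
        pos_idx = pos_idx[torch.randperm(pos_idx.numel())[:max_pos]]
    # fill the rest of the mini-batch with negative samples
    num_neg = batch_size - pos_idx.numel()
    neg_idx = neg_idx[torch.randperm(neg_idx.numel())[:num_neg]]
    return pos_idx, neg_idx

labels = torch.randint(-1, 2, (9 * 37 * 50,))   # toy anchor labels
pos, neg = sample_anchors(labels)
print(pos.numel(), neg.numel())                 # e.g. 128 and 128
```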
Code