Faster RCNN principle and reproduction code
2022-08-04 07:02:00 【hot-blooded chef】
Principle
Faster RCNN can be divided into four main parts:
- Conv layers. As a CNN-based object detection method, Faster RCNN first uses a set of basic conv+relu+pooling layers to extract image feature maps. These feature maps are shared by the subsequent RPN layer and the fully connected layers.
- Region Proposal Networks. The RPN network is used to generate region proposal boxes. This layer uses softmax to decide whether each anchor belongs to the foreground or the background, and then applies bounding box regression to correct the anchors and obtain accurate proposal boxes.
- Roi Pooling. This layer collects the input feature maps and proposal boxes, extracts the proposal feature maps after combining this information, and sends them to the subsequent fully connected layers to determine the object category.
- Classification. The proposal feature maps are used to compute the category of each proposal, and bounding box regression is applied once more to obtain the final precise position of the detection box.
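For orientation, the sketch below chains these four parts using torchvision's reference Faster RCNN implementation. This is an assumption for illustration only, not the reproduction code discussed later; `fasterrcnn_resnet50_fpn` and `weights="DEFAULT"` come from torchvision, not from this article.

```python
# Minimal inference sketch with torchvision's reference Faster RCNN
# (assumed here for illustration; not the reproduction code of this article).
import torch
import torchvision

# On torchvision < 0.13 use pretrained=True instead of weights="DEFAULT".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# One dummy 3-channel image; a real run would load and normalize a photo.
images = [torch.rand(3, 600, 800)]
with torch.no_grad():
    outputs = model(images)

# The second stage returns the final boxes, class labels and confidence scores.
print(outputs[0]["boxes"].shape, outputs[0]["labels"][:5], outputs[0]["scores"][:5])
```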
Conv layers
The content of the black box in the figure is the Conv layers, which are in fact classic image classification networks such as VGG and ResNet. In object detection this part is the backbone network for feature extraction: it does not directly participate in box prediction, but only outputs a feature map. Therefore its last layer does not output class scores; it outputs a feature map whose width and height vary with the input image.
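As a sketch (assuming a VGG16 backbone, which the article does not commit to), the classifier of a standard classification network is dropped and only the convolutional feature extractor is kept:

```python
# Backbone sketch: keep only VGG16's convolutional feature extractor.
# VGG16 is an assumption for illustration; any classification CNN can serve as the backbone.
import torch
import torchvision

# Drop the FC classifier entirely and the final max-pool, giving a stride-16 feature map.
# weights=None avoids downloading pretrained weights for this shape check.
backbone = torchvision.models.vgg16(weights=None).features[:-1]
backbone.eval()

image = torch.rand(1, 3, 600, 800)            # N, C, H, W
with torch.no_grad():
    feature_map = backbone(image)
print(feature_map.shape)                       # torch.Size([1, 512, 37, 50])
```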
Region Proposal Networks
The specific structure of the RPN network is shown in the green box above. The RPN actually splits into two branches. The upper branch classifies the anchors as foreground or background through softmax (its channel count is 18, i.e. 2x9: there are 9 prior anchor boxes per position, and the 2 comes from using softmax cross-entropy over two classes; with binary cross-entropy it would be 1). The lower branch computes the bounding box regression offsets of the anchors to obtain accurate proposals (its channel count is likewise 4x9, where 4 is the number of box regression parameters per anchor). The proposal here is the proposal box, and whatever lies inside a proposal box belongs to the foreground of the image. So in this RPN part, the model only separates foreground from background.
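A minimal sketch of these two branches (the shared 3x3 conv, the 512 input channels and the 9 anchors per location are common VGG16-style assumptions, not values fixed by the article):

```python
# RPN head sketch: a shared 3x3 conv followed by two sibling 1x1 convs.
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        # Upper branch: 2 scores (foreground / background) per anchor -> 2 * 9 = 18 channels.
        self.cls_score = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)
        # Lower branch: 4 box regression offsets per anchor -> 4 * 9 = 36 channels.
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        x = torch.relu(self.conv(feature_map))
        return self.cls_score(x), self.bbox_pred(x)

rpn = RPNHead()
scores, deltas = rpn(torch.rand(1, 512, 37, 50))
print(scores.shape, deltas.shape)   # [1, 18, 37, 50] and [1, 36, 37, 50]
```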
The final Proposal layer is responsible for combining the positive anchors with their corresponding bounding box regression offsets to obtain proposals, while discarding proposals that are too small or extend beyond the image boundary. In fact, once the network reaches the Proposal layer, it has already accomplished something equivalent to object localization.
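A sketch of that post-processing (decode the offsets, clip to the image, drop tiny boxes, then NMS); the thresholds below are illustrative assumptions:

```python
# Proposal layer sketch: decode anchor offsets, clip to the image,
# drop tiny boxes and suppress duplicates with NMS.
import torch
from torchvision.ops import nms

def proposal_layer(anchors, deltas, scores, image_size,
                   min_size=16, pre_nms_top_n=2000, nms_thresh=0.7):
    # anchors [N, 4] as (x1, y1, x2, y2); deltas [N, 4] as (dx, dy, dw, dh); scores [N].
    widths  = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    ctr_x   = anchors[:, 0] + 0.5 * widths
    ctr_y   = anchors[:, 1] + 0.5 * heights

    # Apply the regression offsets predicted by the RPN.
    pred_ctr_x = deltas[:, 0] * widths + ctr_x
    pred_ctr_y = deltas[:, 1] * heights + ctr_y
    pred_w = torch.exp(deltas[:, 2]) * widths
    pred_h = torch.exp(deltas[:, 3]) * heights
    boxes = torch.stack([pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                         pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], dim=1)

    # Clip to the image and remove boxes that became too small.
    h, w = image_size
    boxes[:, 0::2] = boxes[:, 0::2].clamp(0, w)
    boxes[:, 1::2] = boxes[:, 1::2].clamp(0, h)
    keep = (boxes[:, 2] - boxes[:, 0] >= min_size) & (boxes[:, 3] - boxes[:, 1] >= min_size)
    boxes, scores = boxes[keep], scores[keep]

    # Keep the highest-scoring candidates, then apply NMS.
    order = scores.argsort(descending=True)[:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, nms_thresh)
    return boxes[keep], scores[keep]

# Example call with dummy data.
anchors = torch.tensor([[0., 0., 100., 100.], [50., 50., 200., 200.]])
deltas  = torch.zeros(2, 4)
scores  = torch.tensor([0.9, 0.8])
proposals, proposal_scores = proposal_layer(anchors, deltas, scores, image_size=(600, 800))
```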
Roi Pooling
The input feature map is treated like an image: the region selected by each candidate box generated by the RPN is cropped from it and then resized to pool_size x pool_size. After this processing, proposals of different sizes all produce outputs of the same fixed size, which realizes a fixed-length output.
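A sketch of this step using torchvision's `roi_pool` (the 7x7 output size and the 1/16 spatial scale are assumptions matching a stride-16 backbone):

```python
# RoI Pooling sketch: crop each proposal from the feature map and pool it to 7x7.
import torch
from torchvision.ops import roi_pool

feature_map = torch.rand(1, 512, 37, 50)                     # backbone output for one image
proposals = torch.tensor([[0.,  48.,  64., 320., 400.],      # [batch_index, x1, y1, x2, y2]
                          [0., 100., 150., 500., 380.]])      # coordinates in the original image

# spatial_scale maps image coordinates onto the stride-16 feature map.
pooled = roi_pool(feature_map, proposals, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)   # torch.Size([2, 512, 7, 7]) -- one fixed-size tensor per proposal
```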
Classification
After the fixed-size proposal feature maps obtained from the ROI Pooling layer are sent into the subsequent network, two things are done:
- Classify the proposals through fully connected layers and softmax, which is the actual category recognition
- Perform bounding box regression on the proposals again to obtain a higher-precision box
In the actual code, however, the output of the ROI Pooling layer still goes through regular convolutions, AveragePooling and Flatten operations before the final per-object classification and box regression are performed. Viewed as a whole, the Faster RCNN network therefore works in two steps: the first step (backbone->rpn) outputs proposal boxes giving the approximate positions of objects, and the second step classifies the content of each proposal box and predicts its precise position. This is the architecture of a two-stage object detection network.
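A sketch of this second-stage head, following the conv -> AveragePooling -> Flatten pattern described above (the layer sizes and the 20 + 1 class count are illustrative assumptions):

```python
# Second-stage head sketch: turn each 512x7x7 pooled RoI into class scores and box offsets.
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_channels=512, num_classes=20):
        super().__init__()
        # Regular convolutions applied to each pooled RoI.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
        )
        self.avgpool = nn.AdaptiveAvgPool2d(1)                      # AveragePooling
        self.flatten = nn.Flatten()                                  # Flatten
        self.cls_score = nn.Linear(in_channels, num_classes + 1)    # +1 background class
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)    # 4 offsets per class

    def forward(self, pooled_rois):
        x = self.flatten(self.avgpool(self.conv(pooled_rois)))
        return self.cls_score(x), self.bbox_pred(x)

head = DetectionHead()
scores, deltas = head(torch.rand(2, 512, 7, 7))   # 2 pooled proposals
print(scores.shape, deltas.shape)                  # [2, 21] and [2, 80]
```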
How to train Faster RCNN
The training of Faster RCNN is also divided into two steps:
- Train backbone->rpn. The first backpropagation happens here, and the RPN outputs predictions that serve as the data input of the subsequent classifier. If the RPN is not trained well, the classifier will have no usable input data to train on, so this step is sometimes skipped directly.
- Train the classifier on the output of the RPN network.
It is worth mentioning that during training the data may have an imbalance between positive and negative samples, for example when background samples far outnumber foreground samples or the other way around. Faster RCNN therefore limits both the number of positive and the number of negative samples to 128: if an image has fewer than 128 positive samples, the mini-batch is padded with negative samples.
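A sketch of that sampling rule (the 256-sample mini-batch implied by 128 positives + 128 negatives is an assumption consistent with the limits stated above):

```python
# Mini-batch sampling sketch: cap positives at 128 and pad the rest with negatives.
import torch

def sample_rois(labels, num_pos_max=128, batch_size=256):
    # labels: [N] with 1 = positive (foreground), 0 = negative (background).
    pos_idx = torch.nonzero(labels == 1).flatten()
    neg_idx = torch.nonzero(labels == 0).flatten()

    # Keep at most 128 positives, chosen at random.
    num_pos = min(pos_idx.numel(), num_pos_max)
    pos_idx = pos_idx[torch.randperm(pos_idx.numel())[:num_pos]]

    # Fill the remainder of the mini-batch with negatives.
    num_neg = min(neg_idx.numel(), batch_size - num_pos)
    neg_idx = neg_idx[torch.randperm(neg_idx.numel())[:num_neg]]
    return torch.cat([pos_idx, neg_idx])

labels = torch.randint(0, 2, (2000,))        # dummy foreground/background assignments
keep = sample_rois(labels)
print(keep.numel(), (labels[keep] == 1).sum().item())   # 256 samples, at most 128 positives
```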
Code
Reference