[Reading notes] Summary of the three papers in the R-CNN series
2022-07-28 17:59:00 【jsBeSelf】
1 R-CNN
1.1 Introduction
- R-CNN, namely region proposals (candidate regions) + CNN, is the pioneering work that introduced CNNs into object detection.
1.2 General steps

- As shown in the figure (from the original paper https://arxiv.org/abs/1311.2524), first extract region proposals, about 2000 per image, then crop each proposal's region out of the image and warp it to a uniform size (fully connected (FC) layers are used later, and once an FC layer's size is fixed, the dimension of its input vector is fixed too, so the inputs must all be the same size), feed each warped region into a CNN to extract features, producing a large number of feature maps, and finally classify with SVMs and perform bounding-box regression. A rough sketch of this pipeline follows.
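A runnable, high-level sketch of the pipeline just described. Everything here is a stand-in (random boxes instead of Selective Search, a dummy feature extractor instead of the CNN); `region_proposals` and `warp` are hypothetical helpers meant only to show the data flow, not the paper's code:

```python
import numpy as np

def region_proposals(image, n=2000):
    # Stand-in for Selective Search: n random (x1, y1, x2, y2) boxes.
    h, w = image.shape[:2]
    x1 = np.random.randint(0, w - 32, n)
    y1 = np.random.randint(0, h - 32, n)
    return np.stack([x1, y1,
                     x1 + np.random.randint(16, 32, n),
                     y1 + np.random.randint(16, 32, n)], axis=1)

def warp(image, box, size=227):
    # Crop the proposal and "warp" it to the fixed input size the FC
    # layers require (nearest-neighbour resize, for simplicity).
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    iy = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    ix = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[iy][:, ix]

image = np.random.rand(480, 640, 3)
for box in region_proposals(image)[:5]:      # every proposal runs alone,
    patch = warp(image, box)                 # hence the repeated computation
    features = patch.mean(axis=(0, 1))       # dummy stand-in for CNN features
    # ...per-class SVMs would then score `features`,
    # and a bounding-box regressor would refine `box`.
```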
1.3 Characteristics
- The paper highlights two key points: first, feature extraction, applying a high-capacity CNN to bottom-up region proposals; second, the training strategy, pre-training the network on a large dataset and then fine-tuning it for the target domain (similar to the idea of transfer learning).
- Advantages: compared with OverFeat, a model from the same period that also uses a CNN to extract features, R-CNN greatly improved mAP on VOC 2012, truly bringing CNNs into the field of object detection.
- Its disadvantages: it is slow; a great deal of computation is repeated when extracting proposals (each proposal from an image must pass through the subsequent network separately); SVMs are still used for classification, which works poorly here, since more categories require more SVMs and make the training process cumbersome; and the downstream classification and regression networks are separate from the upstream feature-extraction network.
1.4 Knowledge supplement
1) Traditional methods (such as HOG and SIFT) used to be how features were extracted; here a CNN is used instead. Because the CNN shares its weights, the computation is effectively shared across all classes, and after CNN feature extraction the feature map is much smaller than the original image, which improves computational efficiency and saves both memory and time.
2) For classification, hard negative mining is used. Without going into detail, the rough idea is to record the samples the classifier misclassifies in each round of training, keep training on them, and repeat until the classifier's performance stops improving, as in the toy sketch below.
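A toy sketch of that mining loop, assuming scikit-learn's LinearSVC and random stand-in features (the real R-CNN mines negatives from the huge pool of background windows in each image; thresholds here are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, (200, 10))          # stand-in positive features
neg_pool = rng.normal(-0.5, 1.0, (5000, 10))   # large pool of negatives

neg = neg_pool[:200]                           # start from a small negative set
for _ in range(5):
    X = np.vstack([pos, neg])
    y = np.array([1] * len(pos) + [0] * len(neg))
    clf = LinearSVC().fit(X, y)
    # "Hard" negatives: pool samples the current model scores above the margin.
    hard = neg_pool[clf.decision_function(neg_pool) > -1.0]
    if len(hard) <= len(neg):                  # no new hard negatives: stop
        break
    neg = hard                                 # retrain with the hard negatives
```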
3) Bounding-box regression trains four parameters: two control the position of the target box's center point, and the other two control the target box's size. The next section introduces this further.
2 Fast R-CNN
2.1 Introduction
- Fast R-CNN, i.e., the Fast Region-based Convolutional Network method, builds on R-CNN to further improve detection accuracy while also speeding up detection.
- R-CNN's speed problem was addressed in SPP-net. A CNN can be deepened to extract more features and raise accuracy, but R-CNN's structure causes too much repeated computation during feature extraction; SPP-net spotted this and improved on it. SPP-net has drawbacks of its own, however: its network parameters cannot be updated during backpropagation (fine-tuning cannot reach the layers below the spatial pyramid pooling layer), so its final detection accuracy is limited. Fast R-CNN solves these problems while keeping the advantages of both models.
2.2 General steps

- The network structure is shown in the figure (from the original paper https://arxiv.org/abs/1504.08083). The steps are: feed the whole image into a CNN to extract features, generate RoIs on the resulting feature map, pass each RoI through an RoI pooling layer to bring it to a fixed size, and after the FC layers split into two sibling branches, so that bounding-box regression and classification are attached to the end of the feature-extraction network and the whole network becomes a single-stage, multi-task model. A minimal sketch of RoI pooling follows.
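A minimal NumPy sketch of RoI max-pooling (an assumption: the RoI is already given in feature-map coordinates; real implementations such as torchvision.ops.roi_pool also handle the image-to-feature-map scaling):

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=7):
    # Split the RoI into an out_size x out_size grid and max-pool each cell,
    # so RoIs of any size come out at a fixed size for the FC layers.
    c = feature_map.shape[0]
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            cell = feature_map[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                                  xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

fm = np.random.rand(256, 38, 50)            # C x H x W feature map
print(roi_pool(fm, (10, 5, 30, 20)).shape)  # -> (256, 7, 7), whatever the RoI
```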
2.3 Characteristics
- Several key parts are mentioned in the paper:
- 1) How can accuracy be improved while also speeding up computation? The earlier SPP-net could not update its network parameters through backpropagation because the sampling strategy used by SGD (stochastic gradient descent) when updating parameters was inefficient and wasted computing resources. The sampling strategy is therefore improved: put simply, where a mini-batch originally drew a single RoI from each image in a batch of images, it now draws a batch of RoIs from a small number of images. RoIs from the same image can share computation and memory during forward and backward propagation, so efficiency improves. In addition, structurally, Fast R-CNN streamlines the computation by becoming a multi-task model: classification and bounding-box regression are trained together with the CNN. Because the network performs multiple tasks, the loss is also computed differently (discussed in section 2.4).
- 2) Classification changes from SVM to softmax: an SVM performs strict binary classification and cannot account for overlap between categories, while softmax considers the possibility that the target belongs to each of the categories.
- 3) For the RoI sampling strategy, the paper first takes 25% of the RoIs from those whose IoU with a ground-truth box is at least 0.5, then fills the rest with RoIs whose maximum IoU with ground truth lies in the interval [0.1, 0.5); see the sketch after this list.
- 4) To achieve scale invariance during network training, there was originally a brute-force method: simply process all training and test images to a single uniform size. The image-pyramid method is used here instead: preset several sizes and, when sampling an image during training, pick one size at random; this is effectively a form of data augmentation.
- Finally, the paper verifies experimentally whether these ideas really work, for example by controlling variables and comparing the change in mAP. It checks: 1) multi-stage versus multi-task training; 2) the two implementations of scale invariance, brute force versus the image pyramid; 3) whether more data is needed: with more data, model accuracy improves further, which is consistent with what a good model should do; 4) SVM versus softmax: softmax is better; 5) whether more proposals are always better: not at all; too many proposals can even reduce accuracy.
- Shortcoming: the speed of Selective Search, the algorithm that picks the candidate boxes, has not improved, and it becomes the network's bottleneck.
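A minimal sketch of the sampling rule from point 3) above. The thresholds follow the paper's description; the helper names are mine, and the demo's batch size and foreground fraction are picked only so the tiny example runs:

```python
import numpy as np

def iou(box, boxes):
    # IoU of one (x1, y1, x2, y2) box against an (N, 4) array of boxes.
    ix1 = np.maximum(box[0], boxes[:, 0]); iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2]); iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def sample_rois(rois, gt_boxes, batch=64, fg_frac=0.25):
    # For each RoI, its best overlap with any ground-truth box.
    max_iou = np.stack([iou(gt, rois) for gt in gt_boxes]).max(axis=0)
    fg = np.where(max_iou >= 0.5)[0]                      # foreground RoIs
    bg = np.where((max_iou >= 0.1) & (max_iou < 0.5))[0]  # background RoIs
    n_fg = min(int(batch * fg_frac), len(fg))
    pick = np.concatenate([
        np.random.choice(fg, n_fg, replace=False),
        np.random.choice(bg, batch - n_fg, replace=len(bg) < batch - n_fg)])
    return rois[pick], (max_iou[pick] >= 0.5).astype(int)

rois = np.array([[0, 0, 10, 10], [0, 0, 6, 10], [0, 0, 3, 10], [0, 0, 2, 10]])
gts = np.array([[0, 0, 10, 10]])
print(sample_rois(rois, gts, batch=4, fg_frac=0.5))  # boxes + fg/bg labels
```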
2.4 Knowledge supplement
1) Bounding box regression
- The goal is to find a mapping that brings the predicted box close to the real box, consisting of a translation (two parameters) plus a scaling (two parameters). But two questions arise: why are tx and ty designed to be divided by the width and height? And why do tw and th take a log form?
- Answer: the first is for scale invariance. If you look only at Δx or Δy, targets of different sizes in the image produce offsets of very different magnitudes, which is hard to train; dividing by the width and height of the target box makes the offsets comparable across scales and easy to train. The second is that the scaling factor must be greater than 0, so an exp function is used when predicting, which becomes a log when learning the inverse. Written out, the targets look like the sketch below.
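For concreteness, the four regression targets with boxes given as (center x, center y, width, height), P the proposal and G the ground-truth box, following the definitions in the R-CNN paper:

```python
import numpy as np

def regression_targets(P, G):
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    tx = (gx - px) / pw    # centre shift, normalised by width -> scale-invariant
    ty = (gy - py) / ph    # centre shift, normalised by height
    tw = np.log(gw / pw)   # log of the scale factor, so exp(tw) > 0 at test time
    th = np.log(gh / ph)
    return tx, ty, tw, th

print(regression_targets((100, 100, 50, 40), (110, 95, 60, 44)))
```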
2) Multi-task loss
- The multi-task loss consists of two parts. The first represents category prediction: the negative logarithm of the predicted probability of the true category. The second is the bounding-box regression loss, smooth L1. The two parts are balanced by a coefficient λ that adjusts their relative weight so neither dominates.
- Questions: why take the negative logarithm? And how does smooth L1 improve on what came before?
- Answer: 1) In practice, maximizing the log-likelihood is more convenient: the logarithm is a monotonically increasing function of its argument, and multiplying many probabilities together risks numerical underflow, whereas taking logs turns the products into sums. Since each probability lies strictly between 0 and 1, its logarithm is negative, so a minus sign is added to make the loss positive. 2) The L1 loss is less sensitive to outliers (extreme values) than the L2 loss used in R-CNN, giving better robustness. Both terms appear in the sketch below.
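Both loss terms in one short sketch (a toy version, assuming the softmax probabilities are already computed; λ = 1, and as in the paper background RoIs would skip the localization term):

```python
import numpy as np

def smooth_l1(x):
    x = np.abs(x)
    # Quadratic near zero, linear beyond |x| = 1: the gradient stays bounded,
    # so outliers pull on the model less than with an L2 loss.
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def multitask_loss(class_probs, true_class, pred_deltas, target_deltas, lam=1.0):
    cls_loss = -np.log(class_probs[true_class])   # negative log of p for class u
    loc_loss = smooth_l1(pred_deltas - target_deltas).sum()
    return cls_loss + lam * loc_loss

probs = np.array([0.1, 0.7, 0.2])                 # softmax output over classes
print(multitask_loss(probs, 1,
                     np.array([0.20, 0.10, 0.00, 0.30]),
                     np.array([0.25, 0.00, 0.10, 0.20])))
```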
3 Faster R-CNN
3.1 Introduction
- Faster R-CNN = RPN + Fast R-CNN. The bottleneck of the previous models was the generation of region proposals, so Faster R-CNN targets exactly that bottleneck by proposing the RPN (Region Proposal Network), which produces proposals from a set of preset anchors on the feature map.
3.2 General steps

The network structure is shown in the figure (from the original paper https://arxiv.org/abs/1506.01497). The general procedure: first feed the image into a backbone feature-extraction network (such as ResNet) to obtain a feature map, then feed the feature map into the RPN. There, a 3×3 convolution first extracts features, after which two branches follow: one branch learns the four regression parameters of each anchor, and the other learns the probability that each anchor contains foreground versus background. Merging the two branches yields the proposals, which then enter RoI pooling together with the earlier feature map, followed finally by classification and regression. The key point is really the RPN; a minimal sketch of its head follows.
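A minimal PyTorch sketch of the RPN head just described (an assumption on my part: the paper's original implementation was in Caffe; the channel counts follow section 3.3 below):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_channels=512, k=9):
        super().__init__()
        # 3x3 conv: the "sliding window" over the backbone feature map.
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        # Branch 1: 2k channels -> foreground/background score per anchor.
        self.cls = nn.Conv2d(512, 2 * k, kernel_size=1)
        # Branch 2: 4k channels -> four box-regression parameters per anchor.
        self.reg = nn.Conv2d(512, 4 * k, kernel_size=1)

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
print(scores.shape, deltas.shape)   # (1, 18, 38, 50) and (1, 36, 38, 50)
```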
3.3 Characteristics
- The two branches in the RPN have 2k and 4k output channels, where k is the number of anchors; in the paper k = 9, obtained by combining 3 aspect ratios with 3 scales (see the anchor sketch after this list).
- The RPN learns the probabilities of each anchor belonging to the foreground and the background, a binary classification learned through softmax.
- The RPN uses a 3×3 convolution to implement the sliding window.
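A sketch of generating the k = 3 × 3 = 9 anchors for a single feature-map cell. The scales match the paper's 128², 256², 512² pixel areas; the exact width/height convention here is an assumption for illustration:

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area s*s constant while varying the width:height ratio.
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

print(make_anchors(300, 300).round(1))   # nine (x1, y1, x2, y2) boxes
```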
4 Summary
- The above walks through the two-stage object-detection models of the three papers in the R-CNN series. As the name suggests, they are all built around CNNs, and each improvement aims to balance accuracy against speed, trying to get the best of both. The improvements come from the network structure and the training methods.
- Next up: the papers of the representative one-stage model, the YOLO series.