
Object detection series -- a detailed explanation of the RCNN principle

2022-06-22 07:35:00 Bald little Su

 

About the author: Bald little Su, committed to explaining problems in the plainest language possible

Previous posts: Ubuntu usage guide; Alibaba Cloud object storage OSS + PicGo + Typora: setup steps and a fix for failed image uploads

Near-term goal: reach 10000 followers
Support Bald little Su: like, bookmark, and leave a comment.

RCNN principle

Preface

   RCNN is a pioneering work in the field of object detection. Its author is Ross Girshick, whom we call the great RBG. You can look up his papers on Google Scholar; one glance at the citation counts and all you can do is gasp!

   Next we go through the principle of RCNN in detail. Let's start with the classic figure from the paper. It shows the RCNN pipeline, which has four main steps; each step is explained below.

[Figure: overview of the RCNN pipeline, from the paper]

 
 

Candidate region generation

   RCNN generates candidate regions with selective search 【the SS algorithm for short】. Roughly speaking, the algorithm clusters the image using features such as color, size, and shape, and its output is a series of candidate boxes for the image; RCNN generates 2000 candidate boxes per image. These boxes overlap heavily, so the overlapping ones have to be removed later to obtain relatively accurate boxes. 【Note: the SS algorithm is not explained in detail here; interested readers can look it up themselves.】 The figure below shows roughly what the SS algorithm produces. As you can see, several candidate boxes are generated for a single object.

[Figure: approximate result of the SS algorithm, with several candidate boxes per object]

 
 

Feature extraction by neural network

   In the previous step, the SS algorithm produced 2000 candidate boxes from one image. Next we need to extract the features of these candidate boxes, that is, each of the 2000 candidate regions is fed into the AlexNet network and its features are extracted. 【Note: I covered the AlexNet network structure in an earlier post; if it is unclear, click * to learn more.】 For ease of reading, the AlexNet network structure is also posted here for reference, as shown in the figure below:

   Note that in RCNN we do not need the final softmax layer; the features that come out of the last two fully connected layers are used directly. In addition, because of the fully connected layers, the size of the input image has to be fixed, namely to a resolution of 227*227. The method used in the paper ignores the size and aspect ratio of the candidate region: the region is first expanded with 16 pixels of surrounding context, and then all of its pixels are forcibly scaled to 227*227. 【Note: this scheme distorts the original image; for example, people become shorter and fatter.】 The scaling scheme is shown in the figure below:

[Figure: scaling a candidate region to 227*227]

Image source: Tongji Zihao on Bilibili
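The scaling step can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the original post: boxes are given as (x1, y1, x2, y2) pixel coordinates, the 16 pixels are added in the original image before scaling (the paper defines the context at the warped size), and a plain nearest-neighbour resize stands in for whatever interpolation the real implementation uses. The function name `warp_region` is made up for this example.

```python
import numpy as np

def warp_region(image, box, out_size=227, context=16):
    """Crop `box` plus `context` pixels on every side, then stretch it to out_size x out_size."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    # Expand the box by `context` pixels and clip it to the image borders.
    x1 = max(0, x1 - context)
    y1 = max(0, y1 - context)
    x2 = min(w, x2 + context)
    y2 = min(h, y2 + context)
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbour resize: choose one source row and column per output pixel.
    # The aspect ratio is deliberately ignored, which is what distorts the object.
    rows = np.arange(out_size) * crop.shape[0] // out_size
    cols = np.arange(out_size) * crop.shape[1] // out_size
    return crop[rows[:, None], cols[None, :]]

# A random array stands in for a real photo.
image = np.random.randint(0, 256, size=(375, 500, 3), dtype=np.uint8)
warped = warp_region(image, box=(100, 50, 180, 300))
print(warped.shape)  # (227, 227, 3)
```

A tall, narrow box such as the one above comes out square, which is exactly the "shorter and fatter" distortion mentioned in the note.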

 
 

Classification with SVM classifiers

   In the previous step, the AlexNet network extracted the features. Each candidate region produces a 4096-dimensional feature vector, as shown in the figure below:

[Figure: the 4096-dimensional feature vector extracted from one candidate box]

Image source: 霹雳吧啦Wz on Bilibili
 

   The figure above shows the feature extracted from a single candidate box. The SS algorithm generates 2000 candidate boxes from one image, so feeding all of them into the network gives a 2000*4096 feature matrix. Multiplying this 2000*4096 feature matrix by the 4096*20 weight matrix formed by the 20 SVMs gives a 2000*20 probability matrix, in which each row holds the probability (strictly speaking, the SVM score) that one candidate box belongs to each object category. 【Note: if you use the VOC dataset, there should be 21 classes, including a background class.】

[Figure: the 2000*4096 feature matrix multiplied by the 4096*20 SVM weight matrix]

Image source: 霹雳吧啦Wz on Bilibili
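The shapes in this step are easy to check with NumPy. The sketch below uses random numbers in place of real AlexNet features and trained SVM weights, and leaves out the per-class bias term for brevity; the variable names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
num_boxes, feat_dim, num_classes = 2000, 4096, 20

# Placeholder data: one row per candidate box, one column per class SVM.
features = rng.standard_normal((num_boxes, feat_dim), dtype=np.float32)
svm_weights = rng.standard_normal((feat_dim, num_classes), dtype=np.float32)

# (2000, 4096) @ (4096, 20) -> (2000, 20): one score per box and class.
scores = features @ svm_weights
print(scores.shape)  # (2000, 20)

# Row i holds the 20 class scores of candidate box i;
# column j holds the scores of all 2000 boxes for class j.
best_class = scores.argmax(axis=1)
print(best_class.shape)  # (2000,)
```

Because all 20 classifiers are applied in a single matrix product, scoring every candidate box costs one multiplication per image.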

 

   To make this easier to understand, part ① of the structure above is explained in more detail, as shown in the figure below:

[Figure: detailed view of part ①, the 2000*20 matrix]

   As the figure above shows, each column of the 2000*20 matrix holds the predicted probabilities of the 2000 candidate boxes for one class. For example, the first column gives, for each of the 2000 candidate boxes, the predicted probability that it is a dog. We apply non-maximum suppression (NMS) to each column, that is, to each class, to eliminate overlapping candidate boxes and keep the highest-scoring boxes in that column. The NMS procedure is as follows:

[Figure: the NMS procedure]

Image source: 霹雳吧啦Wz on Bilibili

 

   This part may be a bit confusing at first: why delete the boxes with a large IOU? I had the same question, and it came from not being clear about the process. First we find the highest-scoring box in a given column, then we compute the IOU between every other box and that highest-scoring box 【note that this is not the IOU with the Ground Truth】. What does a large IOU mean? The larger the value, the more the two candidate boxes overlap, which means they very likely cover the same object, so it makes sense to delete the one with the lower score. The figure below shows the process:

[Figure: removing overlapping candidate boxes with NMS]

Image source: 霹雳吧啦Wz on Bilibili
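The procedure for a single class column can be sketched as a greedy loop. This is a minimal illustration, not the original implementation: boxes are assumed to be (x1, y1, x2, y2) arrays, and the IOU threshold of 0.5 is an assumption chosen for the example, since the post does not state a value.

```python
import numpy as np

def iou(box, boxes):
    """IOU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    # Width and height of the intersection are clipped at zero for disjoint boxes.
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS for one class; returns the indices of the kept boxes, best first."""
    order = np.argsort(scores)[::-1]  # box indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Compare every remaining box with the current best box (not with the Ground Truth).
        overlaps = iou(boxes[best], boxes[rest])
        # Drop boxes that overlap the best box too much; they likely cover the same object.
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 110, 110],
                  [20, 20, 120, 120],
                  [200, 200, 300, 300]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IOU of about 0.68, so it is suppressed; the third box does not overlap either of them and survives as a separate detection.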

 
 

Correcting the candidate box positions with regressors

   In the previous step we eliminated many candidate boxes. Next, the remaining candidate boxes are refined further: 20 regressors are applied to the boxes remaining in each of the 20 classes, finally yielding the corrected, highest-scoring bounding box for each class.

   So how do we get the final predicted box from a candidate box? We again use the feature vector output by AlexNet to obtain the regressor's prediction, $(d_x(P), d_y(P), d_w(P), d_h(P))$, which represents the offsets of the center-point coordinates and the scaling factors for the width and height of the candidate box. The predicted box $\hat{G}_i$ is expressed as follows:

[Figure: expression for the predicted box]

Image source: Tongji Zihao on Bilibili
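For readers who cannot see the figure, the parameterization given in the RCNN paper can be written out as below, on the assumption that the figure reproduces it unchanged. Here $P = (P_x, P_y, P_w, P_h)$ denotes the center coordinates, width, and height of the candidate box; these symbols come from the paper rather than from this post.

```latex
\begin{aligned}
\hat{G}_x &= P_w \, d_x(P) + P_x \\
\hat{G}_y &= P_h \, d_y(P) + P_y \\
\hat{G}_w &= P_w \exp\big(d_w(P)\big) \\
\hat{G}_h &= P_h \exp\big(d_h(P)\big)
\end{aligned}
```

The center is shifted by a fraction of the box's own width and height, and the size is scaled in log space, so the same offsets mean the same relative correction for large and small boxes.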
 

   We invert the equations above to solve for $(d_x(P), d_y(P), d_w(P), d_h(P))$ and denote the result by $(t_x, t_y, t_w, t_h)$. Because the parameters of the annotated box and of the candidate box are both given, $(t_x, t_y, t_w, t_h)$ can be computed directly and serves as the ground-truth value.

[Figure: expression for the regression targets]

Image source: Tongji Zihao on Bilibili
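Inverting the paper's parameterization gives the targets $t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$, where $G$ is the annotated box and $P$ the candidate box, both as (center x, center y, width, height). The round trip below is a small sketch of this, assuming the figure shows the same formulas; the function names `encode` and `decode` and the box values are made up for illustration.

```python
import math

def encode(P, G):
    """Regression targets (t_x, t_y, t_w, t_h) for candidate box P and annotated box G."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(P, d):
    """Apply offsets d = (d_x, d_y, d_w, d_h) to candidate box P to get the predicted box."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))

P = (100.0, 100.0, 50.0, 80.0)   # candidate box: center x, center y, width, height
G = (110.0, 90.0, 60.0, 100.0)   # annotated box in the same format

t = encode(P, G)
print(t)
print(decode(P, t))  # recovers G up to floating-point error
```

If a regressor predicted the targets perfectly, decoding its output would return the annotated box exactly, which is why the targets can stand in as the ground truth during training.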
 

   Next, the predicted values $(d_x(P), d_y(P), d_w(P), d_h(P))$ are fitted to the targets $(t_x, t_y, t_w, t_h)$ by minimizing a loss function. The loss function is as follows:
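The formula itself did not survive in this copy of the post. As a reference, the RCNN paper uses a regularized least-squares (ridge regression) objective, reproduced below. In the paper each $d_\star(P)$ is a linear function of the pool5 features $\phi_5(P)$ of the candidate box, and the regularization weight is set to $\lambda = 1000$; the symbols $\phi_5$, $\mathbf{w}_\star$, and $\lambda$ come from the paper, not from this post.

```latex
d_\star(P) = \mathbf{w}_\star^{\top} \phi_5(P), \qquad
\mathbf{w}_\star = \operatorname*{arg\,min}_{\hat{\mathbf{w}}_\star}
\sum_{i=1}^{N} \left( t_\star^{i} - \hat{\mathbf{w}}_\star^{\top} \phi_5(P^{i}) \right)^2
+ \lambda \left\lVert \hat{\mathbf{w}}_\star \right\rVert^2,
\qquad \star \in \{x, y, w, h\}
```

One such set of weights is learned for each of the four coordinates, so every class has four linear regressors that share the same input features.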


 

Summary

   That's all for the principle of RCNN; I hope it helps. I will keep updating this series with Fast RCNN and Faster RCNN and explanations of the related code. Keep going!

 

Reference links

RCNN theory collection

RCNN intensive reading

 
 
If this article helped you, then

whoosh, whoosh, whoosh ~~duang~~ give it a like!

Original site

Copyright notice
This article was written by [Bald little Su]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/173/202206220728175205.html