Object detection notes: Fast R-CNN

2022-07-28 22:53:00 【leu_ mon】

Faster R-CNN

1.R-CNN

R-CNN was proposed in 2014 by Ross Girshick et al. in the paper "Rich feature hierarchies for accurate object detection and semantic segmentation".

1）R-CNN algorithm flow:

1. Generate 1K~2K candidate regions per image (using the Selective Search method)

2. For each candidate region, use a deep network to extract features

3. Feed the features into one SVM classifier per class to decide whether the region belongs to that class

4. Use a regressor to fine-tune the position of each candidate box

2）Problems with R-CNN:

1. Testing is slow

2. Training is slow

3. Training requires a lot of storage space

2.Fast R-CNN

When extracting features, R-CNN runs the network on every candidate region separately, so overlapping regions are computed again and again. Fast R-CNN computes the feature map of the whole image only once and then maps each candidate region onto that feature map, which greatly reduces the amount of computation.
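The mapping from image coordinates to feature-map coordinates can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `project_roi` is hypothetical, and the stride of 16 is an assumption (it matches a VGG16-style backbone with four 2x downsampling steps).

```python
def project_roi(box, stride=16):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells.

    stride is the total downsampling factor of the backbone (assumed 16).
    Coordinates are floored; a small rounding error is tolerated.
    """
    return tuple(int(v // stride) for v in box)


# A 288 x 192 pixel region becomes an 18 x 12 cell region on the feature map.
roi_on_feature_map = project_roi((32, 48, 320, 240))
```

Because the convolutional features are shared, each extra candidate region only costs this cheap coordinate mapping plus the pooling step, instead of a full forward pass.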

Not every candidate box is used during training; only a subset is sampled, and the sample must contain both positive and negative examples. In the paper, 64 candidate boxes are sampled per image. A candidate box whose IoU with a ground-truth box is greater than 0.5 is treated as a positive sample, and one whose IoU is below 0.5 is treated as a negative sample (the paper draws negatives from the IoU range [0.1, 0.5)).
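To make the IoU rule concrete, here is a small sketch. The helper names `iou` and `label_proposal` are illustrative, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the lower bound of 0.1 for negatives is left out to keep the example short.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_boxes, pos_thresh=0.5):
    """Label a proposal by its best IoU with any ground-truth box."""
    best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
    return "positive" if best > pos_thresh else "negative"
```

For example, a proposal shifted by one pixel from a 10x10 ground-truth box has IoU 90/110 and is labeled positive, while one that overlaps only a 2-pixel strip has IoU 20/180 and is labeled negative.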

Each sampled candidate region is then passed through the RoI Pooling layer, which scales it to a fixed 7*7 feature map.
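A rough sketch of RoI max pooling on a single-channel feature map is shown below. It assumes the RoI is already given in feature-map cell indices (inclusive) and splits it into a grid of bins using integer arithmetic; real implementations differ in how they round bin edges, and `roi_max_pool` is a hypothetical name.

```python
def roi_max_pool(feature, roi, out_size=7):
    """Max-pool an RoI of a 2-D feature map into an out_size x out_size grid.

    feature: list of rows (one channel); roi: (x1, y1, x2, y2) cell indices,
    inclusive. Bin edges use floor/ceil so every bin covers at least one cell.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        r0 = y1 + (i * h) // out_size                        # floor
        r1 = y1 + ((i + 1) * h + out_size - 1) // out_size   # ceil
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + ((j + 1) * w + out_size - 1) // out_size
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the size of the RoI, the output is always `out_size` by `out_size`, which is what lets the following fully connected layers accept regions of arbitrary shape.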

1）Classifier

Outputs the probabilities of N+1 classes: N object classes plus one background class. After the softmax function, all the probabilities sum to 1.
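As a quick illustration of why the outputs sum to 1, here is a plain-Python softmax. The example logits are made up, and index 0 is assumed to be the background class.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# N = 3 object classes plus background gives N + 1 = 4 outputs.
probs = softmax([2.0, 1.0, 0.1, -1.0])
```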

2）Bounding box regressor

Outputs the bounding box regression parameters ($d_x,d_y,d_w,d_h$) for each of the N+1 classes, for a total of (N+1)*4 nodes.
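The regression parameters are applied to a candidate box $P$ using the parameterization from the R-CNN paper: $\hat{G}_x = P_w d_x + P_x$, $\hat{G}_y = P_h d_y + P_y$, $\hat{G}_w = P_w \exp(d_w)$, $\hat{G}_h = P_h \exp(d_h)$. The sketch below assumes boxes are stored as (center x, center y, width, height); the function name `decode_box` is illustrative.

```python
import math


def decode_box(proposal, deltas):
    """Apply (dx, dy, dw, dh) to a proposal given as (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    gx = pw * dx + px        # shift the center in units of the box width
    gy = ph * dy + py        # shift the center in units of the box height
    gw = pw * math.exp(dw)   # scale the width
    gh = ph * math.exp(dh)   # scale the height
    return (gx, gy, gw, gh)
```

With all four parameters at zero the proposal is returned unchanged, which is why the regressor only needs to learn small corrections.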

3）Loss function

The loss function also has two parts: the classification loss and the bounding box regression loss.

Classification loss: $L_{cls}(p,u)=-\log p_u$, which is the softmax cross-entropy loss.

Cross-entropy loss

Multi-class classification (softmax output; all output probabilities sum to 1):

$-\sum_i o_i^*\log(o_i)$

Binary classification (sigmoid output; the output nodes do not affect each other):

$-\frac{1}{N}\sum_{i=1}^N[o_i^*\log o_i+(1-o_i^*)\log(1-o_i)]$

Note: $o_i^*$ is the ground-truth label, $o_i$ is the predicted value, and $\log$ is the natural logarithm (base $e$), i.e. $\ln$.
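Both cross-entropy formulas can be written directly in Python. This is a minimal sketch: the function names are illustrative, and the predicted probabilities are assumed to lie strictly between 0 and 1 so the logarithms are defined.

```python
import math


def categorical_cross_entropy(probs, one_hot):
    """Multi-class case: -sum_i o*_i * log(o_i), with a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)


def binary_cross_entropy(probs, labels):
    """Binary case, averaged over the N independent sigmoid outputs."""
    total = sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, labels))
    return -total / len(probs)
```

With a one-hot label, the multi-class formula collapses to $-\log$ of the probability assigned to the true class, which is exactly $L_{cls}(p,u)=-\log p_u$.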

Bounding box regression loss: in the total loss $L(p,u,t^u,v)=L_{cls}(p,u)+\lambda[u \geq 1]L_{loc}(t^u,v)$, the term $[u \geq 1]$ is an Iverson bracket, equal to 1 when $u \geq 1$ and 0 otherwise.
The Iverson bracket means that when the target is background ($u=0$), there is no bounding box loss.
The regression targets $v_i$ for the ground-truth box are obtained by solving the $G_i$ formulas in reverse.
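Putting the pieces together, the loss for one sampled region can be sketched as below. The smooth L1 function and the form $L = L_{cls} + \lambda[u \geq 1]L_{loc}$ follow the Fast R-CNN paper; the function names, the (center x, center y, width, height) box format, and the default `lam=1.0` are assumptions of this sketch.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def regression_targets(proposal, gt):
    """Invert the box transformation to get v = (vx, vy, vw, vh)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def fast_rcnn_loss(probs, u, t_u, v, lam=1.0):
    """probs: softmax output over N+1 classes; u: true class (0 = background);
    t_u: predicted regression parameters for class u; v: regression targets."""
    l_cls = -math.log(probs[u])
    # Iverson bracket [u >= 1]: background regions have no box loss.
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc
```

For a background region the predicted box parameters are ignored entirely, so only the classification term contributes.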

4）Fast R-CNN framework

[Figure: Fast R-CNN framework]

3.Faster R-CNN（RPN+Fast R-CNN）

Compared with Fast R-CNN, Faster R-CNN simply replaces the Selective Search (SS) candidate box generation algorithm with an RPN (Region Proposal Network).

1）RPN structure

Obtain the feature map: the ZF convolutional network yields a 256-channel feature map, and the VGG16 convolutional network yields a 512-channel feature map.
For each location on the feature map, find the corresponding location in the original image (mapping the two proportionally is enough; a small error is acceptable, for example when the ratio is not an integer) and place K anchor boxes of specified sizes there (the sizes chosen by the authors).
These steps produce nearly 20k anchor boxes. Removing those that cross the image boundary leaves about 6k. Many of the remaining boxes overlap heavily, so non-maximum suppression (NMS) with an IoU threshold of 0.7 is applied, leaving about 2k candidate boxes per image.
At each location, the cls layer generates 2K class scores (the probabilities of background and object; this assumes a two-class softmax, whereas a sigmoid binary classifier would give K scores), and the reg layer generates 4K bounding box regression parameters.
Positive and negative samples: an anchor whose IoU with a ground-truth box is greater than 0.7, or which has the highest IoU with a ground-truth box, is a positive sample; an anchor whose IoU is less than 0.3 is a negative sample.
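The steps above can be sketched in plain Python. The three scales (128, 256, 512) and three aspect ratios (1:2, 1:1, 2:1), giving K = 9, are the values from the Faster R-CNN paper; the stride of 16, the function names, and the `-1` label for ignored anchors are assumptions of this sketch. `label_anchors` expects a non-empty, precomputed IoU table.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Place K = len(scales) * len(ratios) anchors at every feature-map cell."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for s in scales:
                for r in ratios:  # r = height / width; area stays s * s
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


def inside_image(box, img_w, img_h):
    """True if the box does not cross the image boundary."""
    return box[0] >= 0 and box[1] >= 0 and box[2] <= img_w and box[3] <= img_h


def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def label_anchors(ious, pos_thresh=0.7, neg_thresh=0.3):
    """ious[a][g] is the IoU of anchor a with ground-truth box g.
    Returns 1 (positive), 0 (negative) or -1 (ignored) per anchor."""
    labels = []
    for row in ious:
        best = max(row)
        labels.append(1 if best > pos_thresh else 0 if best < neg_thresh else -1)
    # The anchor with the highest IoU for each ground-truth box is positive too.
    for g in range(len(ious[0])):
        labels[max(range(len(ious)), key=lambda a: ious[a][g])] = 1
    return labels
```

For a 60 x 40 feature map this gives 60 * 40 * 9 = 21600 anchors, which is the "nearly 20k" figure quoted above.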

2）RPN loss function

The classification loss is a cross-entropy loss (multi-class or binary, depending on which kind of classifier the implementation uses), and the bounding box regression loss is computed in the same way as in Fast R-CNN. In the paper $\lambda$ is 10, with $N_{cls}=256$ and $N_{reg}\approx 2400$, so $\lambda/N_{reg}\approx 1/N_{cls}$ and the two normalization coefficients can be approximately merged. The loss function of the subsequent Fast R-CNN network is the same as before. The paper trains the two networks separately, but in practice the RPN and Fast R-CNN losses are usually trained jointly.
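A minimal sketch of the RPN loss, $L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*) + \lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*)$, is given below. It assumes the binary (sigmoid-style) variant in which each anchor has a single object probability; the function names are illustrative and the defaults for `n_cls`, `n_reg` and `lam` are the paper's values.

```python
import math


def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def rpn_loss(probs, labels, deltas, targets,
             n_cls=256, n_reg=2400, lam=10.0):
    """probs[i]: predicted object probability of sampled anchor i;
    labels[i]: 1 for a positive anchor, 0 for a negative one;
    deltas[i], targets[i]: predicted and target (dx, dy, dw, dh)."""
    cls = -sum(math.log(p) if y == 1 else math.log(1 - p)
               for p, y in zip(probs, labels))
    # Only positive anchors (label 1) contribute to the regression term.
    reg = sum(y * sum(smooth_l1(t - g) for t, g in zip(d, tg))
              for y, d, tg in zip(labels, deltas, targets))
    return cls / n_cls + lam * reg / n_reg
```

With the defaults, `lam / n_reg` is 1/240, close to `1 / n_cls` = 1/256, which is the approximate merging of coefficients mentioned above.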

3）Faster R-CNN framework

[Figure: Faster R-CNN framework]

4）Problems

Small objects are detected poorly: the features used for prediction are high-level abstractions, so fine detail is lost.
The model is large and detection is slow, mainly because prediction is done in two stages.

Original site

Copyright notice
This article was written by [leu_ mon]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/196/202207130600510221.html