
Object detection: a personal understanding of the R-CNN series

2022-06-11 04:52:00 Panbohhhhh

Preface:

Disclaimer

This article is purely a personal discussion of technology; discussion and shared learning are welcome. References and related code are given at the end of the article. Not for any commercial use.

My ability is limited and there are bound to be shortcomings; corrections are welcome.

 

 


1. Introduction

This article is not a deep, purely technical treatment. I want to use plain language to record my own impressions and points of confusion while learning, mainly by explaining the English literature and technical terms, in order to strengthen my understanding and fill gaps. It should be suitable for interview preparation and technical discussion.

 

2. Object detection basics

2.1 What is object detection

Simply put: given an image, analyze it and answer two questions: what is in it, and where.
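To make "what & where" concrete, here is a minimal sketch of how one detection result could be represented. The class name `Detection`, the helper `box_area`, the corner-style box format and the sample values are all my own illustrative assumptions, not something fixed by any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # "what": the predicted class name
    score: float  # confidence of the prediction, between 0 and 1
    box: tuple    # "where": (x_min, y_min, x_max, y_max) in pixels (assumed format)


def box_area(box):
    """Area of a (x_min, y_min, x_max, y_max) box; 0 for a degenerate box."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)


# Hypothetical output for one image containing a dog and a person.
detections = [
    Detection("dog", 0.92, (48, 60, 210, 300)),
    Detection("person", 0.88, (220, 30, 380, 310)),
]
```

A detector returns a list like this for every image: one entry per object found.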

 

Figure 1. Example illustrating the definition of object detection

2.2 Common datasets

Common datasets include:

VOC2007, VOC2012: 20 classes, 11,530 images, 27,450 objects. The scenes and image content are relatively simple and there is little occlusion, which makes them suitable for beginners.

COCO: 91 classes, 328,000 images, 2,500,000 objects. The scenes and the images are complex.

Figure 2: Sample images from the COCO dataset

In my personal view, both paper writing and evaluation in real projects have now moved from VOC to COCO, which is currently the authoritative dataset (as of May 12, 2021).

2.3 Object detection pipeline

By now the pipeline is more or less fixed:

Input image ---> filter potential object locations ---> extract candidate boxes ---> classify candidate regions ---> result

A more vivid example is shown in the figure below:

Figure 3: Object recognition pipeline

2.4 Categories of object detectors

Object detectors currently fall into two main categories, two-stage and one-stage. In my own project experience, one-stage has become the mainstream, because it is fast ...

Figure 4: Categories of object detectors

As things stand:

two-stage: high accuracy, slow speed

one-stage: fast, slightly lower accuracy, and the trend of the times

 

3. The classic R-CNN series

3.1 R-CNN

 

 

Figure 5: R-CNN

R-CNN follows the traditional object recognition approach.

First, candidate regions are extracted. R-CNN then takes each candidate region in turn, extracts features from it, and classifies it with an SVM classifier.

Shortcoming: almost every image produces more than 2,000 candidate regions. The workload is therefore large, and these 2,000-plus regions also overlap heavily, so much of the feature extraction is repeated work.
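How much two candidate regions overlap is usually measured with intersection over union (IoU). The text above does not name this measure, so take the following as my own illustration; the function name `iou` and the (x_min, y_min, x_max, y_max) box format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clipped at 0 so disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Two 10x10 boxes shifted by half their width share a third of their combined area, so their features are largely computed twice when each region goes through the CNN separately.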

To be fair, as the first generation, R-CNN naturally shows some shortcomings from today's point of view.

 

3.2 Fast R-CNN

Figure 6: Fast R-CNN

The focus here is on what Fast R-CNN improves.

As Figure 6 also shows, unlike R-CNN, Fast R-CNN first runs the CNN over the whole image to extract a feature map, and then extracts the features of each candidate region from this relatively small feature map.

This raises a problem, however: candidate regions differ in size, so the extracted features differ in dimension.

Fast R-CNN makes a particularly ingenious change:

The classifier is replaced by fully connected layers, as in an ordinary neural network. Because the number of input nodes of a fully connected layer is fixed, the features of every candidate region must first be brought to the same size.

Concrete implementation:

Fast R-CNN replaces the SPP layer after the last convolutional layer with an RoI pooling layer. It also proposes a multi-task loss function (Multi-task Loss) that adds bounding-box regression directly into the training of the CNN, so the loss includes both the candidate region classification loss and the location regression loss.
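As a rough sketch of the multi-task loss for a single candidate region: the classification term is the negative log-probability of the true class, and the location term is a smooth L1 loss on the four box offsets, applied only to non-background regions. This follows the formulation in the Fast R-CNN paper as I understand it; the function names, the convention that class 0 is background and the weight `lam` are labeled here as my assumptions, not taken from the text above.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, pred_offsets, target_offsets, lam=1.0):
    """Classification loss plus weighted location regression loss for one region.

    class_probs:    softmax probabilities over classes (index 0 = background, assumed)
    true_class:     index of the ground-truth class
    pred_offsets:   predicted box offsets for the true class, 4 numbers
    target_offsets: regression targets, 4 numbers
    lam:            weight balancing the two terms (hypothetical default)
    """
    cls_loss = -math.log(class_probs[true_class])

    # Background regions have no ground-truth box, so only classification counts.
    if true_class == 0:
        return cls_loss

    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + lam * loc_loss
```

Because both terms are summed into one number, a single backward pass trains the classifier and the box regressor together.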

RoI pooling layer: this is in fact a simplified version of the SPP layer. The SPP layer uses pyramid maps of several sizes for each candidate region, that is, it pools at multiple scales; the RoI pooling layer only needs to downsample feature maps of different sizes to one fixed size (for example 7*7). Take conv5_3 of the VGG16 network, which has 512 feature maps: although the size of the input image is arbitrary, after the RoI pooling layer each candidate region yields a feature vector of dimension 7*7*512 as the input to the fully connected layers. In other words, the RoI pooling layer pools at a single scale only. As shown in the figure below:

 

Figure 7. RoI pooling operation

 

Both the SPP layer and the RoI pooling layer remove the network's restriction on input image size, and RoI pooling additionally solves the problem that the weights cannot be updated with SPP. The RoI pooling layer has two main functions. First, it maps each RoI in the image to the corresponding position on the convolutional feature map. Second, it pools the corresponding region of the feature map into a feature of fixed length, which is then fed into the fully connected layers.
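The two functions of RoI pooling can be sketched in plain Python for one channel of the feature map. This is a simplified illustration, not the implementation of any library: the name `roi_max_pool`, the truncation used when projecting coordinates and the default `spatial_scale` of 1/16 (my assumption for a VGG16-style backbone) are all hypothetical choices.

```python
def roi_max_pool(feature_map, roi, out_h=7, out_w=7, spatial_scale=1.0 / 16):
    """Max-pool one RoI of a single-channel feature map to out_h x out_w.

    feature_map:   2D list of numbers (rows x columns), one channel
    roi:           (x_min, y_min, x_max, y_max) in input-image pixels
    spatial_scale: feature-map size divided by image size (assumed 1/16)
    """
    feat_h, feat_w = len(feature_map), len(feature_map[0])

    def clamp(value, upper):
        return min(max(value, 0), upper)

    # Function 1: locate the RoI on the feature map.
    x0 = clamp(int(roi[0] * spatial_scale), feat_w - 1)
    y0 = clamp(int(roi[1] * spatial_scale), feat_h - 1)
    x1 = clamp(int(roi[2] * spatial_scale), feat_w - 1)
    y1 = clamp(int(roi[3] * spatial_scale), feat_h - 1)
    roi_w = max(x1 - x0 + 1, 1)
    roi_h = max(y1 - y0 + 1, 1)

    # Function 2: split the region into an out_h x out_w grid and keep the
    # maximum of each cell, so every RoI ends up with the same output size.
    pooled = []
    for i in range(out_h):
        top = y0 + (i * roi_h) // out_h
        bottom = y0 + ((i + 1) * roi_h + out_h - 1) // out_h  # integer ceiling
        row = []
        for j in range(out_w):
            left = x0 + (j * roi_w) // out_w
            right = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(top, bottom)
                           for c in range(left, right)))
        pooled.append(row)
    return pooled
```

Applying this to each of the 512 channels of conv5_3 gives the 7*7*512 feature for one candidate region, whatever the size of the region.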

3.3 Faster R-CNN

Figure 8: Faster R-CNN

There is not much to add here.

R-CNN is a relatively early technique; what matters is the framework and structure it established.

I will keep posting in the future, mainly studying the anchor-free direction (because it is popular this year).

References

1. Cherry Blossom book & Watermelon book

2. https://blog.csdn.net/fengbingchun/article/details/87091740


Copyright notice
This article was written by [Panbohhhhh]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/03/202203020545290706.html