Object detection: a personal understanding of the R-CNN series
2022-06-11 04:52:00 【Panbohhhhh】
Preface:
Disclaimer
This article is purely a personal technical discussion, and discussion and learning are welcome. References and related code are given at the end of the article. It is not for any commercial use.
Given my limited ability, there are certainly shortcomings, and corrections are welcome.
Object detection
1. Introduction
This is not a purely technical, in-depth article. I want to use plain language to record my own learning experience and the questions I ran into, mainly explaining English papers and technical terms, to deepen my understanding and fill gaps. It is suitable for interview preparation and technical discussion.
2. Object detection basics
2.1 What is object detection
Simply put, given an image, a detector analyzes it and gives two answers: what (the class of each object) and where (its location, usually a bounding box).

Figure 1. An example illustrating the definition of object detection
2.2 Common datasets
Commonly used datasets include:
VOC2007, VOC2012: 20 classes, relatively simple scenes, 11,530 images and 27,450 objects. The image content is simple and objects are rarely occluded, which makes them suitable for beginners.
COCO: 91 classes, complex scenes, 328,000 images and 2,500,000 objects. The images are complex.

Figure 2. Sample images from the COCO dataset
In my view, both papers and real-world project evaluations have moved from VOC to COCO, and COCO is currently the authoritative benchmark (as of 12 May 2021).
2.3 Object detection pipeline
Relatively speaking, the pipeline is now fairly standardized:
Input image ---> filtering of potential object locations ---> candidate box extraction ---> candidate region classification ---> result
A more intuitive example is shown in the figure below:

Figure 3. Object detection pipeline
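The pipeline above can be sketched in a few lines. This is a toy illustration, not how R-CNN or any real detector generates proposals: candidate boxes come from a naive sliding window, the optional `prefilter` callable stands in for the "potential location" filter, and `classify` stands in for the region classifier. All names (`Detection`, `sliding_window_candidates`, `detect`) and default values (`win=64`, `stride=32`, `threshold=0.5`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; x2/y2 exclusive


@dataclass
class Detection:
    label: str    # "what": predicted class
    score: float  # classifier confidence
    box: Box      # "where": bounding box


def sliding_window_candidates(width: int, height: int, win: int, stride: int) -> List[Box]:
    """Enumerate square windows; a naive stand-in for a real proposal method."""
    return [(x, y, x + win, y + win)
            for y in range(0, height - win + 1, stride)
            for x in range(0, width - win + 1, stride)]


def detect(width: int, height: int,
           classify: Callable[[Box], Tuple[str, float]],
           prefilter: Optional[Callable[[Box], bool]] = None,
           win: int = 64, stride: int = 32, threshold: float = 0.5) -> List[Detection]:
    results = []
    for box in sliding_window_candidates(width, height, win, stride):  # candidate boxes
        if prefilter is not None and not prefilter(box):  # cheap "potential location" filter
            continue
        label, score = classify(box)  # candidate region classification
        if label != "background" and score >= threshold:
            results.append(Detection(label, score, box))
    return results
```

For a 256×256 image this enumerates 7 × 7 = 49 windows. With a toy classifier that only recognizes the window (64, 64, 128, 128) as a "cat", `detect` returns a single `Detection` answering both "what" and "where".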
2.4 Categories of object detectors
Object detectors currently fall into two main categories: two-stage and one-stage. Based on my project experience, one-stage detectors have become the mainstream because they are fast ...

Figure 4. Categories of object detectors
For now, two-stage detectors: high accuracy, slow speed.
One-stage detectors: fast, slightly lower accuracy, and the current trend.
3. The classic R-CNN series
3.1 R-CNN

Figure 5. R-CNN
R-CNN follows the traditional object recognition approach.
First step, candidate region extraction: R-CNN crops out every candidate region, extracts CNN features from each region separately, and then classifies them with SVM classifiers.
Drawback: each image produces roughly 2,000 candidate regions. First, this is a lot of work; second, these 2,000-odd regions overlap heavily, so much of the feature extraction is repeated on the same pixels.
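The usual way to quantify how much two boxes overlap is intersection over union (IoU). The sketch below computes IoU and counts proposal pairs above a threshold; the helper names and the 0.5 default threshold are illustrative assumptions, not part of R-CNN itself.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def heavily_overlapping_pairs(boxes: List[Box], thresh: float = 0.5) -> int:
    """Count proposal pairs whose IoU is at least `thresh` (redundant computation)."""
    return sum(1 for a, b in combinations(boxes, 2) if iou(a, b) >= thresh)
```

For example, two 100×100 proposals shifted by 10 pixels have IoU 9000 / 11000 ≈ 0.82, so running a CNN on each of them separately repeats most of the work.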
To be fair, as the first generation, R-CNN looks flawed by today's standards.
3.2 Fast R-CNN

Figure 6. Fast R-CNN
The key improvements in Fast R-CNN:
As Figure 6 shows, unlike R-CNN, Fast R-CNN first runs a CNN over the whole image once to obtain a feature map, and then extracts the features of each candidate region from this relatively small feature map.
This raises a problem: candidate regions differ in size, so the features extracted for them have different dimensions.
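To see why the dimensions differ, a candidate box in image coordinates can be projected onto the feature map by dividing by the network's total stride. For VGG16's conv5_3 the stride is 16 (four 2×2 poolings precede it) and there are 512 channels. The floor/ceil rounding below is a simplifying assumption chosen so the projected region covers the whole box; real implementations may round differently. Function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def project_to_feature_map(box: Box, stride: int = 16) -> Tuple[int, int, int, int]:
    """Map an image-space box to feature-map cells (simplified rounding)."""
    x1, y1, x2, y2 = box
    return (math.floor(x1 / stride), math.floor(y1 / stride),
            math.ceil(x2 / stride), math.ceil(y2 / stride))


def region_feature_length(box: Box, stride: int = 16, channels: int = 512) -> int:
    """Length of the flattened feature for this region, before any pooling."""
    fx1, fy1, fx2, fy2 = project_to_feature_map(box, stride)
    return (fx2 - fx1) * (fy2 - fy1) * channels
```

A 160×96 box maps to a 10×6 region (30,720 values), while a 192×192 box starting at (32, 32) maps to a 12×12 region (73,728 values). A fully connected layer cannot accept both as input.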
Fast R-CNN makes a particularly clever change:
The classifier is replaced by fully connected layers, as in an ordinary neural network. Because the number of input nodes of a fully connected layer is fixed, the features of every candidate region must first be brought to the same length.
Concrete implementation:
Fast R-CNN replaces the SPP layer after the last convolutional layer with an RoI Pooling Layer. It also proposes a multi-task loss that adds bounding-box regression directly into CNN training, so the loss covers both candidate region classification and location regression.
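The multi-task loss can be sketched as follows, following the form given in the Fast R-CNN paper: a log loss on the true class plus, for non-background regions only, a smooth L1 loss on the box offsets, weighted by λ (1 by default). The offsets use the center/size parameterization from R-CNN. This is a plain-Python illustration on a single region with hypothetical function names, not the training code of any framework.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _center_size(b: Box) -> Tuple[float, float, float, float]:
    w, h = b[2] - b[0], b[3] - b[1]
    return b[0] + 0.5 * w, b[1] + 0.5 * h, w, h


def box_regression_targets(proposal: Box, gt: Box) -> Tuple[float, float, float, float]:
    """(tx, ty, tw, th): center shift relative to proposal size, log size ratio."""
    px, py, pw, ph = _center_size(proposal)
    gx, gy, gw, gh = _center_size(gt)
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)


def smooth_l1(x: float) -> float:
    """Quadratic near zero, linear beyond |x| = 1 (less sensitive to outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs: Sequence[float], true_class: int,
                    pred_offsets: Sequence[float], target_offsets: Sequence[float],
                    lam: float = 1.0) -> float:
    l_cls = -math.log(class_probs[true_class])  # classification loss
    if true_class == 0:  # class 0 = background: no box to regress
        return l_cls
    l_loc = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return l_cls + lam * l_loc  # classification + location regression
```

For example, a proposal (0, 0, 10, 10) and ground truth (5, 0, 15, 10) give targets (0.5, 0, 0, 0): the center moves half a box width to the right and the size is unchanged.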
RoI Pooling Layer: this is essentially a simplified version of the SPP layer. The SPP layer pools each candidate region with pyramids of several sizes, i.e. it performs multi-scale pooling, whereas the RoI Pooling Layer only needs to sample feature regions of different sizes down to one fixed size (e.g. 7×7). For example, conv5_3 of VGG16 has 512 feature maps; although the input image can be any size, after the RoI Pooling Layer each region yields a 7×7×512 feature vector as the input of the fully connected layer. In other words, the RoI Pooling Layer pools at a single scale only, as shown in the figure below:

Figure 7. RoI pooling operation
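A minimal sketch of RoI max pooling on a single channel, assuming the RoI has already been projected onto the feature map. The region is split into an `out_h` × `out_w` grid and each cell keeps its maximum; the floor/ceil bin boundaries guarantee every cell covers at least one feature-map position. Applied per channel to 512 channels, a 7×7 grid gives 7 × 7 × 512 = 25,088 values regardless of the region size. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def roi_max_pool(feature: List[List[float]], roi: Tuple[int, int, int, int],
                 out_h: int = 7, out_w: int = 7) -> List[List[float]]:
    """Max-pool the feature-map region roi = (x1, y1, x2, y2), x2/y2 exclusive,
    into a fixed out_h x out_w grid, whatever the region's own size."""
    x1, y1, x2, y2 = roi
    rh, rw = y2 - y1, x2 - x1  # region size on the feature map (must be >= 1)
    out = []
    for i in range(out_h):
        hs = y1 + math.floor(i * rh / out_h)        # bin start row
        he = y1 + math.ceil((i + 1) * rh / out_h)   # bin end row (exclusive)
        row = []
        for j in range(out_w):
            ws = x1 + math.floor(j * rw / out_w)
            we = x1 + math.ceil((j + 1) * rw / out_w)
            row.append(max(feature[r][c] for r in range(hs, he) for c in range(ws, we)))
        out.append(row)
    return out
```

On a 14×14 map, the full-map RoI is pooled over 2×2 cells, while a 3×3 RoI reuses positions across neighboring cells; both produce a 7×7 output.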
Both the SPP layer and the RoI pooling layer remove the network's restriction on input image size, and RoI pooling also solves the problem that SPP could not update the weights of the convolutional layers beneath it. The RoI pooling layer has two main functions: first, it locates each RoI of the image at the corresponding position on the convolutional feature map; second, it pools that feature region into a feature of fixed length, which is then fed into the fully connected layers.
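For comparison, the multi-scale pooling of an SPP layer can be sketched by pooling the same region at several grid sizes and concatenating the results. The pyramid levels (4, 2, 1) below are an illustrative assumption; with them, each channel always contributes 16 + 4 + 1 = 21 values. Passing a single level such as `(7,)` reduces it to the single-scale RoI pooling described above. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def _grid_max_pool(region: List[List[float]], n: int) -> List[float]:
    """Max-pool a 2-D region into an n x n grid, returned row by row."""
    h, w = len(region), len(region[0])
    vals = []
    for i in range(n):
        hs, he = math.floor(i * h / n), math.ceil((i + 1) * h / n)
        for j in range(n):
            ws, we = math.floor(j * w / n), math.ceil((j + 1) * w / n)
            vals.append(max(region[r][c] for r in range(hs, he) for c in range(ws, we)))
    return vals


def spp_pool(region: List[List[float]], levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Concatenate pooled grids of several sizes (one channel); length = sum(n*n)."""
    out: List[float] = []
    for n in levels:
        out.extend(_grid_max_pool(region, n))
    return out
```

An 8×8 region and a 5×13 region both yield 21 values, and the final value (the 1×1 level) is the global maximum of the region.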
3.3 Faster R-CNN

Figure 8. Faster R-CNN
There is not much more to say here.
As a relatively early technique, R-CNN matters mainly for the framework and structure it established.
I will keep posting, and my main research direction going forward is anchor-free detection (it is a hot topic this year).
References
1. The "Flower book" (Goodfellow, Bengio and Courville, Deep Learning) & the "Watermelon book" (Zhou Zhihua, Machine Learning)
2.https://blog.csdn.net/fengbingchun/article/details/87091740