Convolutional neural networks: from R-CNN and Fast R-CNN to Faster R-CNN and Mask R-CNN
2022-07-27 18:02:00 【helpburn】
The R-CNN family comprises classic networks for object detection and segmentation, and several of the algorithms they introduced were genuinely exciting. This post gives a brief introduction to each network.
Paper translations:
R-CNN:https://blog.csdn.net/itlilyer/article/details/107190083
Fast R-CNN:https://blog.csdn.net/itlilyer/article/details/107764472
Faster R-CNN:https://blog.csdn.net/itlilyer/article/details/108049850
Mask R-CNN:https://blog.csdn.net/itlilyer/article/details/108441734
R-CNN
The paper
Paper link: https://arxiv.org/abs/1311.2524
R-CNN (Regions with CNN features) was published in 2013 in the paper "Rich feature hierarchies for accurate object detection and semantic segmentation". Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik.
The main idea
In 2012, Krizhevsky et al. built on LeCun's CNN with several new training techniques, greatly improved CNN classification accuracy, and won ILSVRC. Inspired by this result, the R-CNN authors set out to bring CNNs to object detection, making R-CNN one of the first works to apply a CNN to this task. The core issue is how to carry classification results over to detection. To address it, the authors focus on two questions: 1) how to localize objects with a deep network; 2) how to train a model with a large number of parameters using only a small amount of labeled data.
For the first question, the authors solve localization with region proposals. The process has three main steps:
1) Extract about 2000 category-independent region proposals from each input image;
2) Use a CNN to extract a fixed-length feature vector for each region proposal;
3) Classify each region with category-specific SVMs.
For the second question, the labeled detection data available is far from enough to train a network with so many parameters. The authors use supervised pre-training on a large auxiliary dataset (ILSVRC classification), followed by fine-tuning on the target detection categories. Their experiments show that this approach is very effective for training a large CNN when training data is scarce.
The calculation process
Let's start with a figure from the paper:
As the figure shows, the whole process is divided into four steps. Let's go through them:
Step 1: Input an image. There is nothing more to say here.
Step 2: Extract region proposals.
There are many algorithms for generating region proposals, for example objectness, selective search, category-independent object proposals, constrained parametric min-cuts (CPMC), multi-scale combinatorial grouping, and the method of Cireşan et al. To allow comparison with earlier detection work, the R-CNN authors chose Selective Search, which is used to extract about 2000 region proposals. (I plan to write a separate article on the Selective Search algorithm when time allows.)
Step 3: Extract a fixed-size feature for each region proposal.
Once the roughly 2000 region proposals are available, a CNN extracts features for each of them. R-CNN uses AlexNet as the feature extraction network. AlexNet requires a 227×227 input, but the extracted region proposals vary in size, so the authors chose the simplest approach: regardless of the size or aspect ratio of the original region, warp it directly to 227×227. Before warping, the region is padded with a border of pixels from the original image (16 pixels in the paper).
The CNN outputs a 4096-dimensional feature vector for each region proposal.
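The warping in step 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `warp_region`, the nearest-neighbour sampling, and the clamping of out-of-image coordinates to the border are all assumptions made for this sketch. It interprets the 16-pixel padding as context measured in the warped 227×227 frame.

```python
import numpy as np

def warp_region(image, box, out_size=227, context=16):
    """Warp one candidate box to out_size x out_size with context padding.

    image: array of shape (H, W) or (H, W, C); box: (x1, y1, x2, y2) in pixels.
    """
    height, width = image.shape[:2]
    x1, y1, x2, y2 = box
    # Dilate the box so that the original box fills the central
    # (out_size - 2 * context) pixels of the warped output.
    inner = out_size - 2 * context
    pad_x = context * (x2 - x1) / inner
    pad_y = context * (y2 - y1) / inner
    xs = np.linspace(x1 - pad_x, x2 + pad_x, out_size)
    ys = np.linspace(y1 - pad_y, y2 + pad_y, out_size)
    # Nearest-neighbour sampling; coordinates outside the image are clamped
    # to the border (a simplification chosen for this sketch).
    xi = np.clip(np.round(xs).astype(int), 0, width - 1)
    yi = np.clip(np.round(ys).astype(int), 0, height - 1)
    return image[yi[:, None], xi[None, :]]

# Hypothetical usage: a 100x150 RGB image and one candidate box.
image = np.zeros((100, 150, 3), dtype=np.uint8)
patch = warp_region(image, (10, 20, 60, 90))
```

Whatever the aspect ratio of the box, the result has the same spatial size, which is what lets a network with a fixed input size process every region.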
Step 4: Classify with SVMs.
The trained SVM classifiers score each extracted feature vector for every category, which gives a set of scored regions for the image. Non-maximum suppression is then applied independently for each class.
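Non-maximum suppression is easy to show in code. The following is a minimal greedy sketch for the boxes of a single class, assuming boxes in `(x1, y1, x2, y2)` format; the function names and the `iou_threshold` value of 0.3 are illustrative choices, not values taken from this post.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.3):
    """Return indices of the boxes kept after greedy suppression."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```

Running this once per class, on the scores that class's SVM produced, matches the per-class suppression described above.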
Advantages and disadvantages
Advantages: detection is much faster than with earlier detection methods.
1) All categories share the CNN parameters, so the time spent computing region proposals and features is amortized over all classes: 13 s/image on a GPU and 53 s/image on a CPU.
2) The feature vectors extracted by the CNN are low-dimensional: only 4096 dimensions, far smaller than the 360k of the UVA detection system.
Disadvantages:
1) Training is a multi-stage pipeline: 1. the pre-trained network must be fine-tuned; 2. an SVM classifier is trained for each category; 3. region proposals are generated separately with Selective Search; 4. regressors are trained for bounding-box regression.
2) Training is expensive in time and space: to train the SVMs and regressors, the features extracted from every image must be saved to disk.
3) Testing is also slow: region proposals are generated first, and features are then computed separately for each region proposal, which involves a lot of repeated computation.
Fast R-CNN
The paper
Paper link: https://arxiv.org/abs/1504.08083
The main idea
Fast R-CNN builds on R-CNN and SPPNet. It uses several new techniques to speed up training and testing while also improving detection accuracy. The previous section described the main drawbacks of R-CNN, and Fast R-CNN is aimed mainly at those drawbacks.
It proposes a single-stage, end-to-end training procedure that replaces the multi-stage pipeline of R-CNN. To make this possible, the following improvements were made:
- The network takes a whole image and a set of proposals as input, instead of the individual candidate regions cropped out by Selective Search as in R-CNN. The whole image first passes through the CNN to produce a feature map, and the proposal coordinates are then used to extract each candidate region from that feature map. Features are therefore computed only once per image, rather than repeatedly for every candidate region as in R-CNN.
- An RoI pooling layer is added to handle proposals of different sizes.
- A multi-task loss is used: during training, the loss of the whole network is the weighted sum of the classification branch and the bounding-box regression branch.
- The backbone CNN is VGG16.
- Truncated SVD is used to speed up detection.
- Softmax replaces the SVMs.
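The multi-task loss from the list above can be written out for a single RoI. This sketch follows the form given in the Fast R-CNN paper (log loss for classification plus a smooth L1 box loss weighted by λ, which the paper sets to 1). The function names and array layout are assumptions made for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def multitask_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Loss for one RoI.

    class_probs: softmax output over K + 1 classes (index 0 is background).
    box_pred:    array of shape (K + 1, 4), one box offset per class.
    box_target:  ground-truth offsets for the true class, shape (4,).
    """
    cls_loss = -np.log(class_probs[true_class])
    # Background RoIs have no ground-truth box, so the box term is skipped.
    if true_class == 0:
        return float(cls_loss)
    loc_loss = smooth_l1(box_pred[true_class] - box_target).sum()
    return float(cls_loss + lam * loc_loss)

probs = np.array([0.1, 0.9])
pred = np.array([[0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]])
loss = multitask_loss(probs, 1, pred, np.zeros(4))
```

Because both terms are computed from the same forward pass, the classifier and the box regressor can be trained together, which is what removes the separate SVM and regressor stages of R-CNN.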
The calculation process
Let's start with a figure from the paper:
As the figure shows:
Step 1: Use Selective Search to select candidate regions.
Step 2: Feed the image, together with the candidate regions selected in step 1, into the Fast R-CNN network.
Step 3: Extract the feature map of the image with the VGG16 network.
Step 4: The RoI pooling layer takes each candidate region, mapped onto the feature map, and max-pools it into a feature map of fixed shape. The computation is described in detail in another article: https://blog.csdn.net/itlilyer/article/details/108666073
Step 5: The fixed-size feature map is passed to the fully connected layers, which end in two branches: class prediction and bounding-box regression.
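Step 4 above can be sketched as follows. This is a simplified single-RoI version written for illustration: the function name `roi_max_pool` is hypothetical, and the defaults (a 7×7 output and a feature stride of 16 for VGG16) are taken from the Fast R-CNN paper rather than from this post.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7), spatial_scale=1 / 16):
    """Max-pool one RoI into a fixed-size grid.

    feature_map: array of shape (C, H, W).
    roi: (x1, y1, x2, y2) in input-image pixels.
    """
    channels, height, width = feature_map.shape
    # Project the RoI onto the feature map and round to whole cells.
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in roi]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    out_h, out_w = output_size
    out = np.zeros((channels, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Integer bin boundaries along y, clipped to the feature map.
        ys = min(max(y1 + int(np.floor(i * roi_h / out_h)), 0), height)
        ye = min(max(y1 + int(np.ceil((i + 1) * roi_h / out_h)), 0), height)
        for j in range(out_w):
            xs = min(max(x1 + int(np.floor(j * roi_w / out_w)), 0), width)
            xe = min(max(x1 + int(np.ceil((j + 1) * roi_w / out_w)), 0), width)
            if ye > ys and xe > xs:  # empty bins stay at zero
                out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

features = np.arange(16, dtype=float).reshape(1, 4, 4)
pooled = roi_max_pool(features, (0, 0, 3, 3), output_size=(2, 2), spatial_scale=1.0)
```

The output shape depends only on `output_size`, so RoIs of any size produce input of the fixed length that the fully connected layers in step 5 require.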
Advantages and disadvantages
Advantages (mainly compared with R-CNN and SPPNet):
1) Apart from generating candidate regions, the pipeline is end to end. Training takes less time and does not need disk space for cached features.
2) There is no need to train SVMs and regressors separately.
3) Computation is shared, so the feature map does not have to be recomputed for each candidate region.
Disadvantages:
1) Candidate regions are still generated with Selective Search, which still takes a lot of time and becomes the bottleneck of the whole algorithm.
Faster R-CNN
The paper
Paper link: https://arxiv.org/abs/1506.01497
The main idea
Fast R-CNN and SPPNet exposed region proposal generation as the bottleneck of object detection networks at the time. In response, the authors proposed their own solution: the Region Proposal Network (RPN).
- RPN replaces Selective Search and removes the performance bottleneck of generating candidate regions. For details on RPN, see: https://blog.csdn.net/itlilyer/article/details/109818142
- RPN and Fast R-CNN are trained together and share the CNN, so the two are truly merged into a single network.
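This post does not go into how RPN produces proposals. In the Faster R-CNN paper, it does so by scoring and refining a fixed set of reference boxes ("anchors") placed at every feature-map location, with 3 scales and 3 aspect ratios giving 9 anchors per location. The sketch below only generates those anchors. The function name and the exact placement of anchor centers are assumptions of this example.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors of shape (feat_h * feat_w * 9, 4) as (x1, y1, x2, y2)."""
    base = []
    for scale in scales:
        for ratio in ratios:
            # ratio is height / width; the area stays close to scale ** 2.
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)  # (9, 4), centered on the origin
    # One anchor center per feature-map cell, expressed in image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centers + base[None]).reshape(-1, 4)

anchors = generate_anchors(2, 3)  # 2 * 3 locations, 9 anchors each
```

Since the anchors are a fixed function of the feature-map size, the RPN only has to predict an objectness score and four offsets per anchor, which it can do with a few convolutional layers on the shared feature map.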
The calculation process
As usual, a figure first:
Step 1: Use a CNN (for example VGG16) to extract the feature map of the whole image.
Step 2: Pass the feature map to the RPN to generate RoIs.
Step 3: Pass the feature map from step 1 and the RoIs from step 2 to the RoI pooling layer.
Step 4: Pass the RoI pooling result to the fully connected layers.
Step 5: The output of the fully connected layers goes to the bounding-box regression branch and the softmax classification branch.
Advantages and disadvantages
Advantages:
1) The first advantage, of course, is that it is faster than the previous networks (every new network says so :)), and it approaches real-time detection.
2) RPN and Fast R-CNN share convolutional layers, so the network can be trained end to end.
Disadvantages:
- RoI pooling introduces misalignment. This has little effect on the predicted boxes but a large effect on segmentation. RoI pooling also has a bug: when an RoI is smaller than the RoI pooling output size, it is simply ignored.
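The misalignment comes from rounding. A small arithmetic sketch, assuming a feature stride of 16 as with VGG16 (the function name is hypothetical):

```python
def quantized_coordinate(coord_px, stride=16):
    """Image coordinate that RoI pooling effectively uses after rounding."""
    cell = round(coord_px / stride)  # snap to a whole feature-map cell
    return cell * stride

# An RoI edge at x = 205 px maps to 205 / 16 = 12.8125 on the feature map.
# Rounding to cell 13 corresponds to x = 208 px in the image.
shift = quantized_coordinate(205) - 205
```

With stride 16, a single rounding can move an RoI edge by up to 8 image pixels, and RoI pooling rounds again when it splits the RoI into bins. A box can absorb a shift of a few pixels, but a per-pixel mask cannot.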
Mask R-CNN
The paper
Paper link: https://arxiv.org/abs/1703.06870
The main idea
The main idea of Mask R-CNN is to add an instance segmentation branch on top of Faster R-CNN. What improvements does it make?
1) It uses RoIAlign instead of RoIPool to fix RoIPool's misalignment problem, which hurts segmentation accuracy.
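RoIAlign avoids rounding altogether: the RoI keeps its fractional coordinates, and each output bin is filled by bilinear interpolation at a few sample points. The sketch below is a single-channel illustration under assumed conventions (integer indices are treated as sample positions, 2×2 sample points per bin are averaged). The function names are hypothetical and this is not the reference implementation.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Interpolate a 2-D feature map at a continuous position (y, x)."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature, roi, output_size=(2, 2), spatial_scale=1 / 16, samples=2):
    """Average bilinear samples inside each bin, with no rounding of the RoI."""
    x1, y1, x2, y2 = [c * spatial_scale for c in roi]
    out_h, out_w = output_size
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = np.zeros(output_size)
    for i in range(out_h):
        for j in range(out_w):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear_sample(feature, y, x))
            out[i, j] = np.mean(values)
    return out

ramp = np.tile(np.arange(8.0), (8, 1))  # feature value equals the x index
aligned = roi_align(ramp, (16, 16, 80, 80))
```

On the ramp feature map, each output value equals the x coordinate of its bin center, which shows that the sampled positions follow the RoI exactly instead of snapping to whole cells.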
The calculation process
As usual, a figure first:

The paper describes several architectures, which differ in the backbone used to extract the image feature map and in the head that performs box prediction and mask prediction for each RoI (the second figure). Here we take ResNet-FPN as the backbone.
Step 1: Extract the image feature maps with the ResNet-50 network.
Step 2: The feature maps pass through the FPN, which outputs a multi-scale feature pyramid that serves as the input to the RPN.
Step 3: The RPN uses the feature pyramid to generate RoIs.
Step 4: The two head structures shown above perform detection (classification and regression) and mask prediction, respectively.
Advantages and disadvantages
Advantages:
- Using RoIAlign instead of RoIPool improves instance segmentation accuracy and also fixes the bug mentioned above.