Convolutional neural networks: from R-CNN and Fast R-CNN to Faster R-CNN and Mask R-CNN
2022-07-27 18:02:00 【helpburn】
The R-CNN family of networks are classic architectures for object detection and segmentation, and several of the algorithms they introduced were genuinely exciting when they appeared. Here is a brief introduction to these networks.
Paper translations:
R-CNN:https://blog.csdn.net/itlilyer/article/details/107190083
Fast R-CNN:https://blog.csdn.net/itlilyer/article/details/107764472
Faster R-CNN:https://blog.csdn.net/itlilyer/article/details/108049850
Mask R-CNN:https://blog.csdn.net/itlilyer/article/details/108441734
R-CNN
The paper
Paper link: https://arxiv.org/abs/1311.2524
R-CNN (Regions with CNN features) was proposed in the 2013 paper "Rich feature hierarchies for accurate object detection and semantic segmentation" by Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik.
The main idea
In 2012, Krizhevsky et al. trained a large CNN that added a few new techniques to LeCun's CNN, greatly improved classification accuracy, and won ILSVRC. Inspired by this result, the R-CNN authors set out to bring CNNs to object detection, and R-CNN became one of the first networks to apply a CNN successfully to that task. The core question is how to carry classification results over to detection. To answer it, the authors focus on two problems: 1) how to localize objects with a deep network; 2) how to train a model with a large number of parameters from only a small amount of labeled data.
For the first problem, the authors use region proposals to handle object localization with a CNN. The process has three main steps:
1) Extract about 2000 category-independent candidate regions from each input image;
2) Use a CNN to extract a fixed-length feature vector for each candidate region;
3) Classify each region with category-specific SVMs.
For the second problem, the available labeled detection data is far too scarce to train a network with this many parameters from scratch. The authors' approach is supervised pre-training on a large auxiliary dataset (ILSVRC classification), followed by fine-tuning on the target detection categories. Their experiments show that this is a very effective way to train a high-capacity CNN when training data is scarce.
The calculation process
Let's start with a figure from the paper:
As the figure shows, the whole process has four steps:
First step: Input an image. There is nothing more to say about this.
Second step: Extract candidate regions.
Many algorithms can generate candidate regions, for example objectness, selective search, category-independent object proposals, constrained parametric min-cuts (CPMC), multi-scale combinatorial grouping, and the approach of Cireşan et al. To allow comparison with earlier detection algorithms, the R-CNN authors chose Selective Search, which extracts about 2000 candidate regions per image. (Selective Search deserves a separate article when time permits.)
Third step: Extract a fixed-size feature vector for each candidate region.
After obtaining about 2000 candidate regions, a CNN extracts features for each of them. R-CNN uses AlexNet as the feature extractor. AlexNet requires a 227×227 input, but the candidate regions vary in size, so the authors choose the simplest option: regardless of its original size or aspect ratio, each candidate region is warped directly to 227×227. Before warping, the region is enlarged to include a border of context pixels from the original image (the paper uses p = 16 pixels).
The CNN outputs a 4096-dimensional feature vector for each candidate region.
Fourth step: Classify with SVMs.
A trained SVM for each category scores every extracted feature vector, which gives a score for every candidate region in the image. Non-maximum suppression is then applied independently for each category.
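To make the last step concrete, here is a minimal sketch of greedy non-maximum suppression for a single category. The function names `iou` and `nms` and the threshold of 0.3 are illustrative assumptions, not the paper's code; boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS for one category; returns kept indices, best score first.

    iou_threshold=0.3 is an assumed value chosen for illustration.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

In a full detector this would run once per category, using that category's SVM scores.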
Advantages and disadvantages
Advantages: detection is efficient compared with earlier detection methods.
1) All categories share the CNN parameters, so the time spent computing candidate regions and their features is amortized over all categories: 13 s/image on a GPU and 53 s/image on a CPU.
2) The feature vector extracted by the CNN is low-dimensional, only 4096, which is far smaller than the 360k-dimensional features of the UVA detection system.
Disadvantages:
1) Training is a multi-stage pipeline: 1. fine-tune the pre-trained network; 2. train one SVM classifier per category; 3. generate candidate regions separately with Selective Search; 4. train regressors for bounding-box regression.
2) Training is expensive in time and space: to train the SVMs and regressors, the features extracted from every candidate region must be saved to disk.
3) Testing is also slow: candidate regions are generated first, and features are then computed for each candidate region separately, which repeats a lot of computation on overlapping regions.
Fast R-CNN
The paper
Paper link: https://arxiv.org/abs/1504.08083
The main idea
Fast R-CNN builds on R-CNN and SPPNet. It uses several new techniques to speed up training and testing while also improving detection accuracy. The previous section listed the main drawbacks of R-CNN, and Fast R-CNN is aimed mainly at those drawbacks.
It proposes a single-stage, end-to-end training procedure that replaces R-CNN's multi-stage pipeline. To make single-stage training possible, the following improvements were made:
- The network takes a whole image and a set of proposals as input, instead of feeding each candidate region selected by Selective Search through the CNN separately as R-CNN does. The CNN first extracts a feature map from the whole image; the proposal coordinates are then used to extract each candidate region's features from that shared feature map. Features are computed only once per image, rather than recomputed for every candidate region as in R-CNN.
- An RoI pooling layer is added to handle proposals of different sizes.
- A multi-task loss is used: during training, the loss of the whole network is the sum of the classification branch and the regression branch, with a weighting coefficient.
- The backbone CNN is VGG16.
- SVD (truncated, applied to the fully connected layers) is used to speed up detection.
- Softmax replaces the SVMs.
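The multi-task loss in the list above can be sketched for a single RoI as follows. This is a hedged illustration based on the loss form in the Fast R-CNN paper (log loss for classification plus smooth L1 for box regression, with the regression term switched off for background); the function names, the argument layout, and the convention that class index 0 means background are assumptions made for this example.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multitask_loss(class_probs, true_class, pred_offsets, target_offsets, lam=1.0):
    """Loss for one RoI: classification loss + lam * box regression loss.

    class_probs: softmax output over K+1 classes (index 0 assumed to be background).
    pred_offsets / target_offsets: the four box regression values for the true class.
    lam: weighting coefficient between the two branches (assumed default of 1.0).
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so only classification counts.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + lam * loc_loss


loss = multitask_loss([0.1, 0.9], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
print(round(loss, 4))
```

Because both branches contribute to one scalar, a single backward pass updates the shared layers for classification and localization at the same time.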
The calculation process
Let's start with a figure from the paper:
As the figure shows, the process has five steps:
First step: Use Selective Search to select candidate regions.
Second step: Feed the image, together with the candidate regions selected in the first step, into the Fast R-CNN network.
Third step: The VGG16 network extracts a feature map from the image.
Fourth step: The RoI pooling layer takes each candidate region mapped onto the feature map and max-pools it into a feature map of fixed shape. The detailed computation is described in another article: https://blog.csdn.net/itlilyer/article/details/108666073
Fifth step: The fixed-size feature map is passed to the fully connected layers that follow, which end in two branches: category prediction and bounding-box regression.
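As a rough illustration of the fourth step, the sketch below max-pools one RoI of a single-channel feature map into a fixed grid. It assumes the RoI is already expressed in integer feature-map cells with exclusive upper bounds, and it uses floor/ceil bin edges, which is one common convention rather than necessarily the exact scheme of the original implementation. The names `bin_range` and `roi_max_pool` are made up for this example.

```python
def bin_range(start, length, n_bins, i):
    """Cell range [lo, hi) covered by bin i when `length` cells are split into n_bins."""
    lo = start + (i * length) // n_bins
    hi = start + -(-((i + 1) * length) // n_bins)  # ceiling division
    return lo, max(hi, lo + 1)  # every bin covers at least one cell


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a 2-D feature map into an out_h x out_w grid.

    roi = (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    pooled = []
    for i in range(out_h):
        r0, r1 = bin_range(y1, y2 - y1, out_h, i)
        row = []
        for j in range(out_w):
            c0, c1 = bin_range(x1, x2 - x1, out_w, j)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))
print(roi_max_pool(fm, (1, 1, 4, 4), 2, 2))  # a 3x3 RoI still yields a 2x2 output
```

Whatever the RoI size, the output has the same shape, which is what lets the fully connected layers accept every proposal.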
Advantages and disadvantages
Advantages (mainly compared with R-CNN and SPPNet):
1) Apart from generating candidate regions, the whole pipeline is end-to-end, so training takes less time and needs no disk space for cached features.
2) No separate SVMs or regressors need to be trained.
3) Computation is shared: the feature map is not recomputed for each candidate region.
Disadvantages:
1) Candidate regions are still generated with Selective Search, which remains time-consuming and becomes the bottleneck of the whole algorithm.
Faster R-CNN
The paper
Paper link: https://arxiv.org/abs/1506.01497
The main idea
Fast R-CNN and SPPNet exposed candidate region generation as the bottleneck of object detection networks at the time. In response, the authors propose their own solution: the RPN (Region Proposal Network).
- An RPN replaces Selective Search, removing the performance bottleneck of generating candidate regions. For details on the RPN, see: https://blog.csdn.net/itlilyer/article/details/109818142
- The RPN and Fast R-CNN are trained together and share the convolutional layers, so the two are truly merged into a single network.
The calculation process
As usual, here is the figure:
First step: A CNN (for example VGG16) extracts a feature map from the whole image.
Second step: The feature map is passed to the RPN, which generates RoIs.
Third step: The feature map from the first step and the RoIs from the second step are passed to the RoI pooling layer.
Fourth step: The RoI pooling output is passed to the fully connected layers.
Fifth step: The output of the fully connected layers goes to the bounding-box regression branch and the softmax classification branch.
Advantages and disadvantages
Advantages:
1) The first advantage is, of course, that it is faster than the previous networks (as every new network claims :)), fast enough to approach real-time detection.
2) The RPN and Fast R-CNN share convolutional layers, so the network can be trained end-to-end.
Disadvantages:
- RoI pooling introduces misalignment. This has little effect on the predicted boxes but a large effect on segmentation. RoI pooling also has a bug: an RoI that is smaller than the RoI pooling output size is simply ignored.
Mask R-CNN
The paper
Paper link: https://arxiv.org/abs/1703.06870
The main idea
The main idea of Mask R-CNN is to add an instance segmentation branch on top of Faster R-CNN. What improvements were made?
1) RoIAlign replaces RoIPool, fixing the misalignment introduced by RoIPool, which hurts segmentation accuracy.
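A minimal sketch of the RoIAlign idea, under stated assumptions: the RoI keeps its real-valued coordinates (nothing is rounded to the feature-map grid), each output bin is sampled at a few regularly spaced points using bilinear interpolation, and the samples are averaged. The single-channel feature map, the convention that cell values sit at integer coordinates, and the names `bilinear` and `roi_align` are choices made for this example only.

```python
import math


def bilinear(feature_map, y, x):
    """Interpolate a 2-D feature map at a real-valued (y, x) position."""
    h, w = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map borders
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_h, out_w, samples=2):
    """Average samples x samples bilinear samples inside each output bin.

    roi = (x1, y1, x2, y2) as real-valued feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample points are spread evenly inside the bin.
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature_map, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


fm = [[0, 1, 2, 3] for _ in range(4)]  # value equals the column index
print(roi_align(fm, (0.5, 0.5, 2.5, 2.5), 1, 1))
```

Because the sample positions follow the fractional RoI coordinates, shifting the RoI by half a cell shifts the pooled values accordingly, which is the property that the quantized RoIPool loses.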
The calculation process
As usual, here is a figure:
Mask R-CNN comes in several architectures, depending on the backbone used to extract the image feature map and on the head that performs box prediction and mask prediction for each RoI (the second figure). Here the ResNet-FPN backbone is used as the example.
First step: A ResNet-50 network extracts feature maps from the image.
Second step: The feature maps pass through the FPN, which outputs a multi-scale feature pyramid that serves as the input to the RPN.
Third step: The RPN uses the feature pyramid to generate RoIs.
Fourth step: The two heads mentioned above perform detection (classification and box regression) and mask prediction, respectively.
Advantages and disadvantages
Advantages:
- RoIAlign replaces RoIPool, which improves the accuracy of instance segmentation and also fixes the RoI pooling bug mentioned above.