当前位置:网站首页>OD-Paper【1】:Rich feature hierarchies for accurate object detection and semantic segmentation
OD-Paper【1】:Rich feature hierarchies for accurate object detection and semantic segmentation
2022-07-26 05:20:00 【zzzyzh】
List of articles
Preface
RCNN(Regions with CNN features) Yes, it will CNN This is a milestone in the application of the method to target detection , With the help of CNN Good feature extraction and classification performance , adopt RegionProposal Method to realize the transformation of target detection problem .
This paper mainly interprets the paper , And explain about RCNN Part of the frame
1. Abstract & Introduction
1.1. Abstract
The method of this paper combines two key factors :
- Use a high-capacity convolutional neural network from bottom to top on the candidate region (CNNs)( Network capacity : The ability of artificial neural network to shape complex functions ), Used to locate and segment objects .
- When the labeled training data is insufficient , First, carry out supervised pre training for auxiliary tasks, and the good model , Plus fine tuning on a specific domain , Can produce significant performance improvement .
In this paper, candidate regions (region proposals) And CNNs combination , Therefore, this method is called R-CNN(Regions with CNN features): have CNN Candidate areas for features .
1.2. Introduction
This paper focuses on two issues : Use deep network to locate targets and train high-capacity network models on small-scale labeled data sets .
Different from image classification , Detection requires locating many objects in an image . One way is to think of location as a regression problem . Another alternative is to build a sliding window detector , Use... In this way CNNs At least there are 20 It's been years . Usually used for some limited object categories , Like face , Pedestrians, etc . In order to maintain high spatial resolution , these CNNs Only two convolution layers and two pooling layers are used .
contrary , This paper is through the use of “ Use the identification of candidate areas ” Methods , It's solved CNN The problem of positioning ( This has achieved success in target detection and semantic segmentation ). When testing , The method of this paper is input image , Generated close 2000 Category independent candidate areas , utilize CNN Extract a fixed length feature vector from each candidate region , then With the help of linearity specific to specific categories SVM Classify each candidate area . We don't consider the size of the area , The method of radiograph deformation is used to generate a fixed length image for each candidate region CNN Input eigenvector ( That is to put candidate areas of different sizes into the same size ). The following figure shows an overview of our method and highlights some experimental results . Because our system combines candidate regions and CNNs, So it's called R-CNN(regions with CNN features): have CNN Candidate areas for features .
2. Object detection with R-CNN
2.1. Algorithm process
RCNN The algorithm flow can be divided into 4 A step :
- One image generation 1000~2000 Candidate areas ( Use Selective Search Method )
- For each candidate area , Use deep networks to extract features
- Features are fed into each type of SVM classifier , Judge whether it belongs to this category
- Use regression to fine tune candidate box position
2.2. Module design

2.2.1. Region proposals
utilize Selective Search The algorithm obtains some original regions by image segmentation , Then use some merge strategies to merge these regions , Get a hierarchical regional structure , And these structures contain objects that might be needed .
2.2.2. Feature extraction
For each candidate area , Use deep networks to extract features . take 2000 Zoom candidate areas to 227 x 227 pixel , Then input the candidate area into the pre trained AlexNet CNN Internet access 4096 The feature of dimension is obtained 2000 x 4096 dimension matrix .
there CNN Image classification network , But remove the last full connection layer , Get the output 4096 Dimension vector .
2.2.3. SVM classifier
take 2000 x 4096 dimension Characteristics and 20 individual SVM The weight matrix 4096 x 20 Multiply , get 2000 x 20 dimension The matrix indicates that each suggestion box is the score of a certain goal . Respectively for the above 2000 x 20 dimension Each column in the matrix, that is, each class Non maximum suppression Eliminate overlapping suggestion boxes , Get this column, that is, some suggestion boxes with the highest scores in this category .

Be careful : To facilitate observation , The vector in the picture is transposed
It can be understood as , Each column vector represents the probability that the category box in the picture is the class corresponding to the column

- The matrix on the left : Each line is where each feature box passes CNN The feature vector obtained after classifying the network
- The matrix in the middle : namely SVM A weight matrix , Each column corresponds to the weight vector of a category
- The matrix on the right : Suppose the first column of the matrix in the middle represents a cat , The second column represents dogs , Then the first row of the matrix on the left is multiplied by the first column of the intermediate matrix , Get the elements of the first row and the first column of the matrix on the right , That is, the probability that the first candidate box is a cat
- For each column of the matrix on the right ( That is, each type ), Carry out non maximum suppression , To eliminate some overlapping suggestion boxes
- For example, the first column of the matrix on the right represents the probability that all candidate boxes are cats
2.2.3.1. Non maximum suppression
IoU (Intersection over Union)
- The mathematical representation is : ( A ∩ B ) / ( A ∪ B ) (A\cap B) / (A\cup B) (A∩B)/(A∪B)

- For the use of each category
- Looking for the highest scoring target
- Calculate the relationship between other goals and that goal IoU value
- Delete all IoU Targets with values greater than a given threshold
- The deleted targets here refer to those parts that overlap with the highest score much ,IoU Although big , But the score is not the highest , One with the highest score is enough
- The purpose is to eliminate overlapping frames ,IoU The larger the value, the more overlapping areas

2.2.4. Regressor correction
Use regression to fine tune candidate box position , Yes NMS (Non-Maximum Suppression) After processing, the remaining suggestion boxes are further filtered . Then separately use 20 A regressor for the above 20 The suggestion boxes belonging to the category are regressed , Finally get the highest score of each category after correction bounding box.
After regression classifier , You'll get 4 Parameters : The center point of the goal suggestion box x Offset 、y Offset , Scaling factor of bounding box Height 、 Width scaling factor

Pictured , Yellow frame P P P A suggestion box Region Proposal, Green window G G G Represents the actual box Ground Truth, The red window G ^ \hat{G} G^ Express Region Proposal The prediction window after regression , Linear regression problems that can be solved by the least square method .
3. R-CNN The problem is
- The test speed is slow
Test a picture about 53s(CPU). use Selective Search The algorithm takes about 2s, There is a lot of overlap between candidate frames in an image , Feature extraction operation redundancy . - Training is slow
The process is extremely cumbersome - Training needs a lot of space
about SVM and bbox Back to training , We need to extract features from each target candidate box in each image , And write to disk . For very deep networks , Such as VGG16, from VOC07 From the training set 5k The features extracted from the image need hundreds of G Storage space
summary
边栏推荐
- nacos注册中心
- LeetCode链表问题——206.反转链表(一题一文学会链表)
- pillow的原因ImportError: cannot import name ‘PILLOW_VERSION‘ from ‘PIL‘,如何安装pillow<7.0.0
- MySQL basic learning
- [Luogu] p3919 [template] persistent segment tree 1 (persistent array)
- Embedded development notes, practical knowledge sharing
- C language force buckle question 42 of rain. Four methods - violence, dynamic planning, stack, double pointer
- Meta analysis [whole process, uncertainty analysis] method based on R language and meta machine learning
- ALV程序收集
- SAP report development steps
猜你喜欢

Princeton calculus reader 02 Chapter 1 -- composition of functions, odd and even functions, function images

When AQS wakes up the thread, I understand why it traverses from the back to the front

Nacos 介绍和部署

测试必备工具之Fiddler,你真的了解吗?

Mysql优化

OD-Paper【2】:Fast R-CNN

DOM event flow event bubble event capture event delegate

security权限管理详解

Compilation method of flood control evaluation report and flood modeling under the new guidelines

TZC 1283: simple sorting - function method
随机推荐
No background, no education? Is it really hopeless for specialist testers to enter Internet factories?
JVM Lecture 6: how to solve the frequent FGC in online environment?
flex布局原理及常见的父项元素
If MySQL calculates the current month change / current month increase / year-on-year change / year-on-year increase?
ABAP语法学习(ALV)
Basic methods of realizing licensing function in C language
ALV report flow diagram
C语言力扣第41题之缺失的第一个正数。两种方法,预处理快排与原地哈希
Uniapp applet framework - a set of code, multi segment coverage
Application of remote sensing, GIS and GPS technology in hydrology, meteorology, disasters, ecology, environment and health
Chinese character style transfer --- learn the conversion and generation of one to many programmed Chinese characters through generation confrontation network
Getaverse,走向Web3的远方桥梁
Week 6 Learning Representation: Word Embedding (symbolic →numeric)
普林斯顿微积分读本02第一章--函数的复合、奇偶函数、函数图像
Reason for pilot importerror: cannot import name 'pilot_ Version 'from' PIL ', how to install pilot < 7.0.0
mysql如果计算本月变动/本月增幅/同比变动/同比增幅?
LeetCode链表问题——203.移除链表元素(一题一文学会链表)
Go exceed API source code reading (VI) -- deletesheet (sheet string)
Meta analysis [whole process, uncertainty analysis] method based on R language and meta machine learning
Recommend 12 academic websites for free literature search, and suggest to like and collect!