Interpretation of the Mask R-CNN paper
2022-07-05 01:36:00 【Xiaobai learns vision】
Introduction to Mask R-CNN
Mask R-CNN is a step-by-step improvement built on Faster R-CNN. Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. To make up for this, the authors propose a simple, quantization-free layer called RoIAlign, which faithfully preserves exact spatial locations. Although the change looks minor, RoIAlign has a large impact: it improves mask accuracy by a relative 10% to 50%, with the larger gains appearing under stricter localization metrics. Second, the authors find it essential to decouple mask prediction from class prediction, so a binary mask is predicted for each class independently. With these improvements, the final Mask R-CNN model outperforms all previous single-model results on the COCO instance segmentation task. The model runs at about 200 ms per frame on a GPU, and training on COCO takes one to two days on a single 8-GPU machine.
The idea behind Mask R-CNN is simple and clear. Faster R-CNN has two outputs for each candidate object: a class label and a bounding-box offset. On top of this, Mask R-CNN adds a third branch that outputs the object mask. The mask output differs from the class and box outputs in that it requires extracting a much finer spatial layout of the object. Next, we introduce the main elements of Mask R-CNN in detail, including pixel-to-pixel alignment, which is the main piece missing from Fast/Faster R-CNN.
How Mask R-CNN works
Mask R-CNN uses the same two-stage procedure as Faster R-CNN. The first stage, called the RPN (Region Proposal Network), proposes candidate object bounding boxes. The second stage is in essence Fast R-CNN: it uses RoIPool to extract features from each candidate box and performs classification and bounding-box regression. In parallel, Mask R-CNN also outputs a binary mask for each RoI. For a detailed comparison between Faster R-CNN and other frameworks, readers are referred to Huang et al. (2016), "Speed/accuracy trade-offs for modern convolutional object detectors".
A mask encodes the spatial layout of an object. Unlike class labels or box offsets, the spatial structure of a mask can be extracted naturally through the pixel-to-pixel correspondence that convolutions provide.
RoIAlign: RoIPool is the standard operation for extracting a small feature map (for example 7×7) from each RoI.
Network architecture: to demonstrate the generality of Mask R-CNN, the authors instantiate it with multiple architectures. To keep them apart, the paper distinguishes between the convolutional backbone architecture, which extracts features over the whole image, and the head architecture, which performs bounding-box recognition (classification and regression) and mask prediction for each RoI.
The modifications to the Faster R-CNN network are:
(1) the RoI Pooling layer is replaced with RoIAlign;
(2) a parallel FCN layer (the mask layer) is added.
Technical points
1. A stronger backbone network
ResNeXt-101 + FPN is used as the feature extraction network and achieves state-of-the-art results.
2. The added RoIAlign layer
RoIPool is the standard operation for extracting a small feature map (e.g. 7×7) from each RoI. It solves the problem of turning RoIs of different scales into features of the same size. RoIPool first quantizes the floating-point RoI to the discrete granularity of the feature map, then divides the quantized RoI into spatial bins, which are themselves quantized, and finally applies max pooling to each bin to produce the result.
Quantization is performed on a continuous coordinate x by computing [x/16], where 16 is the stride of the feature map and [·] denotes rounding. These quantizations introduce misalignment between the RoI and the extracted features. Because classification is robust to small translations, the effect there is relatively small. For predicting masks with pixel-level accuracy, however, the negative effect is large.
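To make the effect of [x/16] concrete, here is a minimal sketch that maps one image coordinate to feature-map units with and without rounding. The function name `to_feature_coord` and the coordinate 203 are assumptions chosen for illustration; they do not come from the paper. Python's `round` resolves exact .5 ties to the nearest even integer, which does not matter for this value.

```python
def to_feature_coord(x, stride=16, quantize=True):
    """Map an image coordinate x to feature-map units.

    quantize=True mimics the rounding [x / stride] described above;
    quantize=False keeps the continuous value x / stride.
    """
    value = x / stride
    return round(value) if quantize else value


# Hypothetical RoI edge at x = 203 pixels in the input image.
x = 203.0
quantized = to_feature_coord(x)                    # 13
continuous = to_feature_coord(x, quantize=False)   # 12.6875

# Shift introduced by rounding, expressed back in image pixels.
shift_in_pixels = (quantized - continuous) * 16    # 5.0
print(quantized, continuous, shift_in_pixels)
```

A shift of a few pixels barely changes a class label, but it moves every pixel of a predicted mask.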
To solve this, the authors propose the RoIAlign layer, which aligns the extracted features with the input. The idea is simple: avoid any quantization of the RoI boundaries or bins, for example by using x/16 instead of [x/16]. Bilinear interpolation is used to compute the exact values of the input features at four sampling locations in each RoI bin, and the results are aggregated (using max or average).
An example illustrates the misalignment described above. The figure shows a Faster R-CNN detection framework. The input is an 800×800 image containing a 665×665 bounding box (around a dog). After the backbone extracts features, the feature map has a stride of 32, so the side lengths of the image and of the bounding box both shrink to 1/32 of their input values. 800 divides evenly by 32 to give 25, but 665 divided by 32 gives 20.78, which has a fractional part, so RoI Pooling quantizes it directly to 20. Next, the features inside the box must be pooled to 7×7, so the box is divided evenly into 7×7 rectangular regions. Each region then has a side length of 20/7 ≈ 2.86, again with a fractional part, so RoI Pooling quantizes it once more, to 2. After these two quantizations the candidate region has shifted noticeably (the green part of the figure). More importantly, a deviation of 0.1 pixel on this feature map corresponds to 3.2 pixels on the original image, so a deviation of 0.8 corresponds to about 26 pixels (0.8 × 32 = 25.6), which is still a large effect.
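The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the numbers in the text, not RoI Pooling itself. It assumes quantization by `math.floor`, because that is what reproduces the values 20 and 2 in the example; the variable names are made up for illustration.

```python
import math

stride = 32
image_side = 800
box_side = 665
pooled_size = 7

image_on_map = image_side / stride           # 25.0, no fractional part
box_on_map = box_side / stride               # 20.78125
box_quantized = math.floor(box_on_map)       # first quantization: 20

bin_side = box_quantized / pooled_size       # about 2.86
bin_quantized = math.floor(bin_side)         # second quantization: 2

# Error of the first quantization, mapped back to image pixels.
first_error_px = (box_on_map - box_quantized) * stride   # 25.0

# Extent actually covered by 7 bins of side 2, in image pixels.
covered_px = bin_quantized * pooled_size * stride         # 448 of 665

print(box_on_map, box_quantized, bin_quantized, first_error_px, covered_px)
```

Under this assumption the first quantization alone discards 25 pixels of the box, and after the second one the pooled bins span only 448 of the original 665 pixels.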
The specific method and key points are:
- Traverse every candidate region, keeping its floating-point boundaries without quantization.
- Divide the candidate region into k × k cells, and do not quantize the boundaries of the cells either.
- In each cell, fix four sampling positions, compute their values by bilinear interpolation, and then apply max pooling.
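The three steps above can be sketched in plain Python for a single-channel feature map. This is a simplified illustration, not the authors' implementation. It assumes that the value stored at index (row, col) sits at the continuous point (row, col), that the RoI is already given in feature-map units, and that the four samples sit at the quarter points of each cell. The names `bilinear` and `roi_align` are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at point (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so that sample points on the border stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, k=7, aggregate=max):
    """Pool a float RoI (y1, x1, y2, x2) into k x k cells without quantization."""
    y1, x1, y2, x2 = roi
    cell_h = (y2 - y1) / k   # cell sizes stay floating point
    cell_w = (x2 - x1) / k
    pooled = []
    for i in range(k):
        row = []
        for j in range(k):
            # Four fixed sample points per cell, at its quarter positions.
            samples = [
                bilinear(feature, y1 + (i + sy) * cell_h, x1 + (j + sx) * cell_w)
                for sy in (0.25, 0.75)
                for sx in (0.25, 0.75)
            ]
            row.append(aggregate(samples))
        pooled.append(row)
    return pooled


# A 5 x 5 ramp whose value equals the column index, pooled to 2 x 2.
ramp = [[float(col) for col in range(5)] for _ in range(5)]
pooled = roi_align(ramp, (0.0, 0.0, 4.0, 4.0), k=2)
print(pooled)  # [[1.5, 3.5], [1.5, 3.5]]
```

Passing a mean function as `aggregate` gives the average variant mentioned earlier. Because nothing is rounded, a small shift of the RoI changes the pooled values smoothly instead of jumping from one cell to the next.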
3. Improved segmentation loss
The mask loss changes from the original per-pixel softmax with multinomial cross-entropy to a per-pixel sigmoid with binary cross-entropy. The framework predicts a binary mask for each class independently, without competition between classes, and relies on the classification branch of the network to decide which class's mask to use for each RoI. This differs from FCNs, which perform multi-class classification at every pixel and therefore couple classification with segmentation; the experiments show that this coupled approach performs poorly for instance segmentation.
In more detail: during training, the authors define a multi-task loss on each sampled RoI, L = L_{cls} + L_{box} + L_{mask}. The first two terms are not discussed further here. For each RoI the mask branch has a Km^2-dimensional output, which encodes K binary masks of resolution m \times m, one for each of the K classes. A per-pixel sigmoid is applied to this output, and L_{mask} is defined as the average binary cross-entropy loss. For an RoI whose ground-truth class is k, L_{mask} is defined only on the k-th mask (the other mask outputs do not contribute to the loss). This definition lets the network generate masks for every class without competition between classes.
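A minimal sketch of L_{mask} for a single RoI follows, written in plain Python under a few assumptions. The mask branch output is represented as K nested lists of m × m raw scores (logits), the ground-truth mask as an m × m list of 0/1 values, and the per-pixel sigmoid and the cross-entropy are combined in the usual numerically stable form. The name `mask_loss` and the toy values are hypothetical.

```python
import math


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy on the ground-truth class mask.

    mask_logits: K masks, each an m x m list of raw scores.
    gt_class:    index k of the RoI's ground-truth class.
    gt_mask:     m x m list of 0/1 target values.
    """
    logits_k = mask_logits[gt_class]   # the other K - 1 masks are ignored
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits_k, gt_mask):
        for z, t in zip(logit_row, target_row):
            # Equals -(t*log(s) + (1-t)*log(1-s)) with s = sigmoid(z),
            # rewritten so that large |z| cannot produce log(0).
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


# Toy RoI with K = 2 classes and m = 2.
undecided = [[0.0, 0.0], [0.0, 0.0]]    # logits of class 0
confident = [[9.0, -9.0], [-9.0, 9.0]]  # logits of class 1
target = [[1, 0], [0, 1]]

loss = mask_loss([undecided, confident], gt_class=0, gt_mask=target)
print(loss)  # about 0.693, i.e. log(2)
```

Only the mask of class 0 enters the loss here, so whatever the branch predicts for class 1 leaves the value unchanged. This is the absence of inter-class competition described above.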
4. Mask representation
A mask encodes the spatial layout of an input object. The authors use an FCN to predict a mask for each RoI, which preserves the spatial structure information.