【MagNet】《Progressive Semantic Segmentation》
2022-07-02 07:48:00 【bryant_meng】

CVPR-2021
1 Background and Motivation
When doing high-resolution image segmentation, GPU memory constraints prevent training on the full-size image directly.
The usual workarounds are to downsample the large image, or to divide it into local patches processed separately.
However, downsampling loses many details, while the patch-based approach lacks a holistic view (global information).

The authors combine the advantages of both approaches and propose a multi-scale segmentation framework for high-resolution images: MagNet.
Its effectiveness is verified on three high-resolution image datasets: Cityscapes, DeepGlobe, and Gleason.
2 Related Work
Multi-scale, e.g. FPN / ASPP / HRNet
Multi-stage, e.g. Auto-ZoomNet
Context aggregation, e.g. BiSeNet
Segmentation refinement

3 Advantages / Contributions
Aiming at the problem of high-resolution image segmentation, the authors design the MagNet framework. Experiments on three high-resolution datasets covering urban views, aerial scenes, and medical images show that MagNet consistently outperforms the state-of-the-art methods by a significant margin.
4 Method
There are two core modules:
segmentation network (module): an ordinary segmentation network
refinement module: proposed by the authors
4.1 Multistage processing pipeline

- $s$ denotes the scale
- $p$ denotes a patch
- $X$ denotes the input image
- $Y$ denotes the output segmentation
- $\bar{X}$ denotes the tensor input to the segmentation network (fixed size)
- $\bar{Y}$ denotes the tensor output by the refinement module (fixed size)
- $\bar{O}$ denotes the tensor output by the segmentation network (fixed size)
Take 4 scales as an example.
If the input image has h and w of 1024x2048,
the patch size at each scale is:
1024x2048
512x1024
256x512
128x256
The inputs and outputs of both the segmentation and refinement modules are fixed at 128x256.
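The scale pyramid above can be sketched as follows (a minimal illustration; the function name and defaults are mine, not from the paper's code):

```python
# Minimal sketch of the 4-scale patch pyramid for a 1024x2048 input.
def patch_sizes(full_hw=(1024, 2048), num_scales=4):
    """Patch size at each scale: the whole image at the coarsest
    scale, halved per side at every finer scale."""
    h, w = full_hw
    return [(h // 2 ** s, w // 2 ** s) for s in range(num_scales)]

print(patch_sizes())
# [(1024, 2048), (512, 1024), (256, 512), (128, 256)]
```

Every patch, regardless of its scale, is then resized to the fixed 128x256 network input.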
4.2 Refinement module

1) The refinement module has two inputs:
- the cumulative result from the previous stages, $\bar{Y}$
- the result obtained by running the segmentation module at, and only at, the current scale, $\bar{O}$
2) The refinement network is structured as follows

(the refinement network combines $\bar{O}$ and $\bar{Y}$ to produce the refined output $\bar{R}$)
3) Combining the results of the previous scales with the current-scale result

Let $Y_u$ and $R_u$ denote the prediction uncertainty maps for $Y$ and $R$ respectively.
4) Definition of the uncertainty maps
For each pixel of $Y$, the prediction confidence at that location is defined as the absolute difference between the highest probability value and the second-highest value (among the $C$ probability values for the $C$ classes).
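As a hedged sketch, the per-pixel confidence and uncertainty can be computed from the softmax output like this (assuming uncertainty = 1 − confidence; the paper only defines the confidence):

```python
import numpy as np

def uncertainty_map(probs):
    """probs: (C, H, W) per-class softmax probabilities.
    Per-pixel confidence = top-1 probability minus top-2 probability;
    uncertainty is taken here as 1 - confidence (an assumption about
    the normalization, not stated explicitly in the notes above)."""
    top2 = np.sort(probs, axis=0)[-2:]   # two largest values per pixel
    confidence = top2[1] - top2[0]
    return 1.0 - confidence

probs = np.array([[[0.7]], [[0.2]], [[0.1]]])  # C=3, H=W=1
print(uncertainty_map(probs))  # approximately [[0.5]]
```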
5) The two prediction uncertainty maps are used to choose the $k$ locations of $Y$ (the cumulative segmentation map) to refine.

- the $k$ points are locations where $Y$'s prediction is inaccurate while $R$'s prediction is more accurate
- $\bigodot$ is element-wise multiplication
- $F$ denotes median filtering, used to smooth the score map
- $1-R$ acts as an attention mechanism, weighting the correction applied to $Y$
6) The combination of $Y_u$ and $R_u$ is:
where $F$ denotes median blurring to smooth the score map,
and $\bigodot$ is element-wise multiplication.
This amounts to focusing the refinement on the uncertain regions of $R$; concretely:
- where the $R$ map classifies a location well, the softmax gap is wide, the prediction confidence is high, and $1-R$ is small: that region does not need refinement
- where the $R$ map classifies a location poorly, the softmax gap is narrow, the prediction confidence is low, and $1-R$ is large: that region should be the focus of refinement
P.S.: the subsequent select-and-replace step is hard to analyze in more detail from the paper alone; it needs to be understood together with the code.
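A rough sketch of the point selection, under the assumption that the score is $Y_u \bigodot (1-R_u)$; the median blur $F$ is omitted here, and this combination is a guess to be checked against the official code:

```python
import numpy as np

def select_refine_points(Yu, Ru, k):
    """Yu, Ru: (H, W) uncertainty maps of the accumulated prediction Y
    and the current-scale prediction R. Score locations where Y is
    uncertain but R is confident, then take the top k. The product
    below stands in for the paper's formula (which also smooths the
    score map with a median filter F)."""
    score = Yu * (1.0 - Ru)                       # element-wise; (1-R) as attention
    idx = np.argsort(score, axis=None)[::-1][:k]  # indices of the k largest scores
    return np.unravel_index(idx, score.shape)     # (rows, cols) of selected points

Yu = np.array([[0.9, 0.1],
               [0.5, 0.2]])
Ru = np.array([[0.1, 0.9],
               [0.5, 0.5]])
rows, cols = select_refine_points(Yu, Ru, k=1)
print(rows[0], cols[0])  # 0 0  (Y most uncertain, R most confident)
```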
4.3 MagNetFast
Built on top of MagNet:
- reduce the number of scales
- reduce the number of patches refined at each scale (only select the patches with the highest prediction uncertainty $Y_u$ for refinement)
5 Experiments
During training, image patches are randomly extracted at each scale.
At test time, non-overlapping patches are extracted for processing.
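The test-time non-overlapping patch extraction can be sketched as follows (helper name and layout are illustrative, not the paper's code):

```python
import numpy as np

def extract_patches(img, patch_hw):
    """Split an (H, W, C) image into non-overlapping patches of size
    patch_hw, row-major; assumes H and W divide evenly by the patch size."""
    ph, pw = patch_hw
    H, W = img.shape[:2]
    return [img[r:r + ph, c:c + pw]
            for r in range(0, H, ph)
            for c in range(0, W, pw)]

img = np.zeros((1024, 2048, 3))
patches = extract_patches(img, (256, 512))
print(len(patches), patches[0].shape)  # 16 (256, 512, 3)
```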
5.1 Datasets

5.2 Experiments on the Cityscapes dataset
1)Benefits of multiple scale levels

Setting the number of scales to 4 gives the best results.
Note that the smaller the patch size, the higher the accuracy of the refinement.
The patch sizes in turn (hxw) are:
1024x2048 -> 512x1024 -> 256x512 -> 128x256
The network input size is fixed: each patch is resized to 128x256.
The equivalent refinement resolution is therefore:
full width x (128/1024) -> full width x (128/512) -> full width x (128/256) -> full width x (128/128)
that is:
256 -> 512 -> 1024 -> 2048
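The arithmetic above, as a tiny illustrative helper (numbers from the example; the function itself is mine):

```python
# Equivalent full-image width processed at each scale when every patch
# is resized to the fixed 128x256 network input.
def effective_widths(full_w=2048, net_h=128, patch_hs=(1024, 512, 256, 128)):
    return [full_w * net_h // p for p in patch_hs]

print(effective_widths())  # [256, 512, 1024, 2048]
```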
Now let's look at the qualitative effect.

The second row should be the results after refinement.
Zooming in on the first row, the second image is covered in red dots.

2)Comparing segmentation approaches


Many fine structures, such as poles, are segmented more delicately than before.
3)Ablation study: point selection

From chart (a), MagNet's IoU is larger than that of the other methods.
4)Ablation study: point selection
Here the authors explore combinations of $Y_u$ and $R_u$ ($2^{16} = 65536$ combinations),
as well as the number of points that need refinement at each scale.
5)Ablation study: segmentation backbones
5.3 DeepGlobe

5.4 Gleason

6 Conclusion(own)
- The accumulated (progressive) refinement is a good idea
- With too many stages, the speed should be much slower
- There is a trade-off between granularity and resolution