
【MagNet】《Progressive Semantic Segmentation》

2022-07-02 07:48:00 【bryant_ meng】


CVPR-2021

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own)

1 Background and Motivation

In high-resolution image segmentation, GPU resource constraints make it impossible to train directly on the original image

The usual solution is to downsample the large image, or to divide the image into local patches that are processed separately

However, downsampling loses many details, while the patch-based method lacks a holistic view (global information)


The author combines the advantages of the two methods above and proposes MagNet, a multi-scale segmentation framework for high-resolution images

Its effectiveness is verified on three high-resolution image datasets: Cityscapes / DeepGlobe / Gleason

2 Related Work

Multi-scale, e.g. FPN / ASPP / HRNet
Multi-stage, e.g. Auto-ZoomNet
Context aggregation, e.g. BiSeNet
Segmentation refinement


3 Advantages / Contributions

To address high-resolution image segmentation, the author designs the MagNet network. Experiments on three high-resolution datasets of urban views, aerial scenes, and medical images show that MagNet consistently outperforms the state-of-the-art methods by a significant margin

4 Method

There are two core modules

segmentation network (module; an ordinary segmentation network)
refinement module (proposed by the author)

4.1 Multistage processing pipeline

[Figure omitted]

s denotes the scale
p denotes a patch
X denotes the input image
Y denotes the output segmentation
$\bar{X}$ is the tensor fed into the segmentation network, fixed size
$\bar{Y}$ is the tensor output by the refinement module, fixed size
$\bar{O}$ is the tensor output by the segmentation network, fixed size

Take 4 scales as an example

Suppose the input image has h x w of 1024x2048

The patch size at each scale is:

1024x2048
512x1024
256x512
128x256

The inputs and outputs of the segmentation and refinement modules are all 128x256
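
The patch sizes above halve in each dimension from one scale to the next. A minimal sketch of that arithmetic, assuming the halving pattern holds for every scale (the helper names `patch_sizes` and `patches_per_image` are made up for illustration and are not from the MagNet code):

```python
def patch_sizes(height, width, num_scales):
    """Patch (h, w) at each scale, halving both sides from one scale to the next."""
    return [(height // 2 ** s, width // 2 ** s) for s in range(num_scales)]


def patches_per_image(height, width, patch_h, patch_w):
    """Number of non-overlapping patches needed to tile the whole image."""
    return (height // patch_h) * (width // patch_w)


sizes = patch_sizes(1024, 2048, 4)
for patch_h, patch_w in sizes:
    count = patches_per_image(1024, 2048, patch_h, patch_w)
    print(f"{patch_h}x{patch_w}: {count} patch(es)")
```

At the finest of the 4 scales, the 1024x2048 image is tiled by 64 patches of 128x256, which is why the finest scale costs the most network passes.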

4.2 Refinement module

1) The refinement module has two inputs

the cumulative result from the previous stages, $\bar{Y}$
the result obtained by running the segmentation module at and only at the current scale, $\bar{O}$

2) The structure of the refinement network is as follows

[Figure omitted]
O + Y = R, i.e. the inputs $O$ and $Y$ produce the refined result $R$

3) Combining the results of the previous scales with the result of the current scale

[Figure omitted]

Let $Y_u$ and $R_u$ denote the prediction uncertainty maps for $Y$ and $R$ respectively.

4) Definition of the uncertainty maps

for each pixel of Y , the prediction confidence at this location is defined as the absolute difference between the highest probability value and the second-highest value (among the C probability values for C classes).
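
A minimal per-pixel sketch of this definition in plain Python. The confidence follows the quoted sentence; turning it into an uncertainty as `1 - confidence` is an assumption made here for illustration, and the function names are hypothetical:

```python
def confidence(probs):
    """Gap between the highest and second-highest class probability.

    `probs` holds the C class probabilities of one pixel (C >= 2).
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return abs(top1 - top2)


def uncertainty(probs):
    # Assumption: uncertainty is taken as 1 - confidence.
    return 1.0 - confidence(probs)


def uncertainty_map(prob_map):
    """Apply `uncertainty` to every pixel of an H x W x C nested list."""
    return [[uncertainty(pixel) for pixel in row] for row in prob_map]
```

A pixel with probabilities `[0.75, 0.25, 0.0]` has confidence 0.5, while a pixel split evenly between two classes has confidence 0 and therefore the maximum uncertainty of 1.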

5) The two prediction uncertainty maps are used to choose $k$ locations of $Y$ (the cumulative segmentation map) to refine. $Y_u$ and $R_u$ are combined into the score map

$F(Y_u \odot (1 - R_u))$

The $k$ selected locations are those where the prediction of $Y$ is inaccurate and the prediction of $R$ is more accurate
$\odot$ is element-wise multiplication
$F$ denotes median blurring (median filtering), which smooths the score map
$1 - R_u$ acts like an attention mechanism that weights $Y_u$

In effect, $R$ is used mainly to update the uncertain locations of $Y$. More specifically:

The better $R$ classifies a location, the further apart the softmax scores are pulled, so the prediction confidence is higher, the uncertainty $R_u$ is lower, and $1 - R_u$ is larger: the refined prediction can be trusted there, so the location is favored when $Y$ is uncertain
The worse $R$ classifies a location, the less the softmax scores separate, so the prediction confidence is lower, $R_u$ is higher, and $1 - R_u$ is smaller: the location is down-weighted even if $Y$ is uncertain there
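
The selection step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the score is taken as the median-blurred product of $Y$'s uncertainty and one minus $R$'s uncertainty, the top-$k$ locations are replaced, and label maps stand in for whatever tensors the official code actually replaces. All function names are hypothetical.

```python
from statistics import median


def median_blur(grid, radius=1):
    """Median filter over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [grid[a][b]
                      for a in range(max(0, i - radius), min(h, i + radius + 1))
                      for b in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = median(window)
    return out


def score_map(y_unc, r_unc, radius=1):
    """High where Y is uncertain and R is confident, then smoothed."""
    h, w = len(y_unc), len(y_unc[0])
    raw = [[y_unc[i][j] * (1.0 - r_unc[i][j]) for j in range(w)]
           for i in range(h)]
    return median_blur(raw, radius)


def refine(y_labels, r_labels, y_unc, r_unc, k, radius=1):
    """Replace the k highest-scoring locations of Y with the values from R."""
    scores = score_map(y_unc, r_unc, radius)
    h, w = len(scores), len(scores[0])
    coords = sorted(((i, j) for i in range(h) for j in range(w)),
                    key=lambda ij: scores[ij[0]][ij[1]],
                    reverse=True)[:k]
    out = [row[:] for row in y_labels]  # copy so the input is left untouched
    for i, j in coords:
        out[i][j] = r_labels[i][j]
    return out, coords
```

With `radius=0` the blur is a no-op, which makes it easy to check by hand that the location where $Y$ is most uncertain and $R$ most confident is picked first.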

ps: the details of the subsequent select and replace steps are hard to work out from the paper alone; they need to be read together with the code

4.3 MagNetFast

Built on top of MagNet

Reduce the number of scales
Reduce the number of patches refined at each scale (only selects the patches with the highest prediction uncertainty $Y_u$ for refinement)
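
A rough sketch of the patch selection idea. Ranking patches by their mean uncertainty is an assumption made here; the paper quote only says the patches with the highest prediction uncertainty are kept, and `top_uncertain_patches` is a hypothetical name:

```python
def top_uncertain_patches(y_unc, patch_h, patch_w, n):
    """Return the (top, left) corners of the n non-overlapping patches
    with the highest mean uncertainty."""
    h, w = len(y_unc), len(y_unc[0])
    scored = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            total = sum(y_unc[i][j]
                        for i in range(top, top + patch_h)
                        for j in range(left, left + patch_w))
            scored.append((total / (patch_h * patch_w), top, left))
    # Highest mean uncertainty first; only the first n patches get refined.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(top, left) for _, top, left in scored[:n]]
```

Refining only `n` patches per scale instead of all of them is where the speed-up comes from: the remaining patches simply keep the prediction accumulated so far.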

5 Experiments

During training, image patches are randomly extracted at each scale

During testing, non-overlapping patches are extracted for processing
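
The two extraction modes can be sketched as box generators. This is only an illustration: the box format `(top, left, bottom, right)` and both function names are assumptions, not taken from the MagNet code.

```python
import random


def random_patch_box(image_hw, patch_hw, rng):
    """Training: one randomly placed (top, left, bottom, right) box."""
    h, w = image_hw
    ph, pw = patch_hw
    top = rng.randint(0, h - ph)   # randint is inclusive on both ends
    left = rng.randint(0, w - pw)
    return (top, left, top + ph, left + pw)


def grid_patch_boxes(image_hw, patch_hw):
    """Testing: non-overlapping boxes that tile the image row by row."""
    h, w = image_hw
    ph, pw = patch_hw
    return [(top, left, top + ph, left + pw)
            for top in range(0, h - ph + 1, ph)
            for left in range(0, w - pw + 1, pw)]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
train_box = random_patch_box((1024, 2048), (512, 1024), rng)
test_boxes = grid_patch_boxes((1024, 2048), (512, 1024))
```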

5.1 Datasets

[Figure omitted]

5.2 Experiments on the Cityscapes dataset

1) Benefits of multiple scale levels

Setting the number of scales to 4 gives the best result

Note that the smaller the patch size, the finer the refinement

The patch sizes (h x w) are, in order

1024x2048 -> 512x1024 -> 256x512 -> 128x256

Every patch is resized to the network input size of 128x256

So the effective resolution at which refinement operates is

original image x (128/1024) -> original image x (128/512) -> original image x (128/256) -> original image x (128/128)

That is, in terms of image width

256 -> 512 -> 1024 -> 2048
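
The same arithmetic as a small sketch, assuming every patch is resized to 128x256 and has the same aspect ratio as the network input (`effective_resolution` is a hypothetical helper):

```python
def effective_resolution(image_hw, patch_hw, net_hw):
    """Size the whole image would have if each patch were resized to net_hw."""
    # The width ratio is the same when patch and network aspect ratios match.
    ratio = net_hw[0] / patch_hw[0]
    return (round(image_hw[0] * ratio), round(image_hw[1] * ratio))


image = (1024, 2048)
net = (128, 256)
resolutions = [effective_resolution(image, patch, net)
               for patch in [(1024, 2048), (512, 1024), (256, 512), (128, 256)]]
```

The widths of `resolutions` reproduce the 256 -> 512 -> 1024 -> 2048 sequence: only at the last scale does the network see the image at its original resolution.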

Now look at the results

[Figure omitted]
The second row should be the results after refinement

Zooming in on the first row, the second image is covered in red dots

[Figure omitted]

2) Comparing segmentation approaches

[Figure omitted]

Many thin, pole-like objects are segmented better than before

3) Ablation study: point selection

[Figure omitted]
Panel (a) shows that MagNet achieves a higher IoU than the other methods

4) Ablation study: point selection
[Figure omitted]
Here several combinations of $Y_u$ and $R_u$ are explored, $2^{16} = 65536$

Here the number of points that need to be refined at each scale is explored