
【MagNet】《Progressive Semantic Segmentation》

2022-07-02 06:26:00 【bryant_meng】


CVPR-2021

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own)

1 Background and Motivation

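When segmenting high-resolution images, limited GPU resources make it impossible to train on the full-resolution image directly.

The usual workaround is to downsample the big image or divide the image into local patches for separate processing.

However, downsampling loses a lot of detail, while the patch-based approach lacks the big picture (global information).

Combining the strengths of both approaches, the authors propose MagNet, a multi-scale segmentation framework for high-resolution images.

Its effectiveness is validated on three high-resolution image datasets: Cityscapes, DeepGlobe, and Gleason.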

2 Related Work

Multi-scale, e.g. FPN / ASPP / HRNet
Multi-stage, e.g. Auto-ZoomNet
Context aggregation, e.g. BiSeNet
Segmentation refinement


3 Advantages / Contributions

The authors design the MagNet network for high-resolution image segmentation. Experiments on three high-resolution datasets of urban views, aerial scenes, and medical images show that MagNet consistently outperforms the state-of-the-art methods by a significant margin.

4 Method

There are two core modules:

segmentation network (module; an ordinary segmentation network)
refinement module (proposed by the authors)

4.1 Multistage processing pipeline


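s denotes the scale
p denotes a patch
X denotes the input image
Y denotes the output segmentation map
$\bar{X}$ denotes the tensor fed into the segmentation network; its size is fixed
$\bar{Y}$ denotes the tensor output by the refinement module; its size is fixed
$\bar{O}$ denotes the tensor output by the segmentation network; its size is fixed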
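
To tie the notation together, here is a minimal sketch of a coarse-to-fine loop over scales and patches, as I read the pipeline. Everything in it is an assumption for illustration: the helper names (`crop`, `resize_nearest`, `paste`, `progressive_segmentation`), the nearest-neighbour resizing, the non-overlapping tiling, and the `segment` / `refine` callables standing in for the two modules. It is not the authors' implementation.

```python
def crop(img, top, left, h, w):
    """Cut an h x w window out of a nested-list image."""
    return [row[left:left + w] for row in img[top:top + h]]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a nested-list image (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def paste(canvas, patch, top, left):
    """Write a patch back into the full-size canvas in place."""
    for i, row in enumerate(patch):
        canvas[top + i][left:left + len(row)] = row

def progressive_segmentation(X, num_scales, net_hw, segment, refine):
    """Assumed loop: X is the input image, Y the cumulative output.

    segment(X_bar) -> O_bar and refine(Y_bar_prev, O_bar) -> Y_bar are
    placeholders for the segmentation and refinement modules; both work on
    tensors of the fixed size net_hw. Image sides must be divisible by
    2 ** (num_scales - 1).
    """
    H, W = len(X), len(X[0])
    nh, nw = net_hw
    Y = None
    for s in range(num_scales):
        ph, pw = H // 2 ** s, W // 2 ** s          # patch size at scale s
        Y_next = [[None] * W for _ in range(H)]
        for top in range(0, H, ph):
            for left in range(0, W, pw):
                X_bar = resize_nearest(crop(X, top, left, ph, pw), nh, nw)
                O_bar = segment(X_bar)             # current-scale result only
                if Y is None:
                    Y_bar = O_bar                  # coarsest scale: nothing to refine yet
                else:
                    Y_prev = resize_nearest(crop(Y, top, left, ph, pw), nh, nw)
                    Y_bar = refine(Y_prev, O_bar)  # merge with the cumulative result
                paste(Y_next, resize_nearest(Y_bar, ph, pw), top, left)
        Y = Y_next
    return Y
```

With a thresholding stub for `segment` and a `refine` that simply returns its second argument, the loop reduces to per-patch segmentation at the finest scale, which is a quick way to sanity-check the cropping and pasting logic.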

Take 4 scales as an example.

Suppose the input image has h x w = 1024x2048.

The patch size at each scale is:

1024x2048
512x1024
256x512
128x256

The inputs and outputs of both the segmentation and refinement modules are 128x256.
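
The patch sizes in this example can be reproduced with a few lines of arithmetic. This is a sketch assuming each scale halves the patch height and width and that patches tile the image without overlap; `patch_grid` is a hypothetical helper name, not something from the paper or its code.

```python
def patch_grid(image_hw, num_scales):
    """Return (scale index, patch (h, w), patches per image), coarse to fine."""
    h, w = image_hw
    rows = []
    for s in range(num_scales):
        factor = 2 ** s                       # assumed: patch side halves per scale
        ph, pw = h // factor, w // factor
        n_patches = (h // ph) * (w // pw)     # assumed: non-overlapping tiling
        rows.append((s, (ph, pw), n_patches))
    return rows

for s, hw, n in patch_grid((1024, 2048), 4):
    print(s, hw, n)
```

At the finest scale the patch size equals the 128x256 network input, so those patches are processed at their native resolution, while the coarser patches have to be downsampled to fit.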

4.2 Refinement module


1) The refinement module takes two inputs:

the cumulative result from the previous stages, $\bar{Y}$
the result obtained by running the segmentation module at and only at the current scale, $\bar{O}$

2) Structure of the refinement network: it takes $O$ and $Y$ as inputs and produces the refined prediction $R$

O+Y=R

3) Combining the results from previous scales with the result at the current scale


Let $Y_u$ and $R_u$ denote the prediction uncertainty maps for $Y$ and $R$ respectively.

4) The uncertainty maps are derived from the prediction confidence, which is defined as follows

For each pixel of Y, the prediction confidence at this location is defined as the absolute difference between the highest probability value and the second-highest value (among the C probability values for C classes).
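
As a concrete illustration of this definition, the sketch below computes the top-1 minus top-2 confidence for every pixel of a small probability map. The nested-list layout `probs[c][i][j]` and the function names are hypothetical, and taking uncertainty as one minus the confidence is my assumption rather than something stated above.

```python
def confidence_map(probs):
    """probs[c][i][j] holds the probability of class c at pixel (i, j).

    Returns, per pixel, the gap between the highest and second-highest
    class probability. Needs at least two classes.
    """
    num_classes = len(probs)
    h, w = len(probs[0]), len(probs[0][0])
    conf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            top = sorted((probs[c][i][j] for c in range(num_classes)), reverse=True)
            conf[i][j] = abs(top[0] - top[1])
    return conf

def uncertainty_map(probs):
    """Assumption: uncertainty = 1 - confidence."""
    return [[1.0 - v for v in row] for row in confidence_map(probs)]

# A 1x2 image with 3 classes: the first pixel is clear-cut, the second is ambiguous.
probs = [[[0.9, 0.4]], [[0.05, 0.35]], [[0.05, 0.25]]]
print([[round(v, 2) for v in row] for row in confidence_map(probs)])
```

The first pixel gets a confidence close to 0.85 and the second close to 0.05, so the second would be a far stronger candidate for refinement.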

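5) The two prediction uncertainty maps are used to select $k$ locations of $Y$ (the cumulative segmentation map) to refine.

The $k$ selected locations are those where $Y$ is uncertain while $R$ is relatively confident
$\odot$ is element-wise multiplication
$F$ denotes median filtering, used to smooth the score map
$1 - R_u$ acts like an attention mechanism that weights $Y_u$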

6) $Y_u$ and $R_u$ are combined into a score map $Q$:

$Q = F(Y_u \odot (1 - R_u))$

where $F$ denotes median blurring to smooth the score map and $\odot$ is element-wise multiplication.

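In other words, the update concentrates on locations where $Y$ is uncertain but $R$ is confident. Reading $R_u$ as one minus the prediction confidence of $R$, this can be understood as follows:

The better $R$ classifies a location, the further apart the softmax scores are pulled, so the prediction confidence is larger, $R_u$ is smaller and $1 - R_u$ is larger: $R$ can be trusted to refine that location
The worse $R$ classifies a location, the less the softmax scores separate, so the prediction confidence is smaller, $R_u$ is larger and $1 - R_u$ is smaller: that location is down-weighted and $Y$ is kept there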
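
The scoring, selection and replacement steps can be sketched as follows. This is a minimal illustration under several assumptions: the score is $F(Y_u \odot (1 - R_u))$ with uncertainty read as one minus confidence, the maps are nested lists, the median window is square with a hypothetical `radius` parameter, and the function names `median_blur` and `select_and_replace` are made up here. It is not taken from the official code.

```python
from statistics import median

def median_blur(grid, radius=1):
    """Median filter with a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [grid[a][b]
                      for a in range(max(0, i - radius), min(h, i + radius + 1))
                      for b in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = median(window)
    return out

def select_and_replace(Y, R, Y_u, R_u, k, radius=1):
    """Replace the k highest-scoring locations of Y with the values of R.

    Y, R: per-pixel predictions; Y_u, R_u: uncertainty maps in [0, 1].
    Returns the updated copy of Y and the score map.
    """
    h, w = len(Y), len(Y[0])
    # High score: Y is uncertain AND R is confident at that pixel.
    raw = [[Y_u[i][j] * (1.0 - R_u[i][j]) for j in range(w)] for i in range(h)]
    score = median_blur(raw, radius)
    ranked = sorted(((score[i][j], i, j) for i in range(h) for j in range(w)),
                    reverse=True)
    out = [row[:] for row in Y]               # leave the input untouched
    for _, i, j in ranked[:k]:
        out[i][j] = R[i][j]
    return out, score
```

Setting `radius=0` disables the smoothing, which makes it easy to check by hand that only the pixel where $Y$ is uncertain and $R$ is confident gets overwritten.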

PS: the subsequent select and replace steps do not reveal much detail from the paper alone; they need to be examined alongside the code.