[Semantic segmentation] SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
2022-07-29 06:04:00 【Dull cat】

One 、 The main idea
Current semantic segmentation methods are mostly FCN-based encoder-decoder architectures, which are weak at capturing long-range context:
- To enlarge the receptive field, methods such as PSP, ASPP, and attention modules have been proposed.
- These methods mainly operate on downsampled feature maps of the input image, i.e., they enlarge the receptive field using high-level features and make little use of low-level information.
Method of this paper:
- Semantic segmentation with an encoder built entirely from transformers.
- Input: the image is divided into fixed-size patches to form a sequence of image patches; a linear embedding layer then turns them into a sequence of feature embedding vectors, which serves as the transformer's input.

Two 、 Implementation method
1、Turn the image into a sequence of patches:
Divide $x \in \mathbb{R}^{H\times W\times 3}$ uniformly into a grid of $\frac{H}{16}\times\frac{W}{16}$ patches (each $16\times 16$ pixels), then flatten the grid into a sequence of length $L=\frac{HW}{256}$.
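A minimal NumPy sketch of this step, assuming a channels-last image whose height and width are divisible by 16 (the function name `patchify` is ours, not from the SETR code). Each row of the output is one flattened $16\times16\times3$ patch, in row-major grid order.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, 3) image into a grid of (H/patch) x (W/patch) patches
    and flatten the grid into a sequence of vectorized patches.

    Returns an array of shape (H*W // patch**2, patch*patch*3).
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c): group pixels by patch
    grid = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    # Flattening the grid row by row gives the 1-D patch sequence
    return grid.reshape(gh * gw, patch * patch * c)


img = np.random.rand(512, 512, 3)
seq = patchify(img)
print(seq.shape)  # (1024, 768)
```

A $512\times512$ image thus becomes $32\times32=1024$ tokens of dimension $16\cdot16\cdot3=768$.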
2、Linear projection: $f: p \to e \in \mathbb{R}^C$
A linear projection $f$ maps each vectorized patch $p$ into a $C$-dimensional embedding space, so the 2D image becomes a 1D sequence of patch embeddings.
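A minimal sketch of the linear projection $f$ as a single weight matrix and bias shared by all patches. The sizes (sequence length 1024, patch dimension $16\cdot16\cdot3=768$, $C=1024$) and the random initialisation are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

L, patch_dim, C = 1024, 16 * 16 * 3, 1024  # illustrative sizes
patches = rng.standard_normal((L, patch_dim))  # stand-in for the flattened patch sequence

# f is one learnable weight matrix and bias, shared by every patch
W_f = rng.standard_normal((patch_dim, C)) * 0.02
b_f = np.zeros(C)


def linear_projection(p: np.ndarray) -> np.ndarray:
    """Map each vectorized patch (one row of p) to a C-dimensional embedding e = p W_f + b_f."""
    return p @ W_f + b_f


e = linear_projection(patches)
print(e.shape)  # (1024, 1024)
```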
3、Position encoding:
To encode the spatial information of each patch, the authors learn a specific embedding $p_i$ for every position $i$ and add it to $e_i$, giving the final input $E = \{e_1+p_1, e_2+p_2, \dots, e_L+p_L\}$.
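The sketch below (NumPy, illustrative sizes, random values standing in for learned parameters) shows why the position embedding matters: two patches with identical content get identical embeddings $e_i$, and only adding the per-position $p_i$ makes them distinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)
L, C = 1024, 1024  # illustrative sizes

e = rng.standard_normal((L, C))           # patch embeddings e_1..e_L (0-based rows here)
e[7] = e[3]                               # pretend patches 3 and 7 look identical (e.g. two sky patches)
pos = rng.standard_normal((L, C)) * 0.02  # position embeddings p_1..p_L (learned in practice)

E = e + pos                               # E = {e_1 + p_1, ..., e_L + p_L}

print(np.array_equal(e[3], e[7]))  # True: same content, same embedding
print(np.array_equal(E[3], E[7]))  # False: the position term tells them apart
```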
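4、Transformer:
The transformer encoder (with $L_e$ layers; the output of layer $l$ is denoted $Z^l$) takes $E$ as input. Because self-attention relates every patch to every other patch, each layer has a global receptive field, which removes the limited-receptive-field problem of FCN-style methods.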
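A minimal single-head self-attention sketch in NumPy illustrating the global receptive field. It is a deliberate simplification of a real transformer layer (no multi-head split, MLP, LayerNorm or residual connections), and the weights are random: changing only the first patch still changes the output at the last patch, because every output is a weighted sum over all positions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)


def self_attention(E, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence E of shape (L, C)."""
    q, k, v = E @ Wq, E @ Wk, E @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (L, L): every patch attends to every patch
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
L, C = 64, 32
E = rng.standard_normal((L, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out = self_attention(E, Wq, Wk, Wv)

E2 = E.copy()
E2[0] += 1.0  # perturb only the first patch
out2 = self_attention(E2, Wq, Wk, Wv)
print(np.abs(out2[-1] - out[-1]).max() > 0)  # True: the last patch "sees" the first one
```

In a convolutional network, by contrast, the last position would only be affected once enough layers had been stacked for the receptive field to span the whole image.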

5、Decoder
Role of the decoder: produce a 2D segmentation map with the same spatial size as the original image.
Therefore the encoder feature $Z$ must first be reshaped from $\frac{HW}{256}\times C$ to $\frac{H}{16}\times\frac{W}{16}\times C$.
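Because the patches were flattened row by row, this reshape is a plain reinterpretation of the same data. A minimal NumPy sketch with illustrative sizes:

```python
import numpy as np

H, W, C = 512, 1024, 1024      # illustrative input size and embedding width
L = (H // 16) * (W // 16)      # sequence length HW/256

Z = np.random.rand(L, C)       # stand-in for the encoder output, one row per patch

# Patches were flattened row by row over the (H/16, W/16) grid,
# so a plain reshape restores the spatial layout.
Z_grid = Z.reshape(H // 16, W // 16, C)
print(Z_grid.shape)  # (32, 64, 1024)
```

In a channels-first framework such as PyTorch one would additionally move $C$ to the front before applying convolutions.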
Method 1: Naive upsampling (Naive)
① Project the transformer feature $Z^{L_e}$ to the number of segmentation classes (e.g., 19 for Cityscapes) with a simple 2-layer network:
1x1 conv + sync batch norm (with ReLU) + 1x1 conv
② Bilinearly upsample the result to the full image resolution, then compute the loss.
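A NumPy sketch of the Naive decoder under stated assumptions: 1x1 convolutions are written as per-pixel matrix products, batch norm uses the statistics of this single feature map (sync BN would additionally pool these statistics across GPUs), the hidden width of 32 is an arbitrary illustrative choice, and `bilinear_resize` is our own helper using half-pixel centres (the `align_corners=False` convention).

```python
import numpy as np


def bilinear_resize(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize an (h, w, c) map using half-pixel centres (align_corners=False style)."""
    h, w, _ = feat.shape

    def axis_coords(n_out, n_in):
        s = (np.arange(n_out) + 0.5) * n_in / n_out - 0.5  # source coordinate of each output pixel
        s = np.clip(s, 0, n_in - 1)
        i0 = np.floor(s).astype(int)
        i1 = np.minimum(i0 + 1, n_in - 1)
        return i0, i1, s - i0  # two neighbours and the interpolation weight

    y0, y1, wy = axis_coords(out_h, h)
    x0, x1, wx = axis_coords(out_w, w)
    wy, wx = wy[:, None, None], wx[None, :, None]
    top = feat[y0][:, x0] * (1 - wx) + feat[y0][:, x1] * wx
    bottom = feat[y1][:, x0] * (1 - wx) + feat[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def naive_head(z_grid, w1, gamma, beta, w2, eps=1e-5):
    """1x1 conv -> batch norm -> ReLU -> 1x1 conv, applied to an (h, w, C) feature map."""
    x = z_grid @ w1                                      # 1x1 conv == per-pixel matmul
    mean, var = x.mean(axis=(0, 1)), x.var(axis=(0, 1))
    x = (x - mean) / np.sqrt(var + eps) * gamma + beta   # BN with this map's statistics
    x = np.maximum(x, 0)                                 # ReLU
    return x @ w2                                        # 1x1 conv to class logits


rng = np.random.default_rng(0)
H, W, C, hidden, K = 256, 512, 64, 32, 19            # K = 19 classes, as on Cityscapes
z_grid = rng.standard_normal((H // 16, W // 16, C))  # reshaped Z^{L_e}

logits_low = naive_head(z_grid, rng.standard_normal((C, hidden)) * 0.1,
                        np.ones(hidden), np.zeros(hidden),
                        rng.standard_normal((hidden, K)) * 0.1)
logits = bilinear_resize(logits_low, H, W)           # back to full resolution
pred = logits.argmax(axis=-1)                        # per-pixel class ids
print(logits_low.shape, logits.shape, pred.shape)    # (16, 32, 19) (256, 512, 19) (256, 512)
```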
Method 2: Progressive UPsampling (PUP)
Progressive upsampling alternates convolution layers and upsampling operations. To limit the artifacts caused by upsampling by a large factor in one step, each upsampling operation is restricted to 2×. Restoring $Z^{L_e}$ of size $\frac{H}{16}\times\frac{W}{16}$ to the full image resolution therefore takes 4 operations ($2^4=16$).
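A compact NumPy sketch of the progressive schedule. For brevity it uses 1x1 convolutions and nearest-neighbour 2× upsampling as stand-ins for the convolutions and bilinear upsampling of the real decoder; channel widths and names such as `pup_decoder` are hypothetical. The point is the structure: four conv + 2× stages take the map from 1/16 to full resolution.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """2x nearest-neighbour upsampling of an (h, w, c) map.
    NOTE: a simplification; SETR-PUP uses bilinear upsampling here."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def pup_decoder(z_grid, stage_weights, cls_weight):
    """Progressive upsampling: each stage is conv (1x1 here, for brevity) + ReLU + 2x upsample.

    z_grid:        (H/16, W/16, C) encoder feature after reshaping
    stage_weights: one (c_in, c_out) matrix per stage
    cls_weight:    (c_last, num_classes) final per-pixel classifier
    """
    x = z_grid
    for w in stage_weights:
        x = np.maximum(x @ w, 0)  # channel mixing + ReLU
        x = upsample2x(x)         # only 2x per step
    return x @ cls_weight


rng = np.random.default_rng(0)
H, W, C, K = 128, 256, 64, 19
stride = 16
n_stages = stride.bit_length() - 1   # 16 = 2**4 -> 4 stages of 2x
widths = [C, 32, 32, 32, 32]         # illustrative channel widths
stage_weights = [rng.standard_normal((widths[i], widths[i + 1])) * 0.1 for i in range(n_stages)]
cls_weight = rng.standard_normal((widths[-1], K)) * 0.1

z_grid = rng.standard_normal((H // stride, W // stride, C))
logits = pup_decoder(z_grid, stage_weights, cls_weight)
print(logits.shape)  # (128, 256, 19)
```

The spatial size goes 8×16 → 16×32 → 32×64 → 64×128 → 128×256.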
Method 3: Multi-Level feature Aggregation (MLA)
Multi-level feature aggregation takes features $\{Z^m\}$, $m \in \{\frac{L_e}{M}, 2\frac{L_e}{M}, \dots, M\frac{L_e}{M}\}$, from $M$ layers spread uniformly across the encoder (step $\frac{L_e}{M}$) as the input to the decoder.
$M$ streams are then deployed, each focusing on one selected layer. Within each stream:
- First, the encoder feature $Z^l$ is reshaped from 2D ($\frac{HW}{256}\times C$) to 3D ($\frac{H}{16}\times\frac{W}{16}\times C$).
- Then a 3-layer network (kernel sizes 1x1, 3x3, 3x3) is applied:
- The first and third layers each halve the number of channels, and after the third layer the feature map is upsampled 4× by bilinear interpolation.
- To strengthen the information interaction between streams, the authors introduce a top-down aggregation via element-wise addition after the first layer, followed by an extra 3x3 conv on the summed feature.
- After the third layer, the features of all streams are concatenated along the channel dimension and then bilinearly upsampled 4× to the full resolution.
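A NumPy sketch of the MLA data flow, assuming $L_e=24$ and $M=4$, which gives layers 6, 12, 18 and 24 (the same layers used for SETR-MLA's auxiliary losses in the next part). Further assumptions, all ours: the top-down addition is cumulative from the deepest stream to the shallowest, the "extra 3x3 conv after the addition" is treated as the stream's second layer, nearest-neighbour upsampling stands in for bilinear, normalization layers are omitted, and all helper names are hypothetical.

```python
import numpy as np


def mla_layer_ids(num_layers: int = 24, num_streams: int = 4) -> list:
    """Layers L_e/M, 2L_e/M, ..., M*L_e/M (1-based) that feed the MLA decoder."""
    step = num_layers // num_streams
    return [step * (i + 1) for i in range(num_streams)]


def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding 1. x: (h, w, c_in), w: (3, 3, c_in, c_out)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + wd] @ w[dy, dx]
    return out


def up(x, f):
    """Nearest-neighbour upsampling by factor f (stand-in for bilinear upsampling)."""
    return x.repeat(f, axis=0).repeat(f, axis=1)


def relu(x):
    return np.maximum(x, 0)


def mla_decoder(feats, params):
    """feats: M encoder features already reshaped to (H/16, W/16, C), ordered shallow -> deep."""
    # Layer 1 of every stream: 1x1 conv (per-pixel matmul) that halves the channels
    xs = [relu(f @ p["w1"]) for f, p in zip(feats, params)]
    # Top-down aggregation: add each deeper stream into the next shallower one
    for i in range(len(xs) - 1, 0, -1):
        xs[i - 1] = xs[i - 1] + xs[i]
    outs = []
    for x, p in zip(xs, params):
        x = relu(conv3x3(x, p["w2"]))  # 3x3 conv on the summed feature
        x = relu(conv3x3(x, p["w3"]))  # 3x3 conv that halves the channels again
        outs.append(up(x, 4))          # 4x inside each stream: 1/16 -> 1/4
    fused = np.concatenate(outs, axis=-1)  # channel-wise concatenation of all streams
    return up(fused, 4)                    # final 4x: 1/4 -> full resolution


rng = np.random.default_rng(0)
H, W, C, M = 64, 128, 32, 4
layer_ids = mla_layer_ids(24, M)
encoder_out = {l: rng.standard_normal((H // 16, W // 16, C)) for l in range(1, 25)}  # stand-in Z^1..Z^24
feats = [encoder_out[l] for l in layer_ids]
params = [
    {"w1": rng.standard_normal((C, C // 2)) * 0.1,
     "w2": rng.standard_normal((3, 3, C // 2, C // 2)) * 0.1,
     "w3": rng.standard_normal((3, 3, C // 2, C // 4)) * 0.1}
    for _ in range(M)
]
out = mla_decoder(feats, params)
print(layer_ids)   # [6, 12, 18, 24]
print(out.shape)   # (64, 128, 32): 4 streams x C/4 channels each
```

A per-pixel classifier (for example a 1x1 conv to the number of classes) would follow the fused output.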

6、Auxiliary loss:
Each auxiliary loss head is a 2-layer network. The authors add auxiliary losses at the following transformer layers:
- SETR-Naive ($Z^{10}, Z^{15}, Z^{20}$)
- SETR-PUP ($Z^{10}, Z^{15}, Z^{20}, Z^{24}$)
- SETR-MLA ($Z^{6}, Z^{12}, Z^{18}, Z^{24}$)
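A sketch of how auxiliary losses combine with the main loss, assuming each auxiliary head (the 2-layer network) has already produced full-resolution logits. The weight `aux_weight=0.4` and the helper names are hypothetical choices for illustration; the weighting is not stated here.

```python
import numpy as np

# Transformer layers that get an auxiliary head in each variant (from the list above)
AUX_LAYERS = {
    "SETR-Naive": [10, 15, 20],
    "SETR-PUP": [10, 15, 20, 24],
    "SETR-MLA": [6, 12, 18, 24],
}


def pixel_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean pixel-wise cross-entropy. logits: (h, w, K); labels: (h, w) integer class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-picked.mean())


def total_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Main loss plus a weighted sum of auxiliary losses (aux_weight is a hypothetical value)."""
    loss = pixel_cross_entropy(main_logits, labels)
    for logits in aux_logits:  # one entry per layer in AUX_LAYERS[variant]
        loss += aux_weight * pixel_cross_entropy(logits, labels)
    return loss


rng = np.random.default_rng(0)
K = 19
labels = rng.integers(0, K, size=(32, 32))
main = rng.standard_normal((32, 32, K))
aux = [rng.standard_normal((32, 32, K)) for _ in AUX_LAYERS["SETR-PUP"]]
print(total_loss(main, aux, labels))
```

The auxiliary heads only add training signal to intermediate layers; at inference time only the main decoder output is used.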
Three 、 Results
Results on ADE20K and Pascal Context:


Comparison of results on Cityscapes:
Visualization of different layers:

Four 、 Code
Code: https://github.com/fudan-zvg/SETR
Framework: mmsegmentation
```shell
# 1. Activate the environment
source activate mmsegmentation
# 2. Compile and install the library in development mode
python setup.py develop
```

Config files:
configs/SETR/

Modify the data path in:
configs/_base_/datasets/cityscapes.py
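For reference, the data-path part of an mmsegmentation (0.x)-style Cityscapes dataset config typically looks like the fragment below. This is a sketch, not the SETR repository's exact file: the pipelines and other keys are omitted, and `data/cityscapes/` is only a placeholder for your local dataset location.

```python
# Data-path part of an mmsegmentation-style dataset config (pipelines omitted).
dataset_type = 'CityscapesDataset'
data_root = 'data/cityscapes/'  # <- change this to where your Cityscapes copy lives

data = dict(
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='leftImg8bit/train',
        ann_dir='gtFine/train'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='leftImg8bit/val',
        ann_dir='gtFine/val'),
)
```

Usually only `data_root` needs to change; `img_dir` and `ann_dir` are relative to it.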
