[Semantic segmentation] SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
2022-07-29 06:04:00 【Dull cat】

One 、 The main idea
Most current semantic segmentation methods are FCN-based encoder-decoder architectures, but these are weak at capturing long-range information:
- To enlarge the receptive field, methods such as PSP, ASPP, and attention modules have been proposed.
- These methods mainly operate on the downsampled feature map of the original image. In other words, they enlarge the receptive field using high-level information and make little use of low-level information.
The method of this paper:
- A semantic segmentation method that uses a pure transformer as the encoder.
- Input: the original image is divided into fixed-size patches to form a sequence of image patches. A linear embedding layer then turns them into a sequence of feature embedding vectors, which is the input of the transformer.

Two 、 Implementation method
1、 Turn the image into a sequence of patches:
Split $x \in R^{H \times W \times 3}$ into a uniform grid of $\frac{H}{16} \times \frac{W}{16}$ patches (each patch covers 16 × 16 pixels), then flatten the patches into a sequence.
2、 Linear projection: $f : p \to e \in R^C$
A linear projection $f$ maps each flattened patch $p$ to a $C$-dimensional embedding space, so the 2D image becomes a 1D sequence of embeddings.
3、 Position encoding:
To encode the spatial information of each patch, the authors learn a dedicated position embedding $p_i$ for every position $i$ and add it to $e_i$. The final input is $E = \{e_1 + p_1, e_2 + p_2, \dots, e_L + p_L\}$, where $L = \frac{HW}{256}$ is the sequence length.
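Steps 1 to 3 can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: `patchify` and `embed` are hypothetical helper names, the embedding size of 64 is arbitrary, and random matrices stand in for the learned projection $f$ and the learned position embeddings $p_i$.

```python
import numpy as np

def patchify(image, patch=16):
    """Cut an (H, W, C) image into flattened patch x patch tiles, row-major order."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of the patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    tiles = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, patch * patch * c)

def embed(patches, embed_dim=64, seed=0):
    """Project patches to embed_dim and add one position vector per patch."""
    rng = np.random.default_rng(seed)
    # Random stand-ins (assumption) for the learned projection f and position embeddings p_i.
    weight = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
    pos = rng.normal(scale=0.02, size=(patches.shape[0], embed_dim))
    return patches @ weight + pos

image = np.zeros((64, 96, 3))
E = embed(patchify(image))
print(E.shape)  # (24, 64): 64*96/256 = 24 patches, C = 64
```

A 64 × 96 image gives a 4 × 6 grid, so the sequence has 24 tokens, which matches $L = \frac{HW}{256}$.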
4、Transformer:
The transformer takes the sequence $E$ above as input. Because self-attention relates every patch to every other patch, every layer has a global receptive field, which addresses the limited receptive field of FCN-style methods.
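To make the "global receptive field" point concrete, here is a minimal single-head self-attention sketch in NumPy. It is a simplification under stated assumptions: the real encoder uses multi-head attention with layer normalization and MLP blocks, and the weights here are random rather than learned. What it shows is that each output token is a weighted mix of all $L$ input tokens.

```python
import numpy as np

def self_attention(E, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over an (L, C) sequence."""
    q, k, v = E @ Wq, E @ Wk, E @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])        # (L, L): every token vs. every token
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over all L positions
    return attn @ v, attn

rng = np.random.default_rng(0)
L, C = 24, 8                                      # illustrative sizes, not from the paper
E = rng.normal(size=(L, C))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
out, attn = self_attention(E, Wq, Wk, Wv)
print(out.shape, attn.shape)
```

Every row of `attn` has a nonzero weight for every position, so a patch in one corner of the image can already influence a patch in the opposite corner after a single layer.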

5、Decoder
The role of the decoder: generate a 2D segmentation result with the same size as the original image.
To do this, the encoder feature $Z$ first has to be reshaped from $\frac{HW}{256} \times C$ to $\frac{H}{16} \times \frac{W}{16} \times C$.
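The reshape is just the inverse of the patch flattening. A small NumPy sketch, assuming tokens are stored in row-major patch order (`tokens_to_feature_map` is a hypothetical helper name):

```python
import numpy as np

def tokens_to_feature_map(Z, height, width, patch=16):
    """Reshape (HW/256, C) encoder tokens into an (H/16, W/16, C) feature map."""
    gh, gw = height // patch, width // patch
    assert Z.shape[0] == gh * gw, "token count must equal the number of patches"
    return Z.reshape(gh, gw, Z.shape[1])

Z = np.arange(24 * 4).reshape(24, 4)      # 24 tokens with C = 4, for a 64 x 96 image
fmap = tokens_to_feature_map(Z, 64, 96)
print(fmap.shape)                          # (4, 6, 4)
```

Token `i` ends up at grid position `(i // 6, i % 6)` in this example, so the spatial layout of the patches is recovered.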
Method 1: Naive upsampling (Naive)
① Project the transformer feature $Z^{L_e}$ to the number of segmentation classes (e.g. 19 for Cityscapes) with:
1x1 conv + sync batch norm (with ReLU) + 1x1 conv
② Upsample to the full image resolution with bilinear interpolation, then compute the loss.
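The Naive head can be sketched in NumPy as follows. Several things here are assumptions made for illustration: a 1x1 convolution is written as a per-pixel matrix product over channels, sync batch norm is replaced by a plain per-channel normalization over one feature map, the weights are random, and `bilinear_resize` is a simple hand-written interpolation rather than any framework's exact implementation.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of an (h, w, c) array using pixel-centre sampling."""
    h, w, _ = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def naive_head(Z, w1, w2, out_h, out_w, eps=1e-5):
    """1x1 conv -> normalisation + ReLU -> 1x1 conv -> bilinear upsampling."""
    x = Z @ w1                                            # 1x1 conv as per-pixel matmul
    x = (x - x.mean(axis=(0, 1))) / np.sqrt(x.var(axis=(0, 1)) + eps)  # batch-norm stand-in
    x = np.maximum(x, 0)                                  # ReLU
    logits = x @ w2                                       # 1x1 conv to the class dimension
    return bilinear_resize(logits, out_h, out_w)

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 6, 32))                           # (H/16, W/16, C) for a 64 x 96 image
w1, w2 = rng.normal(size=(32, 16)), rng.normal(size=(16, 19))
logits = naive_head(Z, w1, w2, 64, 96)
print(logits.shape, logits.argmax(axis=-1).shape)         # (64, 96, 19) (64, 96)
```

The `argmax` over the last axis gives one class index per pixel, which is what the loss is computed against during training.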
Method 2: Progressive UPsampling (PUP)
Progressive upsampling alternates convolution layers with upsampling operations. To limit the noise that a single large upsampling step would introduce, each step upsamples by only 2×. Bringing $Z^{L_e}$ from $\frac{H}{16} \times \frac{W}{16}$ back to the original image size therefore takes 4 steps.
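The count of 4 follows from $2^4 = 16$. A tiny sketch of the resolution schedule (the convolution layers between the upsampling steps are omitted, and `pup_schedule` is a hypothetical name):

```python
def pup_schedule(height, width, patch=16):
    """List the feature-map sizes visited when doubling from (H/16, W/16) up to (H, W)."""
    sizes = [(height // patch, width // patch)]
    while sizes[-1][0] < height:
        h, w = sizes[-1]
        sizes.append((h * 2, w * 2))   # one 2x upsampling step
    return sizes

schedule = pup_schedule(512, 512)
print(schedule)           # [(32, 32), (64, 64), (128, 128), (256, 256), (512, 512)]
print(len(schedule) - 1)  # 4 upsampling steps
```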
Method 3: Multi-Level feature Aggregation (MLA)
Multi-level feature aggregation feeds the decoder with features taken from layers spread evenly across the encoder: $\{Z^m\}$ with $m \in \{\frac{L_e}{M}, 2\frac{L_e}{M}, \dots, M\frac{L_e}{M}\}$, i.e. with a step of $\frac{L_e}{M}$ layers.
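As a quick check of the index formula, here is a small sketch. The values $L_e = 24$ and $M = 4$ are assumptions chosen to match the layer indices listed for SETR-MLA in the auxiliary loss section of this post; `mla_layer_indices` is a hypothetical name.

```python
def mla_layer_indices(num_layers, num_streams):
    """Return the encoder layers L_e/M, 2*L_e/M, ..., L_e used by the MLA decoder."""
    assert num_layers % num_streams == 0, "L_e must be divisible by M"
    step = num_layers // num_streams
    return [step * i for i in range(1, num_streams + 1)]

print(mla_layer_indices(24, 4))  # [6, 12, 18, 24]
```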
$M$ streams are then deployed, each focusing on one of the selected layers. Within each stream:
- First, the encoder feature $Z^l$ is reshaped from 2D ($\frac{HW}{256} \times C$) to 3D ($\frac{H}{16} \times \frac{W}{16} \times C$).
- A 3-layer network (kernel sizes 1x1, 3x3, 3x3) is then applied:
- The first and third layers each halve the number of channels, and after the third layer the feature map is upsampled 4× with bilinear interpolation.
- To improve the information exchange between streams, the authors introduce a top-down aggregation by element-wise addition after the first layer, and apply an extra 3x3 convolution to the summed feature.
- After the third layer, the features of all streams are fused by channel-wise concatenation, and the result is upsampled 4× with bilinear interpolation to the original image size.
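The top-down aggregation and the final concatenation can be sketched as below. This is a simplified reading, not the authors' implementation: all convolutions and both upsampling steps are left out, the streams are assumed to be ordered from the deepest layer to the shallowest, and `mla_fuse` is a hypothetical name.

```python
import numpy as np

def mla_fuse(features):
    """Top-down element-wise aggregation followed by channel-wise concatenation.

    features: list of (h, w, c) maps, ordered from the deepest encoder layer
    to the shallowest (assumed ordering).
    """
    fused, running = [], np.zeros_like(features[0])
    for f in features:
        running = running + f          # each stream also receives the sum of deeper streams
        fused.append(running)
    return np.concatenate(fused, axis=-1)

streams = [np.full((2, 3, 4), v) for v in (1.0, 2.0, 3.0, 4.0)]
out = mla_fuse(streams)
print(out.shape)                       # (2, 3, 16): M = 4 streams x 4 channels
```

With constant inputs 1, 2, 3, 4 the four channel groups of `out` hold 1, 3, 6, 10, which makes the cumulative nature of the aggregation easy to see.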

6、Auxiliary loss:
Each auxiliary loss head is a 2-layer network. The authors attach auxiliary losses to different transformer layers depending on the decoder:
- SETR-Naive ($Z^{10}, Z^{15}, Z^{20}$)
- SETR-PUP ($Z^{10}, Z^{15}, Z^{20}, Z^{24}$)
- SETR-MLA ($Z^{6}, Z^{12}, Z^{18}, Z^{24}$)
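How the auxiliary losses enter training can be sketched as a weighted sum. Note that the weight of 0.4 is an assumption for illustration only (this post does not state the value), and `total_loss` is a hypothetical helper; the layer lists are the ones given above.

```python
# Transformer layers that receive an auxiliary head, per decoder variant (from the list above).
AUX_LAYERS = {
    "SETR-Naive": [10, 15, 20],
    "SETR-PUP": [10, 15, 20, 24],
    "SETR-MLA": [6, 12, 18, 24],
}

def total_loss(main_loss, aux_losses, aux_weight=0.4):
    """Combine the main segmentation loss with down-weighted auxiliary losses.

    aux_weight is an assumed value, not taken from the post.
    """
    return main_loss + aux_weight * sum(aux_losses)

loss = total_loss(1.0, [0.5, 0.5, 0.5])
print(round(loss, 6))  # 1.6
```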
Three 、 Results
Results on ADE20K and Pascal VOC:
[Image: results on ADE20K and Pascal VOC]

Comparison of results on Cityscapes:
[Image: comparison of results on Cityscapes]
Visualization of different layers:
[Image: visualization of different layers]

Four 、 Code
Code: https://github.com/fudan-zvg/SETR
Framework: mmsegmentation
# 1. Activate the environment
source activate mmsegmentation
# 2. Build and install the library in development mode
python setup.py develop

# Config files
configs/SETR/

Modify the data path in:
configs/_base_/datasets/cityscapes.py
