[Semantic Segmentation] SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
2022-07-29 06:04:00 【Dull cat】

One、The main idea
Current semantic segmentation methods are mostly FCN-based encoder-decoder designs, but such methods are weak at capturing long-range information:
- To enlarge the receptive field, methods such as PSP / ASPP / attention have been proposed
- These methods mainly operate on the down-sampled feature map of the original image, i.e. they use high-level information to enlarge the receptive field, and make little use of low-level information.
The method of this paper:
- A semantic segmentation method built entirely on transformers
- Input: split the original image into fixed-size patches to form a sequence of image patches, then use a linear embedding layer to obtain a sequence of feature embedding vectors as the transformer input
Two、Implementation method
1、Turn the image into a sequence of patches:
Cut $x \in R^{H\times W \times 3}$ into a uniform $\frac{H}{16} \times \frac{W}{16}$ grid of patches, then flatten each patch.
2、Linear projection: $f : p \to e \in R^C$
A linear projection $f$ maps each patch into a C-dimensional embedding space, so the 2-D image becomes a 1-D sequence.
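As a minimal sketch of steps 1-2 (toy image size, random weights standing in for the learned projection; the paper's implementation is in PyTorch, numpy is used here only to show the shapes):

```python
import numpy as np

H, W, C_in, P = 32, 32, 3, 16      # toy image; the paper uses 16x16 patches
C = 8                              # embedding dimension (much larger in the paper)

img = np.random.rand(H, W, C_in)

# 1. cut the image into an (H/P) x (W/P) grid of P x P patches, then flatten each
patches = (img.reshape(H // P, P, W // P, P, C_in)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, P * P * C_in))          # (HW/P^2, P^2 * 3)

# 2. linear projection f: p -> e in R^C (random matrix stands in for the learned layer)
W_proj = np.random.rand(P * P * C_in, C)
E = patches @ W_proj                               # (HW/P^2, C)

print(patches.shape, E.shape)                      # (4, 768) (4, 8)
```

With a real 512x512 input and P = 16, this yields the $\frac{HW}{256} = 1024$ tokens used throughout the paper.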
3、Position encoding:
To encode the spatial information of each patch, the author learns a position-specific embedding $p_i$ for the patch at every position $i$ and adds it to $e_i$, giving the final input $E = \{e_1+p_1, e_2+p_2, ..., e_L+p_L\}$.
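The addition of the learned position codes is element-wise per token (a toy-sized sketch; in the real model $p$ is a trainable parameter, here it is just a random array):

```python
import numpy as np

L, C = 4, 8                        # toy: 4 patches, C-dim embeddings
e = np.random.rand(L, C)           # patch embeddings e_i from the linear projection
p = np.random.rand(L, C)           # learned position codes p_i (random stand-in)

# final transformer input: E = {e_1 + p_1, ..., e_L + p_L}
E = e + p
```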
4、Transformer:
The transformer takes the above $E$ as input, which means every layer has a global receptive field, solving the limited-receptive-field problem of FCN-style methods.
5、Decoder
The decoder's role: generate the 2-D segmentation result at the same size as the original image.
Therefore the encoder features $Z$ must first be reshaped from $\frac{HW}{256} \times C$ to $\frac{H}{16} \times \frac{W}{16} \times C$.
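This reshape is a plain sequence-to-grid rearrangement (toy sizes again):

```python
import numpy as np

H = W = 32
C = 8                              # toy sizes
L = (H // 16) * (W // 16)          # HW / 256 tokens
Z = np.random.rand(L, C)           # encoder output sequence

# reshape the 1-D token sequence back into a 2-D feature map for the decoder
Z_2d = Z.reshape(H // 16, W // 16, C)

print(Z_2d.shape)                  # (2, 2, 8)
```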
Method 1: Naive upsampling (Naive)
① Map the transformer output features $Z^{L_e}$ to the number of segmentation classes (e.g. 19 for Cityscapes):
1x1 conv + sync batch norm (with relu) + 1x1 conv
② Upsample to full resolution with bilinear interpolation, then compute the loss.
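The Naive head can be sketched as follows. Assumptions: 1x1 convolutions are written as per-pixel matmuls, batch norm is omitted, and nearest-neighbour repeat stands in for the paper's bilinear interpolation (numpy has no built-in bilinear resize):

```python
import numpy as np

h = w = 2                          # toy H/16 x W/16 grid
C, K = 8, 19                       # K = number of classes (19 for Cityscapes)
Z = np.random.rand(h, w, C)        # reshaped encoder features

# 1x1 conv -> relu -> 1x1 conv, with random weights as stand-ins
W1 = np.random.rand(C, C)
W2 = np.random.rand(C, K)
logits = np.maximum(Z @ W1, 0) @ W2            # (h, w, K) class scores per token

# x16 upsampling back to pixel resolution (nearest repeat in place of bilinear)
up = logits.repeat(16, axis=0).repeat(16, axis=1)
print(up.shape)                                # (32, 32, 19)
```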
Method 2: Progressive UPsampling (PUP)
Progressive upsampling alternates convolutions with upsampling steps. To avoid the noise introduced by a single large upsampling step, each step upsamples by only 2x, so restoring $Z^{L_e}$ from size $\frac{H}{16} \times \frac{W}{16}$ to the original image size takes 4 such steps.
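The 4-step schedule follows from $2^4 = 16$. A shape-level sketch (random weights, 1x1 matmuls for the convolutions, nearest repeat for the 2x upsampling):

```python
import numpy as np

size, C = 2, 8                     # toy H/16 feature map
x = np.random.rand(size, size, C)

# four alternating (conv, x2 upsample) steps take H/16 -> H
for _ in range(4):
    W_conv = np.random.rand(C, C)
    x = x @ W_conv                               # conv stand-in
    x = x.repeat(2, axis=0).repeat(2, axis=1)    # x2 upsample

print(x.shape)                                   # (32, 32, 8)
```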
Method 3: Multi-Level feature Aggregation (MLA)
Multi-level feature aggregation takes features uniformly distributed across layers, $\{Z^m\}\ (m \in \{\frac{L_e}{M}, 2\frac{L_e}{M}, ..., M\frac{L_e}{M}\})$, i.e. with step $\frac{L_e}{M}$, as input to the decoder.
Then M streams are deployed, each focusing on one of the selected layers. Within each stream:
- First reshape the encoder feature $Z^m$ from 2D ($\frac{HW}{256} \times C$) to 3D ($\frac{H}{16} \times \frac{W}{16} \times C$)
- Then apply a 3-layer network (kernel sizes 1x1, 3x3, 3x3):
- The first and third layers each halve the channel count, and after the third layer the feature map is upsampled 4x by bilinear interpolation.
- To improve information exchange between streams, the author introduces a top-down aggregation path with element-wise addition, followed by a 3x3 convolution after each addition.
- After the third layer, the features of all streams are concatenated, then bilinearly upsampled 4x to the original image size.
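The layer selection and channel bookkeeping of MLA can be checked with a few lines of arithmetic (using the paper's SETR-MLA setting $L_e = 24$, $M = 4$; the channel width $C = 1024$ is the ViT-Large width and is an assumption here):

```python
L_e, M = 24, 4
taps = [m * (L_e // M) for m in range(1, M + 1)]   # layers fed to the decoder
print(taps)                                        # [6, 12, 18, 24], step L_e / M

C = 1024
c_mid = C // 2            # first 1x1 conv halves the channels
c_out = c_mid // 2        # third 3x3 conv halves them again
concat_channels = M * c_out  # channel count after concatenating all M streams
print(concat_channels)       # 1024
```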
6、Auxiliary loss:
Each auxiliary loss head is a 2-layer network; the author adds auxiliary losses at the following transformer layers:
- SETR-Naive: $Z^{10}, Z^{15}, Z^{20}$
- SETR-PUP: $Z^{10}, Z^{15}, Z^{20}, Z^{24}$
- SETR-MLA: $Z^{6}, Z^{12}, Z^{18}, Z^{24}$
Three、Results
Results on ADE20K and Pascal VOC:
Comparison of results on Cityscapes:
Visualization of features from different layers:
Four、Code
Code: https://github.com/fudan-zvg/SETR
Framework: mmsegmentation
# 1、activate the environment
source activate mmsegmentation
# 2、install the library in development mode
python setup.py develop
# config files
configs/SETR/
Modify the data path in:
configs/_base_/datasets/cityscapes.py