【ARXIV2203】SepViT: Separable Vision Transformer
2022-07-28 05:00:00 【AI frontier theory group @ouc】

1、Motivation
The authors point out that the main pain point of current vision Transformer models is their huge resource demands. To address this, they propose the Separable Vision Transformer (SepViT), whose overall architecture is shown in the figure below.

The main contributions are:
- Depthwise separable self-attention, which enables local information exchange within windows and global information exchange among windows in a single Transformer block.
- Window token embedding, which helps model the attention relationships among windows at negligible computational cost.
2、Depthwise separable self-attention
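Much like the depthwise separable convolution proposed in MobileNet, it consists of two steps: Depthwise Self-Attention (DWA) and Pointwise Self-Attention (PWA). DWA computes attention within each window separately, analogous to a depthwise convolution processing each channel on its own; PWA then computes attention across windows, analogous to a pointwise convolution mixing channels.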
DWA is illustrated in the figure below. Attention is computed independently inside each window, which is simple. However, relating the windows to one another pixel by pixel would be too expensive, so the authors introduce window token embedding. In the figure, the input feature map is 6x6xC and is split into 2x2 = 4 windows. First, the window tokens are built, with size 4xCx1 (one token per window); the four windows themselves have size 4xCx9. Concatenating the two gives 4xCx10, and attention is then computed separately within each of the four windows. The output is again 4xCx10, containing both the updated window features and the updated window tokens.
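
To make the shapes concrete, here is a minimal pure-Python sketch of the DWA step on the 6x6 example. It is not the authors' code: the identity Q/K/V projections, the single attention head, and the zero-initialised window tokens are all simplifying assumptions, and the helper names (`split_windows`, `depthwise_self_attention`) are hypothetical. Tensors are laid out as windows x tokens x C rather than the 4xCx10 notation above.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    # Single-head self-attention over a list of C-dim tokens.
    # Simplification (assumption): identity Q/K/V projections, no output projection.
    c = len(tokens[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in tokens:
        weights = softmax([scale * dot(q, k) for k in tokens])
        out.append([sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(c)])
    return out


def split_windows(feature, win):
    # feature: H x W grid of C-dim vectors -> list of windows (row-major order),
    # each window being a list of win*win tokens.
    h, w = len(feature), len(feature[0])
    windows = []
    for y0 in range(0, h, win):
        for x0 in range(0, w, win):
            windows.append([feature[y][x]
                            for y in range(y0, y0 + win)
                            for x in range(x0, x0 + win)])
    return windows


def depthwise_self_attention(feature, win, window_tokens):
    # DWA sketch: prepend one window token to each window, then attend only inside it.
    # Returns num_windows x (1 + win*win) x C; index 0 of each window is its window token.
    windows = split_windows(feature, win)
    return [self_attention([tok] + patch) for tok, patch in zip(window_tokens, windows)]


random.seed(0)
H = W = 6
C = 8
WIN = 3
feature = [[[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(W)] for _ in range(H)]
# Hypothetical choice: zero-initialised window tokens, one per window.
window_tokens = [[0.0] * C for _ in range((H // WIN) * (W // WIN))]
dwa_out = depthwise_self_attention(feature, WIN, window_tokens)
print(len(dwa_out), len(dwa_out[0]), len(dwa_out[0][0]))  # 4 10 8
```

The printed shape matches the 4 windows x 10 tokens (1 window token + 9 patch tokens) described above, with C = 8 chosen arbitrarily.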

The PWA computation is also quite interesting: the updated window tokens are taken out and their pairwise similarities are computed, giving a 4x4 weight matrix. This matrix is then used to take a weighted combination of the four windows' features, which yields the final output features.
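
A rough sketch of the PWA step, again in pure Python and under stated assumptions: random vectors stand in for the DWA output, the query/key projections are identities, and any normalisation or activation a real model would apply to the window tokens is omitted. The function name `pointwise_self_attention` is hypothetical. It shows how the window tokens alone produce the 4x4 map, which then mixes the windows' feature maps position by position.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pointwise_self_attention(dwa_out):
    # dwa_out: N windows, each = [window_token] + patch tokens (all C-dim vectors).
    window_tokens = [w[0] for w in dwa_out]   # N x C
    window_feats = [w[1:] for w in dwa_out]   # N x P x C
    n, c = len(window_tokens), len(window_tokens[0])
    p = len(window_feats[0])
    scale = 1.0 / math.sqrt(c)
    # N x N attention map among windows, built only from the window tokens
    # (identity query/key projections: a simplifying assumption).
    attn = [softmax([scale * dot(q, k) for k in window_tokens]) for q in window_tokens]
    # Output window i = attention-weighted sum of all windows' feature maps,
    # taken position by position.
    out = [[[sum(attn[i][j] * window_feats[j][t][d] for j in range(n))
             for d in range(c)]
            for t in range(p)]
           for i in range(n)]
    return attn, out


random.seed(1)
N, TOKENS, C = 4, 10, 8  # 4 windows, 1 window token + 9 patch tokens each
# Random stand-in for the DWA output (assumption, not real model activations).
dwa_out = [[[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(TOKENS)] for _ in range(N)]
attn, pwa_out = pointwise_self_attention(dwa_out)
print(len(attn), len(attn[0]))                            # 4 4
print(len(pwa_out), len(pwa_out[0]), len(pwa_out[0][0]))  # 4 9 8
```

Because the attention map is only N x N (one entry per pair of windows), its cost grows with the number of windows rather than with the number of pixels, which is the point of routing the cross-window interaction through window tokens.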

3、Grouped Self-Attention
Inspired by group convolution, the authors extend depthwise separable self-attention into grouped self-attention. As shown in the figure below, adjacent sub-windows are concatenated into larger windows, which amounts to dividing the windows into groups and performing depthwise self-attention within each group of windows. In this way, grouped self-attention can capture long-range visual dependencies across multiple windows. In terms of computational cost and performance, grouped self-attention adds some extra cost compared with depthwise separable self-attention, but it also performs better.
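
The grouping step and its extra cost can be illustrated with a small sketch. It does not run the attention itself. The 1x2 group shape is purely illustrative (not taken from the paper), window tokens are ignored in the cost count, and `group_windows` / `attention_pairs` are hypothetical helpers. If g windows of P tokens each are merged per group, the number of query-key pairs goes from N * P^2 to (N/g) * (g*P)^2 = g * N * P^2, i.e. g times the depthwise cost.

```python
def group_windows(n_rows, n_cols, g_rows, g_cols):
    # Partition an n_rows x n_cols grid of windows (indexed row-major) into
    # groups of g_rows x g_cols adjacent windows; each group acts as one larger window.
    if n_rows % g_rows or n_cols % g_cols:
        raise ValueError("group size must divide the window grid")
    groups = []
    for r0 in range(0, n_rows, g_rows):
        for c0 in range(0, n_cols, g_cols):
            groups.append([r * n_cols + c
                           for r in range(r0, r0 + g_rows)
                           for c in range(c0, c0 + g_cols)])
    return groups


def attention_pairs(num_units, tokens_per_unit):
    # Query-key pairs when attention runs independently inside each unit
    # (a rough cost proxy; window tokens are ignored).
    return num_units * tokens_per_unit ** 2


# The 6x6 example: a 2x2 grid of 3x3 windows (9 patch tokens each).
groups = group_windows(2, 2, 1, 2)          # pair horizontally adjacent windows
print(groups)                               # [[0, 1], [2, 3]]
print(attention_pairs(4, 9))                # depthwise, 4 windows of 9 tokens: 324
print(attention_pairs(len(groups), 2 * 9))  # grouped, 2 groups of 18 tokens: 648
```

In this toy case, grouping pairs of windows doubles the number of attention scores, which matches the "some extra cost" trade-off described above.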

For the experiments, please refer to the authors' paper; they are not covered further here.