【ARXIV2203】SepViT: Separable Vision Transformer
2022-07-28 05:00:00 【AI frontier theory group @ouc】

1、Motivation
The authors point out that the main pain point of current vision Transformer models is their huge resource demands. To address this, they propose the Separable Vision Transformer (SepViT). The overall structure is shown in the figure below.

The main contributions are:
- Depthwise separable self-attention. It achieves local information communication within the windows and global information exchange among the windows in a single Transformer block.
- Window token embedding. Helps to model the attention relationship among windows with negligible computational cost.
2、Depthwise separable self-attention
The design closely mirrors the depthwise separable convolution proposed in MobileNet. It consists of two steps: Depthwise Self-Attention (DWA) and Pointwise Self-Attention (PWA). The first computes attention depthwise, that is, separately within each window; the second computes attention pointwise, that is, across the windows.
DWA is shown in the figure below. Attention is computed separately within each window, which is simple. Modeling the relationships among windows pixel by pixel, however, would be too expensive, so the authors introduce a window token embedding. In the example in the figure, the input feature map is 6x6xC and is split into 2x2 = 4 windows. First, the window tokens are built, with size 4xCx1 (one token per window). The features of the four windows have size 4xCx9. The two are concatenated into 4xCx10, attention is computed separately within each of the four windows, and the result again has size 4xCx10 (it contains the updated window features and the updated window tokens).
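The shapes in the DWA example above can be traced with a small NumPy sketch. This is only an illustration under simplifying assumptions, not the authors' implementation: a single head, no learned query/key/value projections, and a zero vector as the window token. The function names `window_partition` and `depthwise_self_attention` are hypothetical, and the arrays are laid out as (windows, tokens, C) rather than the 4xCx10 order used in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(feat, win):
    """Split an (H, W, C) map into (num_windows, win*win, C) token groups."""
    H, W, C = feat.shape
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, win, win, C)
    return x.reshape(-1, win * win, C)

def depthwise_self_attention(feat, win, window_token):
    """Attention inside each window, with one window token prepended per window."""
    windows = window_partition(feat, win)               # (4, 9, C)
    n, _, C = windows.shape
    tokens = np.broadcast_to(window_token, (n, 1, C))   # (4, 1, C)
    x = np.concatenate([tokens, windows], axis=1)       # (4, 10, C)
    # Assumption: identity Q/K/V projections, to keep the sketch short.
    attn = softmax(x @ x.transpose(0, 2, 1) * C ** -0.5)  # (4, 10, 10)
    return attn @ x                                     # (4, 10, C)

rng = np.random.default_rng(0)
C = 8
feat = rng.standard_normal((6, 6, C))
out = depthwise_self_attention(feat, win=3, window_token=np.zeros(C))
print(out.shape)  # (4, 10, 8)
```

Each of the four windows only ever attends over its own 10 tokens, so the attention maps are 10x10 instead of one 36x36 map over the whole feature map.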

PWA is also an interesting step. The updated window tokens are taken out and their pairwise similarity is computed, which yields a 4x4 weight matrix. This matrix is then used to form a weighted combination of the features of the four windows, which gives the final output features.
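The PWA step can be sketched in the same way. Again this is a minimal illustration under assumptions rather than the paper's code: the input is a random stand-in for the DWA output with the window token stored at index 0 of each window, and normalization layers and learned projections are left out. The function name `pointwise_self_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pointwise_self_attention(x):
    """x: (num_windows, 1 + tokens_per_window, C), window token at index 0."""
    window_tokens = x[:, 0, :]   # (4, C)
    window_feats = x[:, 1:, :]   # (4, 9, C)
    scale = window_tokens.shape[-1] ** -0.5
    # Similarity between window tokens gives one weight per pair of windows.
    attn = softmax(window_tokens @ window_tokens.T * scale)  # (4, 4)
    # Each output window is a weighted sum of the features of all windows.
    out = np.einsum("ij,jnc->inc", attn, window_feats)       # (4, 9, C)
    return attn, out

rng = np.random.default_rng(0)
dwa_out = rng.standard_normal((4, 10, 8))  # stand-in for the DWA result
attn, out = pointwise_self_attention(dwa_out)
print(attn.shape, out.shape)  # (4, 4) (4, 9, 8)
```

The weight matrix has only one entry per pair of windows, which is why exchanging information among windows this way is cheap compared with attending over every pixel.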

3、Grouped Self-Attention
Borrowing the idea of group convolution, the authors extend depthwise separable self-attention and propose Grouped Self-Attention. As shown in the figure below, neighbouring sub-windows are spliced together to form a larger window, which is similar to dividing the windows into groups and performing depthwise self-attention within each group of windows. In this way, Grouped Self-Attention can capture long-range visual dependencies across multiple windows. In terms of computational cost and performance gain, Grouped Self-Attention costs somewhat more than depthwise separable self-attention, but it also performs better.
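The grouping idea can be illustrated with a short NumPy sketch. The 12x12 input, the 3x3 window size, and the 2x2 group of windows are assumptions chosen for illustration; window tokens and learned projections are omitted, and the names `group_partition` and `grouped_self_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_partition(feat, win, group):
    """Merge group x group neighbouring windows of size win into one larger window."""
    H, W, C = feat.shape
    g = win * group  # side length of the merged window
    x = feat.reshape(H // g, g, W // g, g, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, g * g, C)

def grouped_self_attention(feat, win, group):
    """Self-attention inside each group of neighbouring windows."""
    x = group_partition(feat, win, group)                 # (groups, g*g, C)
    scale = x.shape[-1] ** -0.5
    attn = softmax(x @ x.transpose(0, 2, 1) * scale)      # (groups, g*g, g*g)
    return attn @ x

rng = np.random.default_rng(0)
C = 8
feat = rng.standard_normal((12, 12, C))  # 16 windows of 3x3
out = grouped_self_attention(feat, win=3, group=2)
print(out.shape)  # (4, 36, 8)
```

In this sketch each group covers 36 pixels instead of 9, so every pixel sees a larger neighbourhood, at the price of a 36x36 attention map per group rather than a 9x9 map per window.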

For the experiments, please refer to the authors' paper; they are not covered further here.