【arXiv 2204】Neighborhood Attention Transformer
2022-07-28 05:01:00 【AI frontier theory group @ouc】

Thanks to the Bilibili uploader "The alchemy workshop of saury" for the walkthrough; the analysis here combines several explanations.
Paper: https://arxiv.org/abs/2204.07143
Code: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer
This paper is very simple, and similar ideas have appeared in earlier papers. First look at the figure below. Attention in a standard ViT is global: the red token and the blue token in the first panel each attend to every token in the image. Swin corresponds to the two panels in the middle: in the first step, token interaction is restricted to local windows; in the second step the windows are shifted, but interaction still happens only inside a local window. The last panel is the Neighborhood Attention Transformer (NAT) proposed in this paper, where all attention is computed within a 7x7 neighborhood around each token. This looks like convolution, since both operate within a kernel-sized region. The difference is that NAT computes attention, so the weight applied to each value is determined by the input, whereas a convolution kernel holds fixed weights after training.

The authors also give a diagram of the attention computation. As shown in the figure below, for an input feature map of size CxHxW, the query at a given position is a 1xC vector and the key is a 3x3xC tensor taken from the neighborhood of that position. The two are multiplied element by element (the query is broadcast across the 3x3 positions), which gives a 3x3xC result; summing over the C dimension yields a 3x3 similarity matrix. This matrix is used to weight the values, which are then merged into a single 1x1xC vector: the attention output for that position.
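The computation described above can be sketched for a single query position in plain Python. This is only an illustrative sketch, not the authors' CUDA implementation: the function name `neighborhood_attention_at` is hypothetical, the softmax and the 1/sqrt(C) scaling are the usual attention conventions assumed here, the window is assumed to shift inward at the image border so it always holds kernel x kernel tokens, and the linear projections and any position bias are left out.

```python
import math

def neighborhood_attention_at(q_map, k_map, v_map, i, j, kernel=3):
    """Attention output for the query at (i, j) over its kernel x kernel neighborhood.

    q_map, k_map, v_map are H x W x C nested lists (hypothetical layout).
    Requires H >= kernel and W >= kernel.
    """
    H, W, C = len(q_map), len(q_map[0]), len(q_map[0][0])
    # Assumption: near the border the window is shifted inward instead of padded.
    top = min(max(i - kernel // 2, 0), H - kernel)
    left = min(max(j - kernel // 2, 0), W - kernel)
    neighbors = [(r, c)
                 for r in range(top, top + kernel)
                 for c in range(left, left + kernel)]
    q = q_map[i][j]
    # Elementwise product of the query with each key, summed over C:
    # one similarity score per neighbor (a kernel x kernel matrix, flattened).
    scores = [sum(qc * kc for qc, kc in zip(q, k_map[r][c])) / math.sqrt(C)
              for r, c in neighbors]
    # Assumption: scores are normalized with a softmax, as in standard attention.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the neighboring values gives one C-dimensional output vector.
    out = [0.0] * C
    for w, (r, c) in zip(weights, neighbors):
        for ch in range(C):
            out[ch] += w * v_map[r][c][ch]
    return out, weights

# Toy 5 x 5 feature map with 4 channels, used as query, key and value alike.
H, W, C = 5, 5, 4
feat = [[[float((r * W + c + ch) % 3) for ch in range(C)]
         for c in range(W)] for r in range(H)]
out, weights = neighborhood_attention_at(feat, feat, feat, 0, 0)
```

Here `weights` plays the role of the 3x3 similarity matrix (flattened to 9 entries) and `out` is the 1x1xC result for that position. Repeating this for every (i, j) gives the full output map.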

The authors also analyze the computational complexity. Because attention is computed only within a local neighborhood, the cost is far lower than that of global attention and basically the same as Swin's.
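To make the gap concrete, here is a rough multiply-accumulate count in Python. It is a back-of-the-envelope sketch rather than the paper's table: it assumes three linear projections (Q, K, V) costing C multiply-accumulates per output channel per token, plus one pass for the similarity scores and one for the weighted sum of values. The function names and the 56x56x64 example size are illustrative assumptions.

```python
def attention_macs(h, w, c, keys_per_query):
    """Approximate multiply-accumulates for one attention layer on an h x w x c map."""
    tokens = h * w
    projections = 3 * tokens * c * c             # Q, K and V linear projections
    attention = 2 * tokens * keys_per_query * c  # similarity scores + weighted sum
    return projections + attention

def global_attention_macs(h, w, c):
    # Every token attends to all h * w tokens.
    return attention_macs(h, w, c, h * w)

def local_attention_macs(h, w, c, k):
    # Every token attends to k * k tokens (a Swin window or a NAT neighborhood).
    return attention_macs(h, w, c, k * k)

# Hypothetical example: a 56 x 56 map with 64 channels and a 7 x 7 neighborhood.
global_cost = global_attention_macs(56, 56, 64)
local_cost = local_attention_macs(56, 56, 64, 7)
ratio = global_cost / local_cost
```

Under these assumptions the global term grows with (HW)^2 while the local term grows only linearly in HW, and a k x k window and a k x k neighborhood see the same number of keys per query, which is consistent with the cost being close to Swin's.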

The overall architecture follows current practice: four stages, with the resolution halved from each stage to the next. Here, however, the downsampling is done with a 3x3 convolution with stride 2. The initial overlapping tokenizer uses two 3x3 convolutions, each with stride 2.
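The resulting feature-map sizes can be worked out with the standard convolution output-size formula. The sketch below assumes a padding of 1 for every 3x3 convolution and a 224x224 input; neither value is stated in this note, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1

def stage_resolutions(image_size=224, num_stages=4):
    """Feature-map side length at each stage, given the layout described above."""
    # Overlapping tokenizer: two 3x3 convolutions, each with stride 2.
    size = conv_out(conv_out(image_size))
    sizes = [size]
    # One stride-2 3x3 convolution between consecutive stages.
    for _ in range(num_stages - 1):
        size = conv_out(size)
        sizes.append(size)
    return sizes

resolutions = stage_resolutions(224)  # [56, 28, 14, 7]
```

With these assumptions the tokenizer reduces the input by 4x and each later stage halves it again, so the last stage works on a 7x7 map, the same size as the 7x7 neighborhood.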

The authors design four network variants, all with a 7x7 neighborhood, as follows:

On image classification, NAT performs very well, as shown in the following table:

In the ablation study, the authors compare how different position embedding and attention choices affect performance. However, the authors' model reaches 81.4% there, which differs from the 83.2% in the table above; I don't know why.

Overall, the idea of this paper is very simple, and many earlier papers reflect the same idea. But this paper was done jointly with industry, and the hard part is probably the CUDA implementation: the authors wrote a lot of CUDA code to accelerate the neighborhood operation.