【CVPR2022】Lite Vision Transformer with Enhanced Self-Attention
2022-07-28 05:00:00 【AI frontier theory group @ouc】

Paper: https://readpaper.com/paper/633541619879256064
Code: https://github.com/Chenglin-Yang/LVT
1、 Research motivation
Although ViT models are effective across a range of vision tasks, current lightweight ViT models still handle local regions poorly. The authors argue that the self-attention mechanism is limited in shallower and thinner networks. They therefore propose the Lite Vision Transformer (LVT), a light yet effective vision transformer suitable for mobile devices. It keeps the standard four-stage structure but has the same parameter count as MobileNetV2 and PVTv2-B0. The main contribution is two new attention modules: Convolutional Self-Attention (CSA) and Recursive Atrous Self-Attention (RASA). The CSA and RASA modules are introduced in turn below.

2、Convolutional Self-Attention (CSA)

As illustrated in the paper's figure, the basic process is:
- Compute the similarity (`attn` in the code): the (hw/4, c) matrix is passed through a 1x1 convolution and becomes (hw/4, k^2, k^2).
- Compute V: generate a (hw/4, c, k^2) matrix, then reshape it and change the number of channels with a 1x1 convolution (drawn as BMM in the figure), giving a (hw/4, k^2, c_out) matrix.
- Matrix multiplication: multiply the similarity by V to obtain (hw/4, k^2, c_out).
- Apply a fold operation to get the output.
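The shapes in the steps above can be traced with a small NumPy sketch. This is not the authors' implementation: the names `unfold`, `fold`, `csa_sketch`, `w_attn` and `w_v` are made up for illustration, and the 2x2 average pooling that produces the (hw/4, c) input, the softmax over the similarity, and the window settings (k = 3, stride 2, padding 1) are all assumptions chosen so that the number of windows equals hw/4.

```python
import numpy as np

def unfold(x, k, stride, pad):
    """Extract k x k windows: (H, W, C) -> (n_windows, k*k, C)."""
    H, W, C = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out_h = (H + 2 * pad - k) // stride + 1
    out_w = (W + 2 * pad - k) // stride + 1
    windows = [
        xp[i * stride:i * stride + k, j * stride:j * stride + k].reshape(k * k, C)
        for i in range(out_h) for j in range(out_w)
    ]
    return np.stack(windows)

def fold(windows, H, W, k, stride, pad):
    """Sum overlapping windows back onto the grid: (n_windows, k*k, C) -> (H, W, C)."""
    C = windows.shape[-1]
    canvas = np.zeros((H + 2 * pad, W + 2 * pad, C))
    out_w = (W + 2 * pad - k) // stride + 1
    for idx, win in enumerate(windows):
        i, j = divmod(idx, out_w)
        canvas[i * stride:i * stride + k, j * stride:j * stride + k] += win.reshape(k, k, C)
    return canvas[pad:pad + H, pad:pad + W]

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def csa_sketch(x, w_attn, w_v, k=3, stride=2, pad=1):
    H, W, C = x.shape
    # (hw/4, c): assumed to come from 2x2 average pooling
    pooled = x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3)).reshape(-1, C)
    # Step 1: a 1x1 convolution is a per-position linear map -> (hw/4, k^2, k^2)
    attn = softmax((pooled @ w_attn).reshape(-1, k * k, k * k))  # softmax is an assumption
    # Step 2: unfold to windows, then change channels -> (hw/4, k^2, c_out)
    v = unfold(x, k, stride, pad) @ w_v
    # Step 3: batched matrix multiplication -> (hw/4, k^2, c_out)
    out = attn @ v
    # Step 4: fold back to the spatial grid -> (H, W, c_out)
    return fold(out, H, W, k, stride, pad)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))        # h = w = 8, c = 4
w_attn = rng.standard_normal((4, 81))     # c -> k^2 * k^2
w_v = rng.standard_normal((4, 16))        # c -> c_out
out = csa_sketch(x, w_attn, w_v)
print(out.shape)  # (8, 8, 16)
```

With an 8x8 input there are 16 windows, which matches hw/4 = 64/4 in the list above.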
In terms of code, the CSA implementation is more complicated than VOLO's, but the two do not seem to differ in essence (my understanding may be incomplete). I also find CSA less concise than VOLO. Interested readers can refer to the paper "VOLO: Vision Outlooker for Visual Recognition" and its online code.
3、Recursive Atrous Self-Attention (RASA)

ASA is introduced first. It differs from ordinary attention in how Q is computed: the authors use multi-scale atrous (dilated) convolutions. The convolution weights are shared across scales, which reduces the number of parameters.
The authors also apply the operation recursively: in every block, ASA is iterated twice.
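A minimal NumPy sketch of these two ideas follows. It is only an illustration under stated assumptions, not the paper's code: the function and parameter names are hypothetical, the dilation rates (1, 3, 5), the depthwise 3x3 kernel, summing the scales, and the plain residual recursion are all assumptions, and the paper's exact recurrence may differ. The point is that one kernel is reused for every dilation rate, and one set of weights is reused for both iterations.

```python
import numpy as np

def depthwise_dilated_conv3x3(x, kernel, dilation):
    """Depthwise 3x3 convolution with zero padding; x: (H, W, C), kernel: (3, 3, C)."""
    H, W, C = x.shape
    xp = np.pad(x, ((dilation, dilation), (dilation, dilation), (0, 0)))
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            r, c = di * dilation, dj * dilation
            out += xp[r:r + H, c:c + W, :] * kernel[di, dj]
    return out

def atrous_query(x, w_q, kernel, rates=(1, 3, 5)):
    """Multi-scale Q: the SAME kernel is applied at every dilation rate."""
    q_hat = x @ w_q  # 1x1 convolution = per-pixel linear map
    return sum(depthwise_dilated_conv3x3(q_hat, kernel, r) for r in rates)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def asa(x, params):
    """Single-head attention where only Q uses the atrous convolutions."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    q = atrous_query(x, params["w_q"], params["kernel"]).reshape(H * W, C)
    k = tokens @ params["w_k"]
    v = tokens @ params["w_v"]
    attn = softmax(q @ k.T / np.sqrt(C))
    return (attn @ v).reshape(H, W, C)

def rasa(x, params, steps=2):
    """Apply ASA `steps` times with shared parameters (residual form is an assumption)."""
    for _ in range(steps):
        x = x + asa(x, params)
    return x

rng = np.random.default_rng(0)
C = 8
params = {
    "w_q": rng.standard_normal((C, C)) * 0.1,
    "w_k": rng.standard_normal((C, C)) * 0.1,
    "w_v": rng.standard_normal((C, C)) * 0.1,
    "kernel": rng.standard_normal((3, 3, C)) * 0.1,
}
x = rng.standard_normal((6, 6, C))
y = rasa(x, params, steps=2)
print(y.shape)  # (6, 6, 8)
```

Because the kernel is shared across dilation rates and `params` is shared across iterations, adding scales or recursion steps in this sketch does not add parameters.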
4、 Experimental analysis
The network uses a four-stage structure. The first stage uses CSA; the other stages use RASA.

Experiments on ImageNet show that, when the number of parameters is comparable to MobileNetV2 and PVTv2-B0, this method achieves significantly higher accuracy. When scaled up to a parameter count close to that of ResNet50, it also significantly outperforms current methods.

For the remaining details, please refer to the authors' paper; they are not covered here.