【CVPR2022】Lite Vision Transformer with Enhanced Self-Attention
2022-07-28 05:00:00 【AI frontier theory group @ouc】

Paper: https://readpaper.com/paper/633541619879256064
Code: https://github.com/Chenglin-Yang/LVT
1、Research motivation
Although ViT models are effective in a wide range of vision tasks, current lightweight ViT models do not perform well in local regions. The authors argue that the self-attention mechanism is limited in shallower and thinner networks. They therefore propose a light yet effective vision transformer that can be deployed on mobile devices, the Lite Vision Transformer (LVT). LVT uses the standard four-stage structure while keeping a parameter count similar to MobileNetV2 and PVTv2-B0. The main contributions are two new attention modules, Convolutional Self-Attention (CSA) and Recursive Atrous Self-Attention (RASA), which are introduced below.

2、Convolutional Self-Attention (CSA)

The process is shown in the figure above. The basic steps are:
- Compute the similarity (called attn in the code): a 1x1 convolution maps the (hw/4, c) matrix to a (hw/4, k^2, k^2) tensor.
- Compute V: gather a (hw/4, c, k^2) tensor (the k x k neighbourhood of each of the hw/4 positions), reshape it, and change the number of channels with a 1x1 convolution (drawn as BMM in the figure), giving a (hw/4, k^2, c_out) tensor.
- Multiply the similarity by V to obtain a (hw/4, k^2, c_out) tensor.
- Apply a fold operation to produce the output.
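The four steps above can be sketched in NumPy as follows. This is a minimal single-image, single-head sketch, not the LVT implementation, and it rests on assumptions not taken from the paper's code: k = 3 with stride 2 (so an h x w input yields hw/4 windows), the (hw/4, c) matrix used for the similarity comes from 2x2 average pooling as in VOLO's outlook attention, and one shared c -> c_out projection (a 1x1 convolution) is applied at every window position. The names softmax, unfold, fold, csa, w_attn and w_v are hypothetical; unfold and fold are hand-written stand-ins for the framework operations of the same name.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def unfold(x, k, stride, pad):
    """(c, h, w) -> (L, c, k*k): one row per k x k window, L = number of windows."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    oh = (h + 2 * pad - k) // stride + 1
    ow = (w + 2 * pad - k) // stride + 1
    cols = np.empty((oh * ow, c, k * k), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[i * ow + j] = patch.reshape(c, k * k)
    return cols, oh, ow


def fold(cols, h, w, k, stride, pad):
    """(L, c, k*k) -> (c, h, w): scatter windows back, summing where they overlap."""
    c = cols.shape[1]
    oh = (h + 2 * pad - k) // stride + 1
    ow = (w + 2 * pad - k) // stride + 1
    out = np.zeros((c, h + 2 * pad, w + 2 * pad), dtype=cols.dtype)
    for i in range(oh):
        for j in range(ow):
            out[:, i * stride:i * stride + k, j * stride:j * stride + k] += (
                cols[i * ow + j].reshape(c, k, k))
    return out[:, pad:pad + h, pad:pad + w]  # drop the padding border


def csa(x, w_attn, w_v, k=3, stride=2):
    """Hypothetical CSA sketch. x: (c, h, w), w_attn: (c, k**4), w_v: (c, c_out) -> (c_out, h, w)."""
    c, h, w = x.shape
    pad = k // 2
    # Step 1: similarity. Average-pool to a (hw/4, c) matrix (assumed, VOLO-style);
    # a 1x1 convolution is a matmul over channels, giving (hw/4, k^2, k^2).
    pooled = x.reshape(c, h // stride, stride, w // stride, stride).mean(axis=(2, 4))
    n_win = pooled.shape[1] * pooled.shape[2]
    attn = (pooled.reshape(c, n_win).T @ w_attn).reshape(n_win, k * k, k * k)
    attn = softmax(attn, axis=-1)
    # Step 2: V. Unfold to (hw/4, c, k^2), then project c -> c_out: (hw/4, k^2, c_out).
    cols, oh, ow = unfold(x, k, stride, pad)
    assert oh * ow == n_win, "window grid must match the pooled grid"
    v = cols.transpose(0, 2, 1) @ w_v
    # Step 3: batched matmul of similarity and V: (hw/4, k^2, c_out).
    out = attn @ v
    # Step 4: fold the windows back onto the h x w grid.
    return fold(out.transpose(0, 2, 1), h, w, k, stride, pad)
```

Because neighbouring 3x3 windows overlap at stride 2, fold sums the overlapping contributions, which is the same bookkeeping VOLO's outlook attention uses. For an (8, 8) input with 4 channels, unfold produces 16 windows of shape (4, 9), matching the hw/4 = 16 rows of the similarity tensor.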
Code-wise, CSA is more complicated than VOLO's outlook attention and, to my reading, less concise, yet it does not seem to differ from it in essence (my understanding may be incomplete). Interested readers can refer to the paper "VOLO: Vision Outlooker for Visual Recognition" and its public code.
3、Recursive Atrous Self-Attention (RASA)

We first introduce ASA (Atrous Self-Attention). It differs from ordinary attention in how Q is computed: the authors use multi-scale dilated (atrous) convolutions, and the convolution weights are shared across scales to reduce the number of parameters.
In addition, the authors apply a recursive operation: within each block, ASA is iterated twice.
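A minimal NumPy sketch of ASA and its recursion follows, for a single head on an h x w token grid. It is an illustration under stated assumptions, not the paper's implementation: the query projection is followed by one shared 3x3 depthwise kernel applied at dilation rates 1, 3 and 5 (assumed rates) and summed with no activation, K and V are plain linear projections, and the recursion simply feeds each ASA output back in as the next input with the same weights. The paper's exact way of combining the recursive state with the block input is not reproduced. The parameter dict p and its keys wq, wk, wv, dw, as well as all function names, are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dw_dilated_conv(x, kernel, d):
    """Depthwise 3x3 convolution with dilation d and zero 'same' padding.
    x: (c, h, w), kernel: (c, 3, 3) -> (c, h, w)."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(x)
    for a in range(3):
        for b in range(3):
            # tap (a, b) reads the input shifted by ((a - 1) * d, (b - 1) * d)
            out += kernel[:, a, b][:, None, None] * xp[:, a * d:a * d + h, b * d:b * d + w]
    return out


def asa(x, h, w, p, rates=(1, 3, 5)):
    """Single-head atrous self-attention sketch. x: (h*w, c) tokens in row-major order."""
    n, c = x.shape
    q = (x @ p["wq"]).T.reshape(c, h, w)  # view the projected queries as a c x h x w map
    # Multi-scale query: the SAME kernel p["dw"] at every dilation rate, summed.
    q = sum(dw_dilated_conv(q, p["dw"], d) for d in rates)
    q = q.reshape(c, n).T
    k = x @ p["wk"]
    v = x @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (n, n) similarity map
    return attn @ v


def rasa(x, h, w, p, steps=2):
    """Recursion: run ASA `steps` times, reusing the same weights (no new parameters)."""
    out = x
    for _ in range(steps):
        out = asa(out, h, w, p)
    return out
```

Sharing one depthwise kernel across the three rates costs c x 9 weights instead of 3 x c x 9, and the second recursive pass adds computation but no parameters, which is the trade-off the text describes.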
4、Experimental analysis
The network uses a four-stage structure: the first stage uses CSA, and the remaining stages use RASA.

Experimental results on ImageNet show that, with a parameter count comparable to MobileNetV2 and PVTv2-B0, this method achieves noticeably higher accuracy. Likewise, when scaled up to a parameter count close to ResNet50, it outperforms current methods by a clear margin.

For the remaining details, please refer to the paper; they are not discussed further here.