【CVPR2022】Lite Vision Transformer with Enhanced Self-Attention
2022-07-28 05:00:00 【AI frontier theory group @ouc】

Paper: https://readpaper.com/paper/633541619879256064
Code: https://github.com/Chenglin-Yang/LVT
1、Research motivation
Although ViT models are effective across a range of vision tasks, current lightweight ViT models still do not handle local regions well. The authors argue that the self-attention mechanism is limited in shallower and thinner networks. They therefore propose Lite Vision Transformer (LVT), a light yet effective vision transformer that can run on mobile devices. It keeps the standard four-stage structure but has the same number of parameters as MobileNetV2 and PVTv2-B0. The authors mainly contribute two new attention modules: Convolutional Self-Attention (CSA) and Recursive Atrous Self-Attention (RASA). The CSA and RASA modules are introduced below.

2、Convolutional Self-Attention (CSA)

The figure (omitted here) shows the process; the basic steps are:
- Compute the similarity (`attn` in the code): a 1x1 convolution turns the (hw/4, c) matrix into (hw/4, k^2, k^2).
- Compute V: generate a (hw/4, c, k^2) matrix, reshape it, and change the number of channels with a 1x1 convolution (shown as BMM in the figure) to obtain a (hw/4, k^2, c_out) matrix.
- Matrix multiplication: multiply the similarity by V to obtain (hw/4, k^2, c_out).
- Apply a fold operation to obtain the output.
In terms of code, the CSA implementation is more complicated than VOLO's, but it does not seem to differ in essence (though my understanding may be incomplete). I also find CSA less concise than VOLO. Interested readers can refer to the paper "VOLO: Vision Outlooker for Visual Recognition" and its online code.
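
The four CSA steps listed above can be sketched in NumPy to make the shapes concrete. This is only an illustration under stated assumptions, not the authors' implementation: it works on an already-downsampled H x W grid with stride 1 (so N = H*W plays the role of hw/4), it normalizes the similarity with a softmax as VOLO-style outlook attention does, and the names `unfold`, `fold`, `csa`, `w_attn` and `w_v` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unfold(x, k):
    """Gather zero-padded k x k neighbourhoods: (H, W, C) -> (H*W, k*k, C)."""
    H, W, C = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    patches = np.stack(
        [xp[i:i + H, j:j + W] for i in range(k) for j in range(k)], axis=2
    )  # (H, W, k*k, C)
    return patches.reshape(H * W, k * k, C)

def fold(patches, H, W, k):
    """Scatter neighbourhoods back and sum overlaps: (H*W, k*k, C) -> (H, W, C)."""
    C = patches.shape[-1]
    p = k // 2
    out = np.zeros((H + 2 * p, W + 2 * p, C))
    patches = patches.reshape(H, W, k * k, C)
    idx = 0
    for i in range(k):
        for j in range(k):
            out[i:i + H, j:j + W] += patches[:, :, idx]
            idx += 1
    return out[p:p + H, p:p + W]

def csa(x, w_attn, w_v, k=3):
    """x: (H, W, C); w_attn: (C, k^4); w_v: (C, c_out). Returns (H, W, c_out)."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    # Step 1: a per-token linear map (a 1x1 convolution) predicts k^2 x k^2 weights.
    attn = (tokens @ w_attn).reshape(H * W, k * k, k * k)
    attn = softmax(attn, axis=-1)  # assumption: softmax over the last axis
    # Step 2: gather each token's k x k neighbourhood, then project the channels.
    v = unfold(x, k) @ w_v  # (N, k^2, c_out)
    # Step 3: batched matrix multiplication of similarity and V.
    out = attn @ v  # (N, k^2, c_out)
    # Step 4: fold the neighbourhoods back onto the grid.
    return fold(out, H, W, k)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
y = csa(x, rng.normal(size=(8, 81)), rng.normal(size=(8, 16)), k=3)
print(y.shape)  # (4, 4, 16)
```

Because the similarity is predicted directly from each token rather than from a query-key product, the cost per token depends on k and not on the total number of tokens.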
3、Recursive Atrous Self-Attention (RASA)

ASA is introduced first. It differs from ordinary attention in how Q is computed: the authors use multi-scale atrous (dilated) convolution. The convolution weights are shared across scales, which reduces the number of parameters.
In addition, the authors use a recursive operation: in every block, ASA is iterated twice.
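
The two ideas above (a query built from shared-weight dilated convolutions, and applying the same attention twice) can be illustrated with a small NumPy sketch. It is a simplification under several assumptions and not the authors' code: tokens are treated as a 1-D sequence instead of a 2-D feature map, the dilation rates (1, 3, 5) are an assumed choice, any activation after the convolution is left out, and the recursion is plain repeated application with the same weights. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_conv1d(x, w, rate):
    """Depthwise 1-D convolution with dilation. x: (N, C), w: (k, C) -> (N, C)."""
    N, _ = x.shape
    k = w.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # Tap t reads the input shifted by t * rate, so a larger rate
    # widens the receptive field without adding weights.
    return sum(w[t] * xp[t * rate:t * rate + N] for t in range(k))

def asa(x, w_q, w_k, w_v, w_conv, rates=(1, 3, 5)):
    """Self-attention whose query mixes several dilation rates. x: (N, C)."""
    q = x @ w_q
    # The same kernel w_conv is reused for every rate: multi-scale
    # context for the query at the parameter cost of a single kernel.
    q = sum(dilated_conv1d(q, w_conv, r) for r in rates)
    k, v = x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def rasa(x, params, steps=2):
    """Apply ASA `steps` times with the same weights (assumed form of the recursion)."""
    for _ in range(steps):
        x = asa(x, *params)
    return x

rng = np.random.default_rng(0)
C = 4
params = (
    rng.normal(size=(C, C)),  # w_q
    rng.normal(size=(C, C)),  # w_k
    rng.normal(size=(C, C)),  # w_v
    rng.normal(size=(3, C)),  # w_conv, shared across dilation rates
)
tokens = rng.normal(size=(6, C))
print(rasa(tokens, params).shape)  # (6, 4)
```

Both forms of reuse add depth or receptive field without adding parameters: the kernel is shared across dilation rates, and the whole ASA step is shared across the two iterations.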
4、Experimental analysis
The network uses a four-stage structure. The first stage uses CSA, and the other stages use RASA.

The ImageNet results show that, when the parameter count is comparable to that of MobileNetV2 and PVTv2-B0, the accuracy of this method is significantly higher. When the model is scaled up to a parameter count close to that of ResNet50, its performance also significantly exceeds that of current methods.

For the remaining parts, please refer to the authors' paper; they are not covered here.