【ARXIV2205】EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
2022-07-28 05:01:00 【AI frontier theory group @ouc】

Paper: https://arxiv.org/abs/2205.03436
Code: https://github.com/1hunters/EdgeViT
Vision Transformers (ViTs), built on the self-attention mechanism, have become an architecture as powerful as CNNs for vision tasks, but their computational cost and model size are very large. Some works introduce prior information or cascaded multi-stage structures into ViTs, yet the resulting models are still not efficient enough on mobile devices. This paper proposes a lightweight ViT based on MobileNetV2. By introducing a local-global-local (LGL) bottleneck, it combines the advantages of attention mechanisms and CNNs.
The authors argue that three issues need to be considered when deploying a ViT model on mobile devices:
- 1) Inference should be fast. Common metrics such as FLOPs poorly reflect a model's speed on mobile hardware, because memory access speed, degree of parallelism, and other factors all have to be taken into account.
- 2) The model is allowed to be large. Current mobile phones can have 32GB of storage, so storing the model should not be a limiting factor.
- 3) Implementation friendliness. The cyclic shift in Swin is inconvenient to implement on mobile devices, so a model should be designed with ease of implementation on mobile hardware in mind.
Guided by these three principles, the authors propose EdgeViTs and design an efficient local-global-local (LGL) module that achieves a better trade-off between accuracy and computational efficiency.

The model is shown above. The key component is the LGL module, which has three parts:
- local aggregation: built from pointwise convolutions and a depthwise convolution
- global sparse attention: computes attention over the tokens that remain after downsampling with average pooling
- local propagation: uses a transposed convolution (deconvolution) to restore the reduced feature map to its original size
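To make the effect of the sampling step concrete, here is a small arithmetic sketch in plain Python. It assumes the sampler is a pooling layer with kernel size 1 and stride `rate`, and that local propagation is a transposed convolution whose kernel size and stride both equal `rate`, with no padding. The feature-map size (56x56) and the sampling rate (4) are hypothetical values chosen for illustration, not settings taken from the paper.

```python
def sampled_size(size: int, rate: int) -> int:
    """Output length of a kernel-size-1 pooling window moved with stride `rate`."""
    return (size - 1) // rate + 1


def propagated_size(size: int, rate: int) -> int:
    """Output length of a transposed convolution with kernel = stride = `rate`."""
    return (size - 1) * rate + rate


def attention_pairs(height: int, width: int, rate: int = 1) -> int:
    """Number of query-key pairs when attention runs on the sampled tokens."""
    tokens = sampled_size(height, rate) * sampled_size(width, rate)
    return tokens * tokens


H, W, r = 56, 56, 4  # hypothetical feature-map size and sampling rate
h, w = sampled_size(H, r), sampled_size(W, r)
print(h, w)                                               # 14 14
print(propagated_size(h, r), propagated_size(w, r))       # 56 56
print(attention_pairs(H, W) // attention_pairs(H, W, r))  # 256
```

Under these assumptions, sampling every 4th position shrinks the attention matrix by a factor of 256, and the transposed convolution brings the map back to 56x56. The round trip only restores the exact size when the height and width are divisible by the sampling rate.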
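The code below makes this concrete and is not hard to follow. It is a simplified single-head version that takes a feature map of shape (B, C, H, W):

    import torch
    from torch import nn


    class LocalAgg(nn.Module):
        """Local aggregation: pointwise conv -> depthwise 3x3 conv -> pointwise conv."""

        def __init__(self, dim):
            super().__init__()
            self.conv1 = nn.Conv2d(dim, dim, 1)
            # groups=dim makes this a depthwise convolution
            self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
            self.conv3 = nn.Conv2d(dim, dim, 1)
            self.norm1 = nn.BatchNorm2d(dim)
            self.norm2 = nn.BatchNorm2d(dim)

        def forward(self, x):  # x: (B, C, H, W)
            x = self.conv1(self.norm1(x))
            x = self.conv2(x)
            x = self.conv3(self.norm2(x))
            return x


    class GlobalSparseAttn(nn.Module):
        """Single-head attention over sampled tokens, followed by local propagation."""

        def __init__(self, dim, sample_rate, scale):
            super().__init__()
            self.scale = scale
            self.qkv = nn.Linear(dim, dim * 3)
            # kernel size 1 with stride sample_rate keeps one position per window
            self.sampler = nn.AvgPool2d(1, stride=sample_rate)
            # depthwise transposed conv restores the original resolution
            self.LocalProp = nn.ConvTranspose2d(
                dim, dim, kernel_size=sample_rate, stride=sample_rate, groups=dim)
            self.norm = nn.LayerNorm(dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):  # x: (B, C, H, W); H, W divisible by sample_rate
            x = self.sampler(x)                        # (B, C, h, w)
            B, C, h, w = x.shape
            x = x.flatten(2).transpose(1, 2)           # (B, h*w, C)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            attn = attn.softmax(dim=-1)
            x = attn @ v                               # (B, h*w, C)
            x = x.transpose(1, 2).reshape(B, C, h, w)
            x = self.LocalProp(x)                      # (B, C, H, W)
            x = x.permute(0, 2, 3, 1)                  # channels last for LayerNorm/Linear
            x = self.proj(self.norm(x))
            return x.permute(0, 3, 1, 2)               # back to (B, C, H, W)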
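As a rough illustration of why the local aggregation step is cheap, the following plain-Python sketch counts learnable parameters for the layers listed in `LocalAgg`. It assumes every convolution has a bias term and every BatchNorm layer has a learnable scale and shift (the PyTorch defaults). The helper names and the channel width of 64 are hypothetical and chosen only for this example.

```python
def conv2d_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = True) -> int:
    """Weights plus optional bias of a 2D convolution."""
    return (c_in // groups) * c_out * k * k + (c_out if bias else 0)


def local_agg_params(dim: int) -> int:
    """Parameters of 1x1 conv -> depthwise 3x3 conv -> 1x1 conv plus two BatchNorm layers."""
    pointwise = conv2d_params(dim, dim, 1)
    depthwise = conv2d_params(dim, dim, 3, groups=dim)
    batch_norm = 2 * dim  # learnable scale and shift per channel
    return 2 * pointwise + depthwise + 2 * batch_norm


dim = 64  # hypothetical channel width
print(conv2d_params(dim, dim, 3))              # dense 3x3: 36928
print(conv2d_params(dim, dim, 3, groups=dim))  # depthwise 3x3: 640
print(local_agg_params(dim))                   # 9216
```

With `groups=dim`, the 3x3 convolution needs 640 parameters instead of 36928 for a dense 3x3 convolution at the same width, so the two pointwise convolutions account for most of the module's parameters.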
In fact, the whole network is CNN-based and simply adopts the typical architecture of Swin. The experimental results are shown in the following table. The authors report that, compared with MobileViTs, EdgeViTs improve by 5.4%, 2.8%, and 2.7% at three complexity settings, but judging by FLOPs and other metrics I do not see a clear advantage. This is my personal understanding; if you see it differently, feel free to discuss.
