【ARXIV2205】EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
2022-07-28 05:01:00 【AI frontier theory group @ouc】

Paper: https://arxiv.org/abs/2205.03436
Code: https://github.com/1hunters/EdgeViT
Vision Transformers (ViTs), built on self-attention, have become an architecture as powerful as CNNs on vision tasks, but their compute cost and model size are large. Some work has introduced prior information or cascaded multi-stage structures into ViTs, but the resulting models are still not efficient enough on mobile devices. This paper proposes a lightweight ViT based on MobileNetV2 that uses a local-global-local (LGL) bottleneck to combine the advantages of attention and CNNs.
The authors argue that three issues need to be considered when deploying ViT models on mobile devices:
- 1) Inference should be fast. Common metrics such as FLOPs poorly reflect how fast a model runs on a mobile device, because memory access speed, parallelism, and other factors also have to be taken into account.
- 2) The model can be large. Current mobile phones can have 32GB of memory, so storing the model should not be a limiting factor.
- 3) Implementation friendliness. The cyclic shift in Swin is awkward to implement on mobile devices, so a model should be designed with ease of mobile implementation in mind.
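Point 1 is easiest to act on by measuring latency directly instead of relying on FLOPs. The sketch below is a generic wall-clock timing helper and is not taken from the paper; `measure_latency`, its parameters, and the toy workload are hypothetical names chosen for illustration. Real mobile numbers would have to be collected on the target device itself.

```python
import statistics
import time


def measure_latency(fn, warmup=5, runs=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls are discarded so one-off setup costs do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(samples)


def toy_workload():
    """Stand-in for a model forward pass (assumption: any callable works)."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"median latency: {measure_latency(toy_workload):.3f} ms")
```

In practice `toy_workload` would be replaced by a call that runs one forward pass of the model under test.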
Guided by these three principles, the authors propose EdgeViTs, built around an efficient local-global-local (LGL) module that achieves a better balance between accuracy and computational efficiency.

The model is shown above. The key component is the LGL module, which has three parts:
- local aggregation: built from pointwise convolutions and a depthwise convolution
- global sparse attention: attention computed on the feature map after it has been downsampled by average pooling
- local propagation: a transposed convolution (deconvolution) restores the reduced feature map to its original size
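To see why attending only after downsampling is cheap, it helps to count tokens. The following back-of-the-envelope sketch assumes the feature map is subsampled by the same rate along both axes; the function name and the 56x56, rate-4 example are illustrative choices, not values quoted from the paper.

```python
def attention_pairs(height, width, sample_rate=1):
    """Number of query-key pairs when attention runs on a subsampled grid.

    Assumes height and width are divisible by sample_rate.
    """
    tokens = (height // sample_rate) * (width // sample_rate)
    return tokens * tokens


if __name__ == "__main__":
    dense = attention_pairs(56, 56)      # every position attends to every position
    sparse = attention_pairs(56, 56, 4)  # one token kept per 4x4 window
    print(dense, sparse, dense // sparse)  # 9834496 38416 256
```

With a sampling rate r, the number of tokens shrinks by r squared, so the attention matrix shrinks by r to the fourth power (256 times smaller for r = 4).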
Let's look at the code, which is not hard to follow.
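import torch
from torch import nn

class LocalAgg(nn.Module):
    # Local aggregation: 1x1 conv -> 3x3 depthwise conv -> 1x1 conv
    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise
        self.conv3 = nn.Conv2d(dim, dim, 1)
        self.norm1 = nn.BatchNorm2d(dim)
        self.norm2 = nn.BatchNorm2d(dim)

    def forward(self, x):  # x: [B, C, H, W]
        x = self.conv1(self.norm1(x))
        x = self.conv2(x)
        x = self.conv3(self.norm2(x))
        return x

class GlobalSparseAttn(nn.Module):
    # Single-head attention on a subsampled grid, followed by local propagation
    def __init__(self, dim, sample_rate, scale):
        super().__init__()
        self.scale = scale
        self.qkv = nn.Linear(dim, dim * 3)
        self.sampler = nn.AvgPool2d(1, stride=sample_rate)
        kernel_size = sample_rate
        self.LocalProp = nn.ConvTranspose2d(
            dim, dim, kernel_size, stride=sample_rate, groups=dim
        )
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: [B, C, H, W], H and W divisible by sample_rate
        x = self.sampler(x)                          # [B, C, h, w]
        B, C, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # [B, h*w, C]
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)  # each [B, h*w, C]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        x = attn @ v                                 # [B, h*w, C]
        x = x.transpose(1, 2).reshape(B, C, h, w)
        x = self.LocalProp(x)                        # back to [B, C, H, W]
        x = self.norm(x.permute(0, 2, 3, 1))         # LayerNorm over channels
        x = self.proj(x).permute(0, 3, 1, 2)         # [B, C, H, W]
        return x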
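A quick way to check that the sampler and the transposed convolution in the listing above are consistent is to push sizes through the standard output-size formulas. This sketch uses plain Python rather than PyTorch; the helper names are hypothetical, and it assumes no padding, dilation 1, and no output padding, which are the defaults the listing relies on.

```python
def pool_out(size, kernel, stride):
    """Output length of a pooling layer with no padding (floor mode)."""
    return (size - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride):
    """Output length of a transposed convolution with no padding or output padding."""
    return (size - 1) * stride + kernel


def round_trip(size, sample_rate):
    """Size after AvgPool2d(1, stride=r) followed by ConvTranspose2d(kernel=r, stride=r)."""
    reduced = pool_out(size, kernel=1, stride=sample_rate)
    return conv_transpose_out(reduced, kernel=sample_rate, stride=sample_rate)


if __name__ == "__main__":
    for size, rate in [(56, 4), (28, 2), (30, 4)]:
        print(size, rate, round_trip(size, rate))
```

When the spatial size is divisible by the sampling rate (56 with rate 4, 28 with rate 2), the original size is restored. When it is not (30 with rate 4), the output comes back as 32, so the input would need padding or cropping for a residual connection to line up.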
In fact, the whole network is CNN-based and simply adopts the typical Swin architecture. The experimental results are shown in the table below. The authors report that, compared with MobileViTs, EdgeViTs improve by 5.4%, 2.8%, and 2.7% under three complexity settings, but judging from FLOPs and other metrics, I do not see an obvious advantage. This is my personal understanding; if you see it differently, feel free to discuss.
