ECCV 2022: lightweight model framework ParC-Net outperforms Apple's MobileViT (code and paper download)
2022-07-29 06:59:00 【AI vision netqi】
Test code:
With a 224×224×3 input, inference takes about 10 ms on the GPU (a 1060 graphics card) and about 80 ms on the CPU.
On the GPU, this is 1 ms faster than MobileOne.
python eval_cls.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml --model.classification.pretrained ./pretrained_models/classification/checkpoint_ema_avg.pt
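The timings above are quoted without a measurement procedure. As a rough illustration only, a per-image latency figure could be obtained with a few untimed warm-up calls followed by an average over timed runs. The helper `measure_latency_ms` and the `dummy_forward` workload below are hypothetical stand-ins and are not part of the ParC-Net repository.

```python
import time

def measure_latency_ms(fn, warmup=3, runs=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):          # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Hypothetical stand-in for one forward pass; a real test would call the model.
def dummy_forward():
    return sum(i * i for i in range(10_000))

latency = measure_latency_ms(dummy_forward)
print(f"mean latency: {latency:.3f} ms")
```

For a real GPU measurement, the device would also need to be synchronized before each clock reading, since asynchronous kernel launches otherwise make the numbers look too small.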

Paper: https://arxiv.org/abs/2203.03952
Code: https://github.com/hkzhang91/ParC-Net
Computer Vision Institute column
Author: Edison_G
Position-aware circular convolution that inherits the advantages of both ConvNets and Transformers.
01
Summary
Recently, vision transformers have begun to show impressive results, significantly outperforming large convolution-based models. However, in the field of small models for mobile or resource-constrained devices, ConvNets still have their own advantages in both performance and model complexity. The researchers propose ParC-Net, a backbone model based purely on ConvNets that further strengthens these advantages by fusing the merits of vision transformers into ConvNets.

Comparison of image classification results for ConvNet and ViT models
Specifically, the researchers propose position-aware circular convolution (ParC), a lightweight convolution operation that has a global receptive field while producing position-sensitive features, as local convolutions do. ParCs are combined with squeeze-excitation ops to form a meta-former-like model block, which also has an attention mechanism similar to that of transformers. This block can be used in a plug-and-play manner to replace the relevant blocks in ConvNets or transformers.
Experimental results show that, on common vision tasks and datasets, the proposed ParC-Net performs better than popular lightweight ConvNets and vision transformer based models, while having fewer parameters and faster inference. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5 million parameters. Compared with MobileViT, it saves 11% of the parameters and 13% of the computational cost while improving accuracy by 0.2% and inference speed by 23% (on the ARM-based Rockchip RK3288). Compared with DeiT, it uses only 0.5 times the parameters but gains 2.7% in accuracy. ParC-Net also performs better on MS-COCO object detection and PASCAL VOC segmentation.
02
Background
We believe that ViTs and ConvNets are both indispensable, for the following reasons:
1) From an application point of view, both ViTs and ConvNets have advantages and disadvantages. ViT models usually perform better, but they are typically computationally expensive and hard to train. Compared with ViTs, ConvNets may perform worse, but they still have some unique advantages: for example, they have better hardware support and are easy to train. In addition, as summarized in [Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, and Yunhe Wang. Cmt: Convolutional neural networks meet vision transformers.] and in the researchers' own experiments, ConvNets still dominate the field of small models for mobile or edge devices.
2) From an information-processing point of view, ViTs and ConvNets each have unique characteristics. ViTs are good at extracting global information and use the attention mechanism to extract information from different positions in an input-driven way. ConvNets focus on modeling local relationships and carry strong priors through their inductive bias. This analysis naturally raises a question: can we learn from ViTs to improve ConvNets for mobile or edge computing applications?
ViT paper: https://arxiv.org/abs/2010.11929
ConvNeXt paper: https://arxiv.org/abs/2201.03545
03
New framework
The researchers take three strengths of ViTs and use them to make a pure convolutional structure stronger. They argue that ViTs and ConvNets differ in three main ways: ViTs are better at extracting global features, they adopt the meta-former structure, and their information integration is data-driven. ParC is designed to improve ConvNets on these three points.
Three main differences between ordinary ConvNets and ViTs. a) The residual block commonly used in ConvNets; b) the meta-former structure commonly used in ViTs; c) the ParC block proposed by the researchers.
Specifically, the researchers designed a position-aware circular convolution (ParC). This is a simple and effective lightweight convolution operator that has a global receptive field like ViT-style structures while producing position-sensitive features like local convolutions, which removes the need to rely on self-attention to extract global features.

Position aware circular convolution

Global circular convolution in the horizontal direction
As the figure shows, ParC-H performs convolution along the circle formed by joining the start and the end of the input. For this reason, the researchers named the proposed operation circular convolution. The proposed ParC introduces three modifications:
1) Circular padding is combined with low-rank decomposed convolution kernels that have a large receptive field, in order to extract global features;
2) Position embeddings are introduced, so that the output features remain sensitive to spatial position;
3) The size-adaptive convolution kernels and position embeddings are generated on the fly by dynamic interpolation, which handles changes in input resolution and improves adaptability to inputs of different sizes.
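The three modifications above can be sketched in a few lines of plain Python. This is a simplified, single-channel illustration under stated assumptions, not the authors' implementation: the names `interpolate_1d` and `parc_h`, the linear interpolation scheme, and the cross-correlation indexing are all choices made here for illustration. Only the horizontal variant is shown.

```python
def interpolate_1d(values, new_len):
    """Linearly resample a 1-D list to new_len samples (endpoints aligned)."""
    old_len = len(values)
    if new_len == 1 or old_len == 1:
        return [values[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)   # position on the old grid
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

def parc_h(feature_map, kernel, pos_embed):
    """Position-aware circular convolution along the width of one channel."""
    width = len(feature_map[0])
    k = interpolate_1d(kernel, width)       # kernel resized to span a whole row
    pe = interpolate_1d(pos_embed, width)   # position embedding resized likewise
    out = []
    for row in feature_map:
        x = [v + p for v, p in zip(row, pe)]            # inject position information
        out.append([
            sum(k[j] * x[(i + j) % width] for j in range(width))  # index wraps around
            for i in range(width)
        ])
    return out

row = [[1.0, 2.0, 3.0, 4.0]]
zeros = [0.0, 0.0, 0.0, 0.0]
# A uniform kernel sums the entire row at every position: a global receptive field.
print(parc_h(row, [1.0, 1.0, 1.0, 1.0], zeros))
# A non-zero position embedding makes the output depend on where each value sits.
print(parc_h(row, [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]))
```

The modulo index plays the role of circular padding: the kernel is as wide as the row, and positions past the end wrap back to the start instead of reading zeros.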
The researchers also combine ParC with squeeze-excitation to build a meta-former structure made purely of convolutions. This structure drops self-attention, which is poorly supported by hardware, but keeps the ability of the traditional Transformer block to extract global features. In the channel mixer, the researchers additionally introduce a channel attention mechanism with better hardware support, so that the pure-convolution meta-former structure also has attention-like characteristics.
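To make the channel attention idea concrete, here is a minimal sketch of a standard squeeze-excitation step in plain Python. It follows the generic squeeze, excite, scale recipe; the function names, the weight shapes, and the tiny example values are assumptions for illustration, and the exact layer sizes and placement inside the ParC-Net channel mixer are not given in this article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def squeeze_excite(channels, w_reduce, w_expand):
    """channels: list of C feature maps, each a list of rows of floats."""
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in channels]
    # Excite: a bottleneck MLP (ReLU, then sigmoid) yields per-channel gates.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: each channel is reweighted by its gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(channels, gates)]
    return scaled, gates

# Two channels of size 1x2; hypothetical weights with a hidden width of 1.
channels = [[[1.0, 3.0]], [[2.0, 2.0]]]
scaled, gates = squeeze_excite(channels,
                               w_reduce=[[1.0, -1.0]],
                               w_expand=[[0.0], [0.0]])
print(gates, scaled)
```

The gates are computed from the input itself, which is what makes the reweighting data-driven, yet the whole step uses only pooling, small matrix products, and elementwise operations.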
The resulting ParC block, built on the ParC structure, can be used as a plug-and-play basic unit to replace the relevant blocks in existing ViT or ConvNet models, improving accuracy, reducing computational cost, and effectively overcoming the hardware support problem.
Three main hybrid structures .(a) serial structure; (b) parallel structure; (c) bifurcate structure.
04
Experimental analysis
In the image classification experiment on ImageNet-1k, ParC-Net uses the fewest parameters (about 5 million) yet achieves the highest accuracy, 78.6%.
MobileViT is a lightweight, general-purpose ViT model that Apple presented in 2022 at ICLR 2022, a top international deep learning conference. With both models deployed on the ARM-based Rockchip RK3288 chip, ParC-Net saves 11% of the parameters and 13% of the computational cost relative to the MobileViT baseline, while improving accuracy by 0.2% and inference speed by 23%.
MS-COCO object detection results
PASCAL VOC segmentation results
The researchers also deployed ParC-Net and the baseline MobileViT on DP, their self-developed low-power chip, to test inference speed. The results show that ParC-Net runs inference 3 to 4 times faster than MobileViT.