ECCV 2022 | Lightweight model architecture ParC-Net outperforms Apple's MobileViT: code and paper download
2022-07-29 06:59:00 【AI vision netqi】
Test code:
With a 224×224×3 input, inference takes about 10 ms on a 1060 graphics card (GPU) and about 80 ms on CPU.
On the GPU this is about 1 ms faster than MobileOne.
python eval_cls.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml --model.classification.pretrained ./pretrained_models/classification/checkpoint_ema_avg.pt

Paper: https://arxiv.org/abs/2203.03952
Code: https://github.com/hkzhang91/ParC-Net
Computer Vision Institute column
Author: Edison_G
Position-aware circular convolution that inherits the advantages of both ConvNets and Transformers.
01 Overview
Recently, vision transformers have begun to show impressive results, significantly outperforming models based on large convolutions. However, in the small-model regime for mobile or resource-constrained devices, ConvNets still have their own advantages in both performance and model complexity. The researchers propose ParC-Net, a backbone built purely on ConvNets that further strengthens these advantages by fusing the strengths of vision transformers into ConvNets.

Comparison of image classification results for ConvNet and ViT models
Specifically, the researchers propose position-aware circular convolution (ParC), a lightweight convolution operation that has a global receptive field while still producing position-sensitive features, as local convolution does. Combining ParCs with squeeze-excitation ops forms a MetaFormer-like model block that also has an attention mechanism similar to that of transformers. The block can be used in a plug-and-play manner to replace the corresponding blocks in ConvNets or transformers.
Experimental results show that, on common vision tasks and datasets, the proposed ParC-Net performs better than popular lightweight ConvNets and vision-transformer-based models while having fewer parameters and faster inference. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5 million parameters. Compared with MobileViT, it saves 11% of the parameters and 13% of the computational cost while improving accuracy by 0.2% and inference speed by 23% (on the ARM-based Rockchip RK3288). Compared with DeiT, it uses only 0.5 times the parameters yet gains 2.7% in accuracy. ParC-Net also performs better on MS-COCO object detection and PASCAL VOC segmentation.
02 Background
However, we believe that both ViTs and ConvNets are indispensable, for the following reasons:
1) From an application point of view, ViTs and ConvNets each have advantages and disadvantages. ViT models usually perform better, but they are usually expensive to compute and hard to train. Compared with ViTs, ConvNets may show weaker performance, but they still have some unique advantages. For example, ConvNets have better hardware support and are easy to train. In addition, as summarized in [Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, and Yunhe Wang. CMT: Convolutional neural networks meet vision transformers.] and in the researchers' own experiments, ConvNets still dominate among small models for mobile or edge devices.
2) From the perspective of information processing, ViTs and ConvNets each have unique characteristics. ViTs are good at extracting global information and use the attention mechanism to gather information from different positions in a way that is driven by the input data. ConvNets focus on modeling local relationships and carry a strong prior through their inductive bias. This analysis naturally raises a question: can we learn from ViTs to improve ConvNets for mobile or edge computing applications?
ViT paper: https://arxiv.org/abs/2010.11929
ConvNeXt paper: https://arxiv.org/abs/2201.03545
03 New framework
The researchers borrow three strengths of ViTs to make a purely convolutional structure stronger. They argue that ViTs differ from ConvNets in three main ways: ViTs are better at extracting global features, they use the MetaFormer structure, and their information integration is data-driven. The design idea behind ParC is to improve ConvNets along these three points.
The three main differences between ordinary ConvNets and ViTs. a) The residual block commonly used in ConvNets; b) the MetaFormer structure commonly used in ViTs; c) the ParC block proposed by the researchers.
Specifically, the researchers designed position-aware circular convolution (ParC). It is a simple and effective lightweight convolution operator that has a global receptive field like ViT-style structures while producing position-sensitive features like local convolution, which removes the need to rely on self-attention to extract global features.

Position-aware circular convolution

Global circular convolution in the horizontal direction
ParC-H performs convolution along the circle formed by connecting the beginning and end of the input, which is why the researchers call the proposed operation circular convolution. ParC introduces three modifications:
Combine circular padding with a large-receptive-field, low-rank decomposed convolution kernel to extract global features;
Insert position embeddings so that the output features stay sensitive to spatial position;
Generate size-adapted convolution kernels and position embeddings on the fly through dynamic interpolation, which handles changes in input resolution and improves adaptability to inputs of different sizes.
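As a rough illustration of the three modifications, the pure-Python sketch below applies them to a single row of features. It is a simplified 1D stand-in, not the authors' implementation: the names `resize_linear` and `parc_1d` are hypothetical, the kernel and position embedding are plain lists rather than learned tensors, and linear interpolation is assumed for the resizing step. The real model works on 2D multi-channel feature maps.

```python
def resize_linear(values, new_len):
    """Linearly interpolate a 1D list to new_len samples (endpoints aligned)."""
    if new_len == len(values):
        return list(values)
    if new_len == 1 or len(values) == 1:
        return [values[0]] * new_len
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def parc_1d(row, kernel, pos_emb):
    """Hypothetical 1D sketch of a position-aware circular convolution."""
    n = len(row)
    # Modification 3: resize kernel and position embedding to the input length.
    k = resize_linear(kernel, n)
    pe = resize_linear(pos_emb, n)
    # Modification 2: add the position embedding before convolving.
    x = [v + p for v, p in zip(row, pe)]
    # Modification 1: circular padding, written as wrap-around indexing,
    # lets a length-n kernel cover every input element at every output position.
    return [sum(k[j] * x[(i + j) % n] for j in range(n)) for i in range(n)]


# A constant input becomes position dependent once the embedding is added.
print(parc_1d([1, 1, 1, 1], [1, 0, 0, 0], [0, 1, 2, 3]))  # prints [1, 2, 3, 4]
```

Without the position embedding, a circular convolution of a constant row returns the same value at every position, so the embedding is what restores sensitivity to location in this toy setting.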
The researchers also combine ParC with squeeze-excitation to build a purely convolutional MetaFormer structure. This structure drops the self-attention operation, which is poorly supported by hardware, but keeps the ability of the traditional Transformer block to extract global features. In the channel mixer part, the researchers also introduce a channel attention mechanism that is friendlier to hardware, so that the purely convolutional MetaFormer structure also has attention-like properties.
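For readers unfamiliar with squeeze-excitation, here is a minimal pure-Python sketch of the generic operation: a global average per channel, a small two-layer gate, and a per-channel rescale. It follows the standard formulation rather than the ParC-Net code; the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the omission of biases are assumptions made for illustration, and the article does not say exactly how the block is wired into the channel mixer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squeeze_excite(features, w1, w2):
    """Generic squeeze-excitation over a list of channels.

    features: list of C channels, each a flat list of activations.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix (both hypothetical).
    """
    # Squeeze: summarize each channel by its global average.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excite: linear -> ReLU -> linear -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: reweight every activation of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


# With all-zero weights every gate is sigmoid(0) = 0.5.
print(squeeze_excite([[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0]], [[0.0], [0.0]]))
```

Because the gates are computed from the input itself, the rescaling is data-driven, which is the attention-like property the paragraph above refers to.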
The resulting ParC block, built on the ParC structure, can be used as a plug-and-play basic unit to replace the corresponding blocks in existing ViT or ConvNet models, improving accuracy, reducing computational cost, and effectively overcoming the hardware support problem.
Three main hybrid structures .(a) serial structure; (b) parallel structure; (c) bifurcate structure.
04 Experimental analysis
In the image classification experiments on ImageNet-1k, ParC-Net uses the fewest parameters (about 5 million) yet achieves the highest accuracy, 78.6%.
MobileViT is a lightweight general-purpose ViT model that Apple presented at ICLR 2022, an international deep learning conference. With both models deployed on the ARM-based Rockchip RK3288 chip, ParC-Net saves 11% of the parameters and 13% of the computational cost compared with the baseline MobileViT, while improving accuracy by 0.2% and inference speed by 23%.
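To put the relative figures in absolute terms, the short sketch below back-computes the baseline size implied by "about 5 million parameters" and "11% fewer parameters". The helper name `implied_baseline` is hypothetical, and the result is only as precise as the rounded numbers quoted in this article.

```python
def implied_baseline(value, saving):
    """Return the baseline b for which value == b * (1 - saving)."""
    return value / (1.0 - saving)


parc_params_millions = 5.0   # "about 5 million parameters", as quoted above
param_saving = 0.11          # "saves 11% of the parameters", as quoted above

baseline = implied_baseline(parc_params_millions, param_saving)
print(f"Implied baseline size: about {baseline:.1f} million parameters")
```

The same helper can be applied to the 13% computational saving once an absolute cost figure for ParC-Net is known; the article does not give one.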
MS-COCO object detection results
PASCAL VOC segmentation results
The researchers also deployed ParC-Net and the baseline MobileViT on DP, a self-developed low-power chip, to test inference speed. The results show that ParC-Net runs 3 to 4 times faster than MobileViT.
THE END