AdaViT: a dynamic network that adaptively selects its computational structure
2022-07-06 22:29:00 【Law-Yao】
Paper: https://arxiv.org/abs/2111.15668
GitHub: MengLcool/AdaViT (official implementation of AdaViT)
Methods
Thanks to its structure, ViT has strong abstract semantic and feature representation ability:
- Attention encodes global correlation information;
- Multi-head attention additionally abstracts and fuses features from different representation subspaces;
- Stacking deep Transformer layers abstracts the features further.
However, samples differ in difficulty, so the number of patches, attention heads, or network layers that ViT actually needs can differ from sample to sample. This makes it possible to build sample-driven conditional computation.
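To see why these three quantities matter for cost, the following is a rough sketch that counts multiply-accumulates (MACs) for a plain ViT encoder. It uses the standard approximation for a Transformer block and ignores the patch embedding, layer norms, and classifier head. The function names and the `kept_heads` parameter are hypothetical and chosen only for illustration; this is not the paper's own cost accounting.

```python
def msa_macs(num_tokens: int, dim: int, num_heads: int = 1, kept_heads: int = 1) -> int:
    """Approximate MACs of one multi-head self-attention block.

    Assumption: removing a head entirely shrinks the active width in
    proportion to the number of heads kept.
    """
    active_dim = dim * kept_heads // num_heads
    # Q, K, V projections (dim -> active_dim) plus the output projection (active_dim -> dim).
    projections = 4 * num_tokens * dim * active_dim
    # Attention scores (Q K^T) and the attention-weighted sum of V.
    attention = 2 * num_tokens * num_tokens * active_dim
    return projections + attention


def ffn_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate MACs of one two-layer feed-forward block."""
    return 2 * num_tokens * dim * dim * mlp_ratio


def vit_macs(num_tokens: int, dim: int, depth: int) -> int:
    """Approximate MACs of `depth` identical Transformer blocks."""
    return depth * (msa_macs(num_tokens, dim) + ffn_macs(num_tokens, dim))


if __name__ == "__main__":
    full = vit_macs(num_tokens=197, dim=384, depth=12)
    fewer_tokens = vit_macs(num_tokens=99, dim=384, depth=12)
    fewer_blocks = vit_macs(num_tokens=197, dim=384, depth=8)
    print(f"full model:      {full / 1e9:.2f} GMACs")
    print(f"half the tokens: {fewer_tokens / full:.2%} of full cost")
    print(f"8 of 12 blocks:  {fewer_blocks / full:.2%} of full cost")
```

Under this approximation the cost falls with fewer tokens, fewer active heads, and fewer blocks, which are the three axes that sample-driven conditional computation can act on.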
AdaViT designs a dynamic network structure that adaptively selects the best computational structure according to the difficulty of the input sample. The selection covers patch selection, attention head selection, and block selection. The method is as follows:
- Decision network: every Transformer layer has a decision network (consisting of three linear layers). Its input is the input features of the current Transformer layer, and its output is a set of structural parameters used for patch selection, attention head selection, and block selection, respectively. The structural parameters are then sampled with Gumbel-softmax to generate binary masks.
- Patch selection: except for the class token, all remaining tokens go through adaptive selection, so that only the most informative tokens are kept.
- Head selection: complex scenes or noisy backgrounds usually need richer subspace feature representations and fusion of information across multiple heads in order to express diverse information, whereas simple samples do not need such diversity. Head selection can be implemented in two ways: partial deactivation, which replaces the output of a head whose mask is 0 with an all-ones tensor, and full deactivation, which removes the corresponding attention head entirely.
In practice, full deactivation saves more computation but has a larger impact on recognition accuracy.
- Block selection: the MSA and FFN sub-blocks are each selected conditionally, which compresses the structure along the depth dimension (simple samples do not need deep, repeated information encoding).
- Objective function: the first term is the task loss, for example the CE loss for classification; the second is a term that constrains the structural parameters (the parameter gamma is the target computation budget and is used to constrain the computational cost).
The overall optimization objective combines these two terms.
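The mask sampling and the budget constraint described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the official implementation: each decision is modelled as a (keep, drop) logit pair, only the forward sampling of Gumbel-softmax is shown (no gradient estimator), the constraint is assumed to be a squared deviation between the fraction of active units and the budget, and a single budget is shared by all three masks. All function names are hypothetical.

```python
import math
import random


def gumbel_noise(rng: random.Random) -> float:
    """Draw one sample from the standard Gumbel distribution."""
    u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)  # clamp to avoid log(0)
    return -math.log(-math.log(u))


def keep_probability(keep_logit: float, drop_logit: float, tau: float, rng: random.Random) -> float:
    """Gumbel-softmax over the two options (keep, drop); returns the 'keep' weight."""
    a = (keep_logit + gumbel_noise(rng)) / tau
    b = (drop_logit + gumbel_noise(rng)) / tau
    m = max(a, b)  # subtract the maximum for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def sample_binary_mask(logits, tau: float = 1.0, rng: random.Random = None):
    """Turn a list of (keep_logit, drop_logit) pairs into a 0/1 mask."""
    rng = rng or random.Random()
    return [1 if keep_probability(k, d, tau, rng) > 0.5 else 0 for k, d in logits]


def usage_loss(mask, budget: float) -> float:
    """Assumed penalty: squared gap between the active fraction and the budget."""
    usage = sum(mask) / len(mask)
    return (usage - budget) ** 2


def total_loss(task_loss: float, masks: dict, budget: float) -> float:
    """Task loss plus the usage penalty of every mask (patch, head, block)."""
    return task_loss + sum(usage_loss(m, budget) for m in masks.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for 6 patch tokens (the class token is always kept
    # and therefore excluded), 3 heads, and 2 sub-blocks (MSA, FFN).
    masks = {
        "patch": sample_binary_mask([(0.5, -0.5)] * 6, rng=rng),
        "head": sample_binary_mask([(0.0, 0.0)] * 3, rng=rng),
        "block": sample_binary_mask([(1.0, -1.0)] * 2, rng=rng),
    }
    print(masks)
    print(total_loss(task_loss=0.7, masks=masks, budget=0.5))
```

In a real training setup the hard 0/1 decision would be paired with a differentiable relaxation so that the decision network can be trained end to end; the sketch leaves that part out and only shows how masks and a budget penalty fit together.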
Experimental results
The experiments compare AdaViT with different network structures in terms of computational efficiency, recognition accuracy, and other metrics; for details, refer to the experimental section of the paper.