AdaViT: a dynamic network with adaptive selection of computing structure
2022-07-06 22:29:00 【Law-Yao】
Paper address: https://arxiv.org/abs/2111.15668
GitHub link: MengLcool/AdaViT (official implementation of AdaViT)
Methods
Owing to the characteristics of its structure, ViT has strong abstract semantic expression (feature representation) ability:
- Attention computation encodes global correlation information;
- Multi-head attention further abstracts and fuses features from different representation subspaces;
- Stacking deep Transformer layers abstracts the features further.
However, for samples of different difficulty, the number of patches, the number of attention heads, or the number of network layers that ViT actually needs to compute can differ. This makes it possible to build sample-driven conditional computation.
AdaViT designs a dynamic network structure that adaptively selects the best computing structure according to the difficulty of the input sample. The selection covers patch selection, attention head selection, and block selection. The method is as follows:
- Decision network: every Transformer layer has a decision network (consisting of three linear layers). Its input is the input features of the current Transformer layer, and its output is the structural parameters used for patch selection, attention head selection, and block selection respectively. The structural parameters are then sampled with Gumbel-softmax to generate binary masks.
- Patch selection: all tokens except the class token undergo adaptive selection (keep the most informative tokens).
- Head selection: complex scenes or noisy backgrounds usually need richer subspace feature expression and the fusion of information from multiple heads to capture the diversity of the information, whereas simple samples do not need such diverse expression. Head selection can be implemented in two ways. The first replaces the attention of heads whose mask is 0 with a predefined constant tensor (partial deactivation; the paper uses an identity matrix, written 𝟙). The second removes the corresponding attention head entirely (full deactivation).
The results show that full deactivation saves more computation but has a larger impact on recognition accuracy.
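The mask generation and selection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the official implementation: the function names (`gumbel_softmax_keep_prob`, `binary_mask`, `keep_selected`), the fixed "drop" logit of 0, and the 0.5 threshold are all hypothetical choices made for the example. A real training setup would also need a differentiable (straight-through) estimator, which is omitted here.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = rng.random()
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(u, 1e-10), 1.0 - 1e-10)
    return -math.log(-math.log(u))


def gumbel_softmax_keep_prob(keep_logit, drop_logit, rng, tau=1.0):
    """Two-class Gumbel-softmax: returns the relaxed probability of 'keep'."""
    a = (keep_logit + gumbel_noise(rng)) / tau
    b = (drop_logit + gumbel_noise(rng)) / tau
    m = max(a, b)  # subtract the maximum for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def binary_mask(keep_logits, rng, tau=1.0):
    """Turn decision-network logits into a 0/1 mask.

    Assumption: the 'drop' logit is fixed at 0 and the relaxed sample is
    thresholded at 0.5 to obtain a hard decision.
    """
    return [
        1 if gumbel_softmax_keep_prob(logit, 0.0, rng, tau) > 0.5 else 0
        for logit in keep_logits
    ]


def keep_selected(items, mask):
    """Keep only the items (patch tokens or attention heads) whose mask is 1."""
    return [item for item, m in zip(items, mask) if m == 1]


def select_patches(tokens, patch_mask):
    """Patch selection: tokens[0] is the class token and is always kept."""
    return [tokens[0]] + keep_selected(tokens[1:], patch_mask)


if __name__ == "__main__":
    rng = random.Random(0)
    mask = binary_mask([2.0, -1.5, 0.3, -0.2], rng)
    print("patch mask:", mask)
    print("kept tokens:", select_patches(["cls", "p1", "p2", "p3", "p4"], mask))
    # Full deactivation of heads amounts to dropping the masked heads.
    print("kept heads:", keep_selected(["h1", "h2", "h3"], [1, 0, 1]))
```

Calling `keep_selected` on a list of heads corresponds to full deactivation. Partial deactivation would instead keep every head and substitute a constant attention map for the masked ones.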
- Block selection: the MSA and FFN in each block are conditionally selected (kept or skipped), which compresses the structure along the depth dimension (simple samples do not need deep, repeated information encoding).
- Objective function: the first term is the task-related loss, for example the CE loss for classification tasks. The second is a regularization term that constrains the structural parameters (the Gamma parameter represents the target computation budget and is used to constrain the computation cost).
The overall optimization objective then combines these two terms.
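The two-term objective can be illustrated with a small sketch. The concrete form used here is an assumption for illustration: the budget term is taken as the squared gap between the fraction of kept units and the target budget Gamma, summed over the three mask types and added to the task loss without extra weighting. The names `usage_loss` and `total_loss` are hypothetical.

```python
def usage_loss(masks, budgets):
    """Budget term: squared gap between actual usage and the target Gamma.

    masks:   dict mapping a mask type ("patch", "head", "block") to a flat
             list of 0/1 decisions collected over all layers.
    budgets: dict mapping the same keys to the target usage ratio Gamma.
    Assumption: squared deviation, summed over mask types with equal weight.
    """
    total = 0.0
    for name, mask in masks.items():
        used = sum(mask) / len(mask)  # fraction of units actually computed
        total += (used - budgets[name]) ** 2
    return total


def total_loss(task_loss, masks, budgets):
    """Overall objective: task loss (e.g. cross-entropy) plus the budget term."""
    return task_loss + usage_loss(masks, budgets)


if __name__ == "__main__":
    masks = {
        "patch": [1, 0, 1, 0],  # half of the patches kept
        "head": [1, 1, 1, 0],   # three of four heads kept
        "block": [1, 1],        # every block kept
    }
    budgets = {"patch": 0.5, "head": 0.5, "block": 0.5}
    print(usage_loss(masks, budgets))
    print(total_loss(1.0, masks, budgets))
```

With this form, the term is zero when each mask type keeps exactly the fraction Gamma of its units, and grows as usage moves away from the budget in either direction.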
Experimental results
The experiments compare the computational efficiency and recognition accuracy of different network structures against AdaViT. For details, refer to the experimental section of the paper.
For more discussion of Transformer model compression and optimization acceleration, refer to the following article: