【Transformer】AdaViT: Adaptive Tokens for Efficient Vision Transformer
2022-07-29 06:03:00 【Dull cat】

1. Background
Transformers have achieved excellent performance on many tasks. In computer vision, the input image is usually split into patches, and self-attention among the patch tokens is then computed to solve downstream tasks.
However, the cost of self-attention grows quadratically with the number of tokens, and the number of tokens grows with the input resolution. Running Transformers on edge devices has therefore become a problem.
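The sketch below makes the quadratic growth concrete. It counts patch tokens and the multiplies in the two N×N attention products ($QK^\top$ and attention times $V$). The 16×16 patch size and the embedding width of 768 are assumptions borrowed from a ViT-Base-style setup, not values from this post. The count also ignores the class token, the Q/K/V projections and the MLP, whose costs grow only linearly in N.

```python
def num_tokens(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens (class token not counted)."""
    return (height // patch) * (width // patch)


def attention_cost(n_tokens: int, dim: int) -> int:
    """Multiplies in QK^T (N*N*D) plus attention @ V (N*N*D).

    Projections and the MLP are ignored; they are linear in N.
    """
    return 2 * n_tokens * n_tokens * dim


base = num_tokens(224, 224, 16)   # 14 * 14 = 196 tokens
big = num_tokens(448, 448, 16)    # 28 * 28 = 784 tokens
ratio = attention_cost(big, 768) / attention_cost(base, 768)
print(base, big, ratio)           # 196 784 16.0
```

Doubling the image side quadruples the token count and multiplies the attention cost by 16. This is why reducing the number of tokens is an effective lever.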
The authors observe that input images differ in how hard they are for the network to predict. A car or a person against a clean background is easy to recognize, while many different animals in a cluttered scene are much harder.
Based on this, the authors design a network that dynamically adjusts the number of tokens according to how difficult the input is. This controls the computational cost of the transformer.

2. Method

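The standard vision transformer pipeline is:
$$y = \mathcal{C}(t^L) = \mathcal{C}\big(\mathcal{F}^L(\mathcal{F}^{L-1}(\cdots \mathcal{F}^1(\mathcal{E}(x))))\big)$$
- $\mathcal{E}(\cdot)$: the encoding network, which turns the input image $x$ into $K$ position-encoded tokens $t \in \mathbb{R}^{K \times E}$
- $\mathcal{C}(\cdot)$: post-processing of the class token
- $L$: the number of transformer blocks
- $\mathcal{F}^l(\cdot)$: the $l$-th transformer block, built around self-attention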
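The composition above can be written as a plain loop. In this minimal sketch, every input passes through all $L$ blocks, which is the fixed cost AdaViT tries to avoid. `vit_forward`, `encode`, `double` and `head` are hypothetical stand-ins rather than the authors' code. The class token is assumed to sit at index 0.

```python
from typing import Any, Callable, List, Sequence

Tokens = List[List[float]]  # K tokens, each an E-dimensional vector


def vit_forward(
    x: Any,
    encode: Callable[[Any], Tokens],
    blocks: Sequence[Callable[[Tokens], Tokens]],
    head: Callable[[Tokens], float],
) -> float:
    """Compute y = C(F^L(...F^1(E(x)))); every input runs through all L blocks."""
    t = encode(x)           # t^0 = E(x): position-encoded tokens
    for block in blocks:    # t^l = F^l(t^{l-1}) for l = 1..L
        t = block(t)
    return head(t)          # y = C(t^L): post-process the class token


# Hypothetical toy stand-ins: 1-D tokens, a block that doubles every token,
# and a head that reads the class token (assumed to be token 0).
def encode(x: Sequence[float]) -> Tokens:
    return [[float(v)] for v in x]


def double(t: Tokens) -> Tokens:
    return [[2.0 * v for v in tok] for tok in t]


def head(t: Tokens) -> float:
    return t[0][0]


y = vit_forward([1, 2, 3], encode, [double] * 3, head)  # 1.0 * 2**3 = 8.0
```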
To halt tokens dynamically, the authors introduce an input-dependent halting score for each token:
$$h_k^l = H(t_k^l)$$
- $H(\cdot)$ is the halting module
- $k$ is the token index and $l$ is the layer index

The halting module is defined as:
$$H(t_k^l) = \sigma(\gamma \cdot t_{k,e}^l + \beta)$$
- $t_{k,e}^l$ is the $e$-th dimension of $t_k^l$
- $\sigma$ is the logistic sigmoid function
- $\beta$ and $\gamma$ are the shift and scale parameters applied before the nonlinearity
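A minimal sketch of the halting score for one layer follows. Each token reads a single embedding dimension $e$, then shifts, scales and squashes it. The choice `e = 0` and the toy values `gamma = 1.0` and `beta = 0.0` are assumptions for illustration, not the paper's settings.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def halting_score(token: List[float], gamma: float, beta: float, e: int = 0) -> float:
    """h_k^l = sigmoid(gamma * t_{k,e}^l + beta); only dimension e is read (e=0 assumed)."""
    return sigmoid(gamma * token[e] + beta)


tokens = [[0.0, 3.1], [2.0, -1.0], [-2.0, 0.5]]  # toy t^l: K = 3 tokens, E = 2
scores = [halting_score(t, gamma=1.0, beta=0.0) for t in tokens]
# scores ≈ [0.5, 0.881, 0.119]
```

Because the score reuses an existing embedding dimension, the halting module adds only the two scalars $\gamma$ and $\beta$.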
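To track halting probabilities across layers, each token accumulates its halting scores layer by layer. Token $k$ halts at the first layer $N_k$ where its cumulative score reaches $1-\epsilon$, with $\epsilon$ a small positive constant:
$$N_k = \underset{n \le L}{\arg\min} \; \sum_{l=1}^{n} h_k^l \ge 1 - \epsilon$$
The remainder is $r_k = 1 - \sum_{l=1}^{N_k-1} h_k^l$, and the halting probabilities are:
$$p_k^l = \begin{cases} 0 & l > N_k \\ r_k & l = N_k \\ h_k^l & l < N_k \end{cases}$$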
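The cumulative-halting rule for a single token can be sketched as follows. The token stops at the first layer $N_k$ where its summed halting scores reach $1-\epsilon$. The remainder $r_k$ is whatever probability mass is left before that layer. Forcing the last layer's score to 1 follows the ACT-style convention that every token halts by layer $L$; treat it as an assumption here. `eps = 0.01` is a hypothetical default.

```python
from typing import List, Tuple


def halting_probabilities(h: List[float], eps: float = 0.01) -> Tuple[int, float, List[float]]:
    """Cumulative halting for one token.

    h[l-1] is the halting score h_k^l at layer l (1-indexed in the math).
    Returns (N_k, r_k, [p_k^1, ..., p_k^L]).
    """
    L = len(h)
    # Assumption (ACT-style): force the last layer's score to 1 so that
    # every token halts by layer L at the latest.
    h = list(h[:-1]) + [1.0]
    cumulative = 0.0   # sum of h_k^l over the layers before the current one
    N = L
    for n in range(1, L + 1):
        if cumulative + h[n - 1] >= 1.0 - eps:
            N = n      # first layer whose cumulative score reaches 1 - eps
            break
        cumulative += h[n - 1]
    r = 1.0 - cumulative  # remainder r_k = 1 - sum_{l < N_k} h_k^l
    p = [h[l - 1] if l < N else (r if l == N else 0.0) for l in range(1, L + 1)]
    return N, r, p


N, r, p = halting_probabilities([0.3, 0.3, 0.5, 0.2])
# N = 3, r ≈ 0.4, p ≈ [0.3, 0.3, 0.4, 0.0]; the p values sum to 1
```

By construction, the probabilities before $N_k$ plus the remainder always sum to 1. The halting probabilities therefore form a valid distribution over layers for each token.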
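Ponder loss: each token's ponder cost is $\rho_k = N_k + r_k$, and the loss averages it over all tokens:
$$\mathcal{L}_{\text{ponder}} = \frac{1}{K}\sum_{k=1}^{K} \rho_k = \frac{1}{K}\sum_{k=1}^{K} (N_k + r_k)$$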
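As a small worked example of the ponder loss, the sketch below takes each token's halting layer $N_k$ and remainder $r_k$ as already-computed `(N_k, r_k)` pairs. The three example tokens are made-up values.

```python
from typing import Sequence, Tuple


def ponder_loss(halting: Sequence[Tuple[int, float]]) -> float:
    """L_ponder = (1/K) * sum_k (N_k + r_k) over K tokens given as (N_k, r_k) pairs."""
    return sum(n + r for n, r in halting) / len(halting)


loss = ponder_loss([(3, 0.4), (4, 0.1), (1, 1.0)])  # (3.4 + 4.1 + 2.0) / 3
```

In ACT-style training, $N_k$ is a discrete layer index, so gradients reach the halting module only through $r_k$. Lowering this loss pushes tokens to halt earlier.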

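The classification task loss $\mathcal{L}_{\text{task}}$ is the standard cross-entropy between the class-token prediction and the ground-truth label.
The halting-score distribution over layers averages each layer's halting scores over all $K$ tokens:
$$\mathcal{H} = [\bar{h}^1, \bar{h}^2, \ldots, \bar{h}^L], \quad \bar{h}^l = \frac{1}{K}\sum_{k=1}^{K} h_k^l$$
A KL divergence then measures how far this predicted distribution is from a target prior $\mathcal{H}^{\text{target}}$, a Gaussian over layers centered at a chosen target depth:
$$\mathcal{L}_{\text{distr}} = \mathrm{KL}(\mathcal{H} \,\|\, \mathcal{H}^{\text{target}})$$
The total loss is:
$$\mathcal{L} = \mathcal{L}_{\text{task}} + \alpha_d \mathcal{L}_{\text{distr}} + \alpha_p \mathcal{L}_{\text{ponder}}$$
where $\alpha_d$ and $\alpha_p$ weight the two regularizers.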
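The distribution prior and the combined objective can be sketched in pure Python. The per-layer means are renormalised to sum to 1 before the KL divergence; this is an assumption made so they can be compared with the target as a probability distribution. The Gaussian target's mean layer and standard deviation, the weights `alpha_d` and `alpha_p`, and the task loss (passed in as a number) are all hypothetical values, not the paper's settings.

```python
import math
from typing import List, Sequence


def halting_distribution(h: Sequence[Sequence[float]]) -> List[float]:
    """Average h_k^l over the K tokens per layer (h[k][l] = token k, layer l).

    Assumption: the per-layer means are renormalised to sum to 1.
    """
    K, L = len(h), len(h[0])
    means = [sum(h[k][l] for k in range(K)) / K for l in range(L)]
    total = sum(means)
    return [m / total for m in means]


def gaussian_target(L: int, mean_layer: float, std: float) -> List[float]:
    """Discretised Gaussian over layers 1..L, normalised to sum to 1."""
    w = [math.exp(-0.5 * ((l - mean_layer) / std) ** 2) for l in range(1, L + 1)]
    s = sum(w)
    return [v / s for v in w]


def kl_divergence(p: Sequence[float], q: Sequence[float], tiny: float = 1e-12) -> float:
    """KL(p || q) = sum_l p_l * log(p_l / q_l); terms with p_l = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, tiny)) for pi, qi in zip(p, q) if pi > 0)


def total_loss(l_task: float, l_distr: float, l_ponder: float,
               alpha_d: float, alpha_p: float) -> float:
    """L = L_task + alpha_d * L_distr + alpha_p * L_ponder."""
    return l_task + alpha_d * l_distr + alpha_p * l_ponder


h = [[0.1, 0.2, 0.4, 0.3], [0.2, 0.3, 0.3, 0.2]]  # toy scores: K = 2 tokens, L = 4
dist = halting_distribution(h)
target = gaussian_target(4, mean_layer=3.0, std=1.0)  # hypothetical prior
d = kl_divergence(dist, target)
```

Moving `mean_layer` earlier pulls the averaged halting mass toward shallower layers. This gives a direct knob on the expected depth, and with it the compute budget.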

3. Results

As Figure 3 shows, adaptively selected tokens respond strongly to salient regions with large variation, which are usually related to the object class.
3.1 Token depth distribution
Figure 4 colors each token by its depth. The result is roughly a 2D Gaussian centered on the image, which reflects that most ImageNet samples have the object near the center. Most of the computation is spent on the central region, and few edge tokens are processed deeply.
3.2 Halting score distribution
Figure 5 plots the halting score of every layer for each image, using 5k images randomly sampled from the validation set. In the first few layers the halting score increases with depth, then it slowly decreases in the later layers.

3.3 Hard and easy samples
Figure 6 shows hard and easy examples together with the computation each requires.
Easy examples are classified correctly, and AdaViT also processes them faster than hard ones.

3.4 Class sensitivity
Classes on which the model was initially very confident or very unconfident are barely affected by adaptive inference. Adaptive inference benefits classes with distinct shapes, such as standalone furniture or animals.
