【Transformer】AdaViT: Adaptive Tokens for Efficient Vision Transformer
2022-07-29 06:03:00 【Dull cat】

I. Background
Transformers have achieved strong performance on many tasks. In computer vision, the input image is typically split into multiple patches, and self-attention is computed between the patches to serve downstream tasks.
However, the computational cost of self-attention grows quadratically with the number of patches the input image is split into, which makes Transformers hard to deploy on edge devices.
The authors argue that different input images pose different levels of prediction difficulty for the network. For example, a car or a person against a clean background is easy to recognize, whereas several different animals against a cluttered background are much harder.
Based on this, the authors design a network that dynamically adjusts the number of tokens according to the difficulty of the input, and thereby controls the computational complexity of the transformer.
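To make the quadratic growth concrete, here is a minimal sketch that counts the query-key pairs scored by one self-attention head. The function names and the 224/16 image and patch sizes are illustrative assumptions, not values taken from the post.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs scored by one self-attention head."""
    return num_tokens * num_tokens


# Illustrative setting: 224x224 image, 16x16 patches, plus one class token.
full = num_patches(224, 16) + 1      # 196 patches + 1 class token
half = full // 2                     # hypothetically keep about half of the tokens
print(full, attention_pairs(full))   # 197 38809
print(half, attention_pairs(half))   # 98 9604
```

Halving the token count cuts the number of attention pairs to roughly a quarter, which is why removing tokens is an effective way to reduce cost.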

II. Method

The vision transformer pipeline is built from the following components:
- $\epsilon(\cdot)$: the encoding network, which encodes the input image into positioned tokens
- $C(\cdot)$: post-processing of the class token
- $L$: the number of transformer blocks
- $F(\cdot)$: a transformer block, which transforms its input via self-attention
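The components above suggest the usual composition, written here as a sketch. The input symbol $x$, the output symbol $y$, and the superscript indexing of the blocks are notational assumptions, not taken from the original post:

$$
y = C \circ F^{L} \circ F^{L-1} \circ \cdots \circ F^{1} \circ \epsilon(x)
$$

Read right to left: the image $x$ is encoded into tokens, passed through the $L$ blocks in order, and the class token is post-processed into the prediction.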
To discard tokens dynamically, the authors introduce an input-dependent halting score for each token:
- $H(\cdot)$ is the halting module
- $k$ is the token index and $l$ is the layer index

- $t_{k,e}^l$ is the $e$-th dimension of $t_k^l$
- $\sigma$ is the logistic sigmoid function
- $\beta$ and $\gamma$ are the shift and scaling factors applied before the nonlinearity
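The definitions above can be read as a halting score of the form $\sigma(\gamma \cdot t_{k,e}^l + \beta)$. The sketch below assumes exactly that form; the function names, the choice `e = 0`, and the default values of `gamma` and `beta` are illustrative assumptions rather than settings confirmed by the post.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def halting_score(token: Sequence[float], e: int = 0,
                  gamma: float = 5.0, beta: float = -10.0) -> float:
    """Halting score of one token at one layer: sigmoid(gamma * token[e] + beta).

    e, gamma and beta are assumed defaults chosen for illustration only.
    """
    return sigmoid(gamma * token[e] + beta)


# A token whose e-th entry is 2.0 gives gamma * 2.0 + beta = 0, so the score is 0.5.
print(halting_score([2.0, 0.3, -1.2]))  # 0.5
```

Because only one embedding dimension is read, this kind of halting module adds no new parameters beyond the shift and scale.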
To track the halting probabilities across layers, an auxiliary remainder term is computed for each token:

The halting probabilities are as follows:
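The equations for the remainder and the halting probabilities did not survive in this copy. The sketch below assumes the standard adaptive-computation-time bookkeeping: a token halts at the first layer where its cumulative halting score reaches `1 - eps`, the remainder is one minus the scores accumulated before that layer, and the halting probability equals the score before the halting layer, the remainder at it, and zero after it. The function name and `eps` (a small threshold constant, unrelated to the encoding network $\epsilon(\cdot)$) are hypothetical.

```python
from typing import List, Tuple


def halting_probabilities(scores: List[float],
                          eps: float = 0.01) -> Tuple[int, float, List[float]]:
    """Return (halting layer N, remainder r, per-layer halting probabilities).

    scores[l - 1] is the halting score of one token at layer l (1-indexed).
    """
    num_layers = len(scores)
    cumulative = 0.0
    halt_layer = num_layers  # fall back to the last layer if the threshold is never reached
    for layer, score in enumerate(scores, start=1):
        cumulative += score
        if cumulative >= 1.0 - eps:
            halt_layer = layer
            break

    # Remainder: probability mass not yet used before the halting layer.
    remainder = 1.0 - sum(scores[: halt_layer - 1])

    probs = []
    for layer, score in enumerate(scores, start=1):
        if layer < halt_layer:
            probs.append(score)
        elif layer == halt_layer:
            probs.append(remainder)
        else:
            probs.append(0.0)
    return halt_layer, remainder, probs


print(halting_probabilities([0.125, 0.25, 0.75, 0.5]))
# (3, 0.625, [0.125, 0.25, 0.625, 0.0])
```

Under these assumptions the per-layer probabilities of a token always sum to one, so they can be used as weights over layers.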
Ponder loss: the ponder losses of all tokens are averaged.
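As a sketch of the averaging step, assume each token's ponder term is its halting layer plus its remainder, as in adaptive computation time. That per-token form and the function name are assumptions; the post only states that the per-token losses are averaged.

```python
from typing import Sequence


def ponder_loss(halt_layers: Sequence[int], remainders: Sequence[float]) -> float:
    """Mean over tokens of (halting layer + remainder)."""
    if len(halt_layers) != len(remainders) or not halt_layers:
        raise ValueError("need one halting layer and one remainder per token")
    per_token = [n + r for n, r in zip(halt_layers, remainders)]
    return sum(per_token) / len(per_token)


# Two tokens: one halts at layer 3 with remainder 0.625, one at layer 5 with remainder 0.25.
print(ponder_loss([3, 5], [0.625, 0.25]))  # 4.4375
```

Minimizing this quantity pushes tokens to halt at earlier layers, which is what reduces computation.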

The classification task loss is:
The halting score distribution is:
KL divergence is then used to measure the deviation between the real and predicted distributions:
The total loss is then:
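The loss terms above can be sketched as follows, under stated assumptions: the halting score distribution is taken to be the per-layer halting score averaged over tokens, both distributions are normalized before the KL divergence is computed, and the total loss is a weighted sum. The weights `alpha_d` and `alpha_p`, the target distribution, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def layer_halting_distribution(scores: Sequence[Sequence[float]]) -> List[float]:
    """Average halting score per layer; scores[k][l] is token k's score at layer l."""
    num_tokens = len(scores)
    num_layers = len(scores[0])
    return [sum(tok[l] for tok in scores) / num_tokens for l in range(num_layers)]


def normalize(values: Sequence[float]) -> List[float]:
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p: Sequence[float], q: Sequence[float], tiny: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + tiny) / (qi + tiny))
               for pi, qi in zip(p, q) if pi > 0)


def total_loss(task: float, distr: float, ponder: float,
               alpha_d: float, alpha_p: float) -> float:
    """Weighted sum of task, distribution and ponder losses (assumed form)."""
    return task + alpha_d * distr + alpha_p * ponder


predicted = normalize(layer_halting_distribution([[0.25, 0.75], [0.5, 0.5]]))
target = normalize([1.0, 3.0])  # hypothetical target distribution over two layers
distr = kl_divergence(predicted, target)
print(total_loss(task=1.0, distr=distr, ponder=4.0, alpha_d=0.1, alpha_p=0.001))
```

The two weights trade accuracy against computation: a larger ponder weight favors earlier halting, while the distribution term keeps the average halting depth close to the chosen target.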

III. Results

Figure 3 shows that adaptive token selection responds strongly to highly salient, highly varying regions, which are usually related to the class.
1. Token color-depth distribution:
Plotting the token colors as in Figure 4 gives what is essentially a 2D Gaussian-like distribution centered on the image, which also indicates that most ImageNet samples have their subject near the center. Most of the computation is spent on the central region, and few edge tokens take part in it.
2. Halting score distribution:
Figure 5 plots the halting score at every layer for each image.
On 5k randomly sampled validation images, the halting score increases with depth over the first few layers and then slowly decreases in the later ones.

3. Hard and easy samples
Figure 6 shows hard and easy examples together with the amount of computation each requires.
Easy examples are classified correctly, and AdaViT also processes them faster than hard ones.

4. Class sensitivity
Samples that the model was initially very confident or very uncertain about are barely affected by adaptive inference. Adaptive inference helps most for classes with distinctive shapes, such as standalone furniture or animals.
