【Transformer】ATS: Adaptive Token Sampling For Efficient Vision Transformers
2022-07-29 08:19:00 【Dull cat】
1. Background
Existing transformer models achieve good results on classification and other tasks, but their computational cost is still high: they require many GFLOPs, which makes them unsuitable for many edge devices. The GFLOPs can be lowered by reducing the number of tokens in the network. DynamicViT, for example, uses a network to predict a score for each token and decide which tokens are redundant. This reduces the GFLOPs, but the score-prediction network introduces additional parameters, and a different reduction ratio requires retraining the model.
2. Motivation
The authors argue that a classification task does not need all the information in an image, because much of the image content is redundant for classification. The paper therefore proposes a method for reducing the number of tokens that can be applied to any transformer, is not tied to a fixed reduction ratio, and is more efficient.
3. Method
The authors propose a new module called the Adaptive Token Sampler (ATS), which dynamically selects the important tokens from the input tokens. It is parameter-free. The overall structure is shown in Figure 2. Convolutional networks usually use pooling to reduce computation: the deeper the stage, the lower the resolution. This cannot be applied directly to transformers, because tokens are independent of spatial position, that is, changing their order does not affect the final result. Static downsampling also has two drawbacks: it loses details of the target, and it may retain a lot of background information that contributes nothing to classification. The authors therefore propose selecting the number of tokens in each stage dynamically.
The ATS process:
- First, assign a score to each of the N input tokens; the scores determine which tokens are kept.
- Then, set K as the maximum number of tokens to keep; K determines the upper bound on GFLOPs.
- The number of sampled tokens K' is generally smaller than K and depends on the input image, as shown in Figure 6.

Figure 7 shows that, depending on the instance, the model can classify correctly using only a few patches or most of them. Figure 3 shows the number of tokens used at each stage. The method selects an appropriate number of tokens for each image: as Figure 6 shows, different images use different numbers of tokens at each stage.


3.1 Token Scoring
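In a standard self-attention layer, the queries Q, keys K, and values V are all computed from the input tokens, and the attention matrix A is obtained as:
$$A = \mathrm{Softmax}\left(\frac{QK^T}{\sqrt{d}}\right)$$
where d is the dimension of the query and key vectors. Because of the softmax, each row of A sums to 1. The output tokens are obtained by applying A to V, that is, as attention-weighted sums of the values: $O = AV$.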
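Each row of A contains the attention weights of one output token over the input tokens, that is, how much each input token contributes to that output token. The first row of A corresponds to the CLS token, so it indicates how much each input token contributes to the output classification token. The authors therefore use the entries of the first row as the basis for pruning A, as shown in Figure 2. Each weight is multiplied by the norm of its value vector and the result is normalized, giving the importance score of token j:
$$S_j = \frac{A_{1,j}\,\lVert V_j\rVert}{\sum_{i=2}^{N+1} A_{1,i}\,\lVert V_i\rVert}, \qquad j \in \{2,\dots,N+1\}$$
For multi-head attention, the scores are computed for each head separately and then summed.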
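The scoring step can be sketched in a few lines of NumPy. This is an illustrative sketch and not the authors' code: the function names `softmax` and `token_scores`, the tensor layout `(heads, N+1, d)` with the CLS token at index 0, and the final renormalization after summing over heads are all assumptions made for this example. Weighting the CLS attention by the norm of each value vector follows my reading of the paper's score definition and should be treated as an assumption too.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_scores(q, k, v):
    """Hypothetical helper: importance scores for the N non-CLS tokens.

    q, k, v have shape (heads, N+1, d); index 0 along axis 1 is the CLS token.
    Returns an array of shape (N,) that sums to 1.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (H, N+1, N+1)
    cls_row = attn[:, 0, 1:]                        # CLS attention to the other tokens
    v_norm = np.linalg.norm(v[:, 1:, :], axis=-1)   # value-vector norms (assumed weighting)
    weighted = cls_row * v_norm
    per_head = weighted / weighted.sum(axis=-1, keepdims=True)  # normalize within each head
    scores = per_head.sum(axis=0)                   # sum the per-head scores
    return scores / scores.sum()                    # assumption: renormalize to a distribution
```

With random inputs of shape `(2, 5, 4)`, for instance, the function returns four non-negative scores that sum to 1, one per non-CLS token.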
3.2 Token Sampling
Once each token has a score, the tokens can be pruned based on the attention matrix A.
A simple approach is to keep the top-K tokens directly, but experiments show that this performs worse than dynamically selecting K' tokens. The reason is that top-K discards all low-scoring tokens outright, even though some of them may actually be useful, especially in the shallow layers.
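For comparison, the top-K baseline mentioned here can be written as a short NumPy sketch. The function name `top_k_tokens` is hypothetical, and the scores are assumed to be a 1-D array over the non-CLS tokens.

```python
import numpy as np

def top_k_tokens(scores, k):
    """Hypothetical baseline: indices of the k highest-scoring tokens,
    returned in their original order."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, len(scores))
    idx = np.argsort(scores)[::-1][:k]  # highest scores first
    return np.sort(idx)                 # restore token order
```

Note that this baseline always keeps exactly k tokens (when at least k exist), regardless of how the scores are distributed, and every token below the cut is dropped.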
In the authors' sampling method, the probability of sampling a token from a group of similar tokens equals the sum of their scores. Figure 3 also shows that this mechanism samples somewhat more tokens in shallow stages than in deep ones.
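Method:
Because the token scores $S_j$ are normalized, they can be treated as probabilities, and the cumulative distribution function (CDF) can be computed (index 1 is the CLS token, so the sum starts at 2):
$$\mathrm{CDF}_i = \sum_{j=2}^{i} S_j$$
Inverting the CDF gives the sampling function:
$$\Psi(k) = \mathrm{CDF}^{-1}(k), \qquad k \in [0, 1]$$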
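Inverse-CDF sampling over the token scores can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ats_sample` is hypothetical, and querying the inverse CDF at K evenly spaced points (2i-1)/(2K) instead of random draws is an assumption made here so that the result is deterministic. Indices are 0-based over the non-CLS tokens; the CLS token is assumed to be kept separately.

```python
import numpy as np

def ats_sample(scores, k_max):
    """Hypothetical sketch: select at most k_max token indices by inverse-CDF sampling."""
    scores = np.asarray(scores, dtype=float)
    cdf = np.cumsum(scores / scores.sum())
    # Assumption: K fixed query points in (0, 1), one per sampling slot.
    u = (2 * np.arange(1, k_max + 1) - 1) / (2 * k_max)
    # Inverse CDF: first index whose cumulative score reaches the query point.
    idx = np.searchsorted(cdf, u, side="left")
    idx = np.minimum(idx, len(scores) - 1)  # guard against floating-point round-off
    # A high-scoring token can be hit several times, so K' = len(result) <= k_max.
    return np.unique(idx)
```

With scores `[0.7, 0.1, 0.1, 0.1]` and `k_max=4`, three of the four query points land on the first token, so only two distinct tokens are kept. With uniform scores, all four are kept. This is how the number of kept tokens K' adapts to the input while staying bounded by K.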
4. Results
