【Transformer】ATS: Adaptive Token Sampling For Efficient Vision Transformers
2022-07-29 08:19:00 【Dull cat】

1. Background
Although existing transformer models achieve good results on classification and other tasks, their computational cost is still very high: they need many GFLOPs, which makes them unsuitable for many edge devices. GFLOPs can be reduced by reducing the number of tokens in the network. DynamicViT, for example, uses a prediction network to score each token and decide which tokens are redundant. This does reduce the network's GFLOPs, but the score-prediction network introduces additional parameters, and a different reduction ratio requires retraining.
2. Motivation
The authors argue that a classifier does not need all the information in an image, because much of it is redundant for the classification task. The paper therefore proposes a method for reducing the number of tokens that can be applied to any transformer, is not tied to a fixed reduction ratio, and is more efficient.
3. Method
The authors propose a new module called the Adaptive Token Sampler (ATS), which dynamically selects the important tokens from the input tokens. It is also parameter-free. The overall structure is shown in Figure 2. Convolutional networks usually reduce computation with pooling: the deeper the stage, the lower the resolution. Such a method cannot be used directly in a Transformer, because the tokens are not tied to spatial locations; changing their order does not affect the final result. Downsampling also has two drawbacks: first, details of the target may be lost; second, a lot of background information may be retained, which contributes nothing to classification. The authors therefore propose choosing the number of tokens at each stage dynamically.
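To make the point about token order concrete, here is a minimal pure-Python sketch (an illustration, not code from the paper) of single-head self-attention. It assumes identity Q/K/V projections and no positional embeddings, both simplifications. Shuffling the input tokens only shuffles the outputs in the same way, so the layer itself treats the tokens as an unordered set rather than a spatial grid that pooling could exploit.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention; Q = K = V = tokens (identity projections assumed)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output = attention-weighted sum of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

tokens = [[1.0, 0.0], [0.5, 2.0], [-1.0, 0.3], [0.2, -0.7]]
perm = [2, 0, 3, 1]  # an arbitrary reordering of the tokens
out = self_attention(tokens)
out_perm = self_attention([tokens[p] for p in perm])

# Row i of the shuffled output equals row perm[i] of the original output.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-12)
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)
print(equivariant)  # True
```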
ATS The process of :
- First , Yes N Inputs token Assign a score , Determine which ones are left based on the score
- then , Set up K For reserved token The largest number , This K Will decide GFLOPs Upper limit
- sampled tokens K’ General comparison K Small , And the relationship with the input image is shown in the figure 6 Shown
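Why K sets an upper bound on GFLOPs: the cost of a standard ViT encoder block grows with the number of tokens it receives. The sketch below uses the usual back-of-the-envelope count of multiply-accumulates (matrix multiplications only), roughly 12·N·d² + 2·N²·d for N tokens of width d with an MLP ratio of 4. It is an approximation, not the paper's FLOP counter, and the name `block_macs` and the sizes in the example are illustrative assumptions.

```python
def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates (MACs) of one standard ViT encoder block.

    Only the matrix multiplications are counted; softmax, LayerNorm and GELU are ignored.
    """
    n, d = num_tokens, dim
    qkv = 3 * n * d * d                # Q, K, V projections
    attn = 2 * n * n * d               # Q K^T and A V
    proj = n * d * d                   # output projection
    mlp = 2 * n * d * (mlp_ratio * d)  # the two MLP linear layers
    return qkv + attn + proj + mlp

dim = 384                        # illustrative embedding width
full = block_macs(197, dim)      # 196 patch tokens + 1 CLS token
K = 98                           # hypothetical cap on sampled tokens
capped = block_macs(K + 1, dim)  # at most K sampled tokens + CLS
print(full, capped)
```

Since every term grows with the token count, a block that receives at most K sampled tokens can never cost more than `block_macs(K + 1, dim)`, whatever K' the sampler ends up choosing for a given image.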

Figure 7 shows that, depending on the instance, a correct classification can be obtained with only a few patches or may require most of them. Figure 3 shows the number of tokens used at each stage. The authors therefore also propose a way to choose an appropriate number of tokens for each image: as shown in Figure 6, different images use different numbers of tokens at different stages.


3.1 Token Scoring
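In a standard self-attention layer, the queries Q, keys K and values V are all computed from the input tokens, and the attention matrix A is obtained as:

$$A = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)$$

where d is the dimension of the queries and keys.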
Because of the softmax, each row of A sums to 1. The output tokens are the value vectors weighted by the attention matrix, $O = AV$.
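A minimal pure-Python sketch of the attention matrix described above: each row of A is a softmax, so it sums to 1, and the output is O = AV. The helper names and the tiny Q, K, V matrices are arbitrary illustrative choices.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """Return (A, O) with A = softmax(Q K^T / sqrt(d)) applied row-wise and O = A V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # K^T
    a = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(q, k_t)]
    return a, matmul(a, v)

# Three tokens with d = 2; row 0 plays the role of the CLS token.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.5], [0.2, 1.0], [-1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a, o = attention(q, k, v)
print([round(sum(row), 6) for row in a])  # [1.0, 1.0, 1.0]
print(a[0])  # CLS row: how much each token contributes to the CLS output
```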
Each row of A holds the attention weights of one output token, i.e., how much each input token contributes to that output token. The first row of A belongs to the classification (CLS) token, so it shows how much each input token contributes to the output CLS token, which is what the classifier uses. The authors therefore use the entries of the first row as the basis for pruning, as shown in Figure 2. They also normalize these scores so that they sum to 1; for multi-head attention, the scores are computed for each head separately and then summed.
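The scoring step can be sketched as follows, assuming the attention weights from the CLS query are available per head with the CLS token itself at index 0. The function name `token_scores` and this data layout are hypothetical. The optional `value_norms` factor (multiplying each weight by the L2 norm of the corresponding value vector) reflects our reading of the ATS paper rather than anything stated in this post; omit it to score with the attention weights alone.

```python
def token_scores(cls_attn, value_norms=None):
    """Normalized significance score of every non-CLS token.

    cls_attn[h][j]: attention weight from the CLS query to token j in head h
                    (index 0 is the CLS token itself and is not scored).
    value_norms[h][j]: optional L2 norm of token j's value vector in head h
                       (an assumption based on our reading of the paper).
    """
    num_tokens = len(cls_attn[0])
    raw = [0.0] * (num_tokens - 1)
    for h, row in enumerate(cls_attn):
        for j in range(1, num_tokens):
            weight = row[j]
            if value_norms is not None:
                weight *= value_norms[h][j]
            raw[j - 1] += weight  # accumulate the contribution of each head
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the scores sum to 1

# Two heads, CLS + 4 patch tokens (made-up attention rows, each summing to 1).
cls_attn = [
    [0.2, 0.4, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.2, 0.1],
]
scores = token_scores(cls_attn)
print([round(s, 3) for s in scores])  # [0.294, 0.412, 0.176, 0.118]
```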
3.2 Token Sampling
Once every token has a score, the tokens can be pruned based on the attention matrix A.
A straightforward approach is to keep the top-K tokens directly, but experiments show that it performs worse than dynamically sampling K' tokens. The reason is that top-K discards all low-scoring tokens, although some of them can still be useful, especially in the shallow layers.
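For comparison, the top-K baseline is just a sort. In this hypothetical example (the function name and scores are illustrative), tokens 0 to 2 have nearly equal high scores, think of several similar patches, and every lower-scoring token is dropped regardless of what it contains.

```python
def top_k_tokens(scores, k):
    """Baseline: keep the k highest-scoring tokens and drop everything else."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # report kept indices in their original order

# Tokens 0-2 score almost the same (e.g. several similar patches of the object).
scores = [0.30, 0.28, 0.27, 0.05, 0.04, 0.03, 0.03]
print(top_k_tokens(scores, 3))  # [0, 1, 2]
```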
In the authors' sampling method, the probability of sampling one token from a group of similar tokens equals the sum of their scores. Figure 3 also shows that this sampling mechanism keeps slightly more tokens in the shallow layers than in the deep ones.
Method :
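Because the token scores $S_1, \dots, S_N$ are normalized, they can be treated as a probability distribution over the N input tokens, and their cumulative distribution function (CDF) can be computed:

$$\mathrm{CDF}_i = \sum_{j=1}^{i} S_j$$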
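Inverting the CDF gives the sampling function:

$$\mathrm{CDF}^{-1}(r) = \min\{\, i : \mathrm{CDF}_i \ge r \,\}, \quad r \in [0, 1]$$

Evaluating it at K values of r yields at most K token indices; because a high-scoring token can be selected more than once and duplicates are merged, the number of distinct sampled tokens K' is usually smaller than K.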
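A minimal sketch of inverse-transform sampling over the token scores, using only the standard library. The function name `ats_sample` is hypothetical, and evaluating the inverse CDF at K evenly spaced points r = (i + 0.5)/K is an assumption of this sketch (uniformly random draws would also be valid inverse-transform sampling). Indices here are 0-based, unlike the 1-based formulas above.

```python
import bisect
from itertools import accumulate

def ats_sample(scores, k):
    """Sample up to k token indices by inverting the CDF of normalized scores."""
    cdf = list(accumulate(scores))  # CDF_i = S_0 + ... + S_i (0-based)
    cdf[-1] = 1.0  # guard against floating-point round-off in the last entry
    picked = set()
    for i in range(k):
        r = (i + 0.5) / k  # evenly spaced points in (0, 1); an assumption of this sketch
        # Inverse CDF: the smallest index whose cumulative score reaches r.
        picked.add(bisect.bisect_left(cdf, r))
    return sorted(picked)  # duplicates merged, so K' = len(result) <= k

scores = [0.6, 0.2, 0.1, 0.05, 0.05]
kept = ats_sample(scores, 4)
print(kept, len(kept))  # [0, 1, 2] 3
```

Here token 0 holds 60% of the probability mass and is hit by two of the four sample points, so K = 4 yields only K' = 3 tokens. With uniformly random r instead, index i is drawn with probability exactly equal to its score, which is why a group of similar tokens is sampled with probability equal to the sum of their scores.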
4. Results
