【Transformer】AdaViT: Adaptive Tokens for Efficient Vision Transformer
2022-07-29 06:03:00 【Dull cat】
I. Background
Transformers have achieved strong performance on many tasks. In computer vision, the input image is usually split into patches, and self-attention is computed between the patches to serve downstream tasks.
However, the cost of self-attention grows quadratically with the number of patches, and therefore with the input image size, so running Transformers on edge devices is a problem.
The authors argue that input images differ in how hard they are for the network to predict. A single car or person against a clean background is easy to recognize. Many different animals against a cluttered background are much harder to recognize.
Based on this, the authors design a network that dynamically adjusts the number of tokens according to the difficulty of the input, which in turn controls the computational cost of the Transformer.
II. Method
The vision transformer pipeline is as follows:
- $\epsilon(\cdot)$: the encoding network, which encodes the input image into tokens with positional information
- $C(\cdot)$: post-processing of the class token
- $L$: the number of Transformer blocks
- $F(\cdot)$: a self-attention (Transformer) block
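The composition described by these symbols can be written out as below. This is a reconstruction from the definitions in the list, not a copy of the original formula; $x$ (the input image) and $y$ (the prediction) are assumed names, and the superscript on $F$ is the block index.

```latex
y = C \circ F^{L} \circ F^{L-1} \circ \cdots \circ F^{1} \circ \epsilon(x)
```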
To halt tokens dynamically, the authors introduce an input-dependent halting score for each token:
- $H(\cdot)$ is the halting module
- $k$ is the token index and $l$ is the layer
- $t_{k,e}^l$ is the $e$-th dimension of $t_k^l$
- $\sigma$ is the logistic sigmoid function
- $\beta$ and $\gamma$ are the shift and scaling factors applied before the nonlinearity
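Read together, the definitions above suggest a halting score of the form $h_k^l = H(t_k^l) = \sigma(\gamma \cdot t_{k,e}^l + \beta)$. The following is a minimal sketch under that reading. The function name, the choice `e=0`, and the default values of `gamma` and `beta` are illustrative assumptions, not values taken from the article.

```python
import math


def halting_score(token, e=0, gamma=1.0, beta=0.0):
    """Halting score of one token at one layer.

    token: the token embedding t_k^l as a list of floats.
    e:     index of the embedding dimension that drives halting (assumed 0).
    gamma, beta: scale and shift applied before the sigmoid (placeholder values).
    """
    z = gamma * token[e] + beta
    return 1.0 / (1.0 + math.exp(-z))


# Three toy token embeddings; only dimension e=0 influences the score.
tokens = [[0.0, 0.3], [2.0, -1.0], [-2.0, 0.5]]
scores = [halting_score(t) for t in tokens]
```

Because the score reads a single dimension of the existing embedding, this form of halting module would add no parameters beyond the scale and shift.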
To track halting probabilities across layers, an auxiliary value (a remainder) is computed for each token:
The halting probabilities are as follows:
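As a rough illustration, the sketch below follows the usual adaptive-computation-time scheme: a token halts at the first layer $N$ where its cumulative halting score reaches $1 - \epsilon$, the remainder is $r = 1 - \sum_{l<N} h^l$, and the halting probability is $h^l$ before layer $N$, $r$ at layer $N$, and $0$ afterwards. The function name, the default `eps`, and forcing a halt at the last layer are assumptions of this sketch.

```python
def halting_probabilities(scores, eps=0.01):
    """Per-layer halting probabilities for a single token.

    scores: non-empty list of halting scores h^1..h^L for this token.
    eps:    small slack so a token can halt after one layer (placeholder value).
    Returns (N, r, p): 1-indexed halting layer, remainder, and the list p^1..p^L.
    """
    num_layers = len(scores)
    cumulative = 0.0  # sum of scores over layers strictly before the halting layer
    halt_layer = num_layers
    for layer, h in enumerate(scores, start=1):
        # Halt when the threshold is reached, or unconditionally at the last layer.
        if cumulative + h >= 1.0 - eps or layer == num_layers:
            halt_layer = layer
            break
        cumulative += h
    remainder = 1.0 - cumulative
    probs = [scores[i] if i + 1 < halt_layer else 0.0 for i in range(num_layers)]
    probs[halt_layer - 1] = remainder
    return halt_layer, remainder, probs


# A token whose cumulative score crosses the threshold at the third of four layers.
N, r, p = halting_probabilities([0.1, 0.3, 0.7, 0.9])
```

With this construction the probabilities of a token always sum to 1, so they can be used as weights when combining that token's per-layer outputs.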
Ponder loss: the ponder loss of each token is averaged over all tokens.
The loss for the classification task is:
The halting score distribution is:
The KL divergence is then used to measure the deviation between the predicted distribution and the target distribution:
The total loss is then:
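The pieces above can be combined as in the following sketch. It assumes a per-token ponder term $\rho_k = N_k + r_k$ (halting layer plus remainder), a predicted halting distribution given by the per-layer mean halting score, and a total loss of the form $L_{task} + \alpha_d L_{distr} + \alpha_p L_{ponder}$. The function names, the normalization of both distributions, the KL direction, and the weights `alpha_d` and `alpha_p` are illustrative assumptions. The task loss is passed in as an already computed number.

```python
import math


def ponder_loss(halt_layers, remainders):
    """Mean over tokens of rho_k = N_k + r_k."""
    rho = [n + r for n, r in zip(halt_layers, remainders)]
    return sum(rho) / len(rho)


def _normalize(values):
    total = sum(values)
    return [v / total for v in values]


def distribution_loss(scores, target):
    """KL(predicted || target) over layers.

    scores[l][k] is the halting score of token k at layer l. The predicted
    distribution is the per-layer mean score. Both sides are normalized to
    sum to 1, which is an assumption of this sketch. The target must be
    strictly positive.
    """
    predicted = _normalize([sum(layer) / len(layer) for layer in scores])
    target = _normalize(target)
    return sum(p * math.log(p / q) for p, q in zip(predicted, target) if p > 0)


def total_loss(task, distr, ponder, alpha_d=0.1, alpha_p=0.01):
    """Weighted sum of the three terms; the weights are placeholder values."""
    return task + alpha_d * distr + alpha_p * ponder


# Two tokens halting at layers 3 and 2, scores from a two-layer toy model.
l_ponder = ponder_loss([3, 2], [0.6, 0.2])
l_distr = distribution_loss([[0.2, 0.4], [0.6, 0.8]], [0.5, 0.5])
loss = total_loss(task=1.0, distr=l_distr, ponder=l_ponder)
```

The ponder term pushes tokens to halt early, while the distribution term keeps the average halting depth close to a chosen target, so the two weights trade accuracy against computation.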
III. Results
Figure 3 shows that adaptive token selection responds strongly to regions that are highly salient and vary a lot, which are usually the regions related to the class.
1. Token color-depth distribution:
The token colors plotted in Figure 4 form what is essentially a 2D Gaussian-like distribution centered on the image, which also indicates that most ImageNet samples have their object in the middle of the image. Most of the computation comes from the central region, and few edge tokens take part in the computation.
2. Halting score distribution:
Figure 5 plots the halting score of every layer for each image.
On 5k randomly sampled validation images, the halting score increases with depth over the first few layers and then slowly decreases in the later layers.
3. Hard and easy samples
Figure 6 shows hard and easy examples and the amount of computation each requires.
Easy examples are classified correctly, and AdaViT also processes them faster than hard ones.
4. Class sensitivity
Samples on which the model was initially very confident or very uncertain are barely affected by adaptive inference. Adaptive inference helps classes with distinctive shapes, such as standalone furniture or animals.