UC San Diego | EViT: Expediting Vision Transformers via Token Reorganization (ICLR 2022)
2022-06-22 04:37:00 [Zhiyuan Community]
Paper title: Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations
Paper link: https://openreview.net/forum?id=BjyvwnXXVn_
Code link: https://github.com/youweiliang/evit
Affiliations: UC San Diego, The University of Hong Kong & Tencent AI Lab
Vision Transformers (ViTs) treat all image patches as tokens and build multi-head self-attention (MHSA) among them. Fully using all of these tokens leads to redundant computation, since not all tokens are attentive in MHSA: tokens corresponding to semantically meaningless or distracting image backgrounds, for example, do not contribute positively to a ViT's predictions. In this work, we propose to reorganize image tokens during the feed-forward process of a ViT model, and we integrate this reorganization into training. For each forward pass, we identify the attentive image tokens between the MHSA and FFN (feed-forward network) modules, guided by the attention from the corresponding class token. We then accelerate the subsequent MHSA and FFN computation by keeping the attentive image tokens and fusing the inattentive ones into a single new token. Our method, EViT, improves ViTs from two perspectives. First, under the same number of input image tokens, EViT reduces MHSA and FFN computation for efficient inference; for example, the inference speed of DeiT-S is increased by 50% while its ImageNet classification accuracy drops by only 0.3%. Second, at the same computational cost, EViT allows a ViT to take more image tokens as input, drawn from a higher-resolution image, to improve recognition accuracy; for example, at the same computational cost as vanilla DeiT-S, we improve the ImageNet classification accuracy of DeiT-S by 1%. Meanwhile, our method introduces no additional parameters to the ViT. Experiments on standard benchmarks demonstrate the effectiveness of our method.
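The core token-reorganization step described above — keep the tokens the class token attends to most, and fuse the rest into a single token — can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation (see the linked repository for that); the function name, the `keep_ratio` parameter, and the attention-weighted fusion rule here are illustrative assumptions.

```python
import numpy as np

def reorganize_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Keep the most-attended image tokens and fuse the rest into one token.

    tokens:     (N, D) array of image tokens (class token excluded).
    cls_attn:   (N,) attention weights from the class token to each image token.
    keep_ratio: fraction of image tokens to keep unfused.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    order = np.argsort(cls_attn)[::-1]            # most-attended tokens first
    keep_idx, fuse_idx = order[:n_keep], order[n_keep:]

    kept = tokens[keep_idx]
    if len(fuse_idx) == 0:
        return kept

    # Fuse the inattentive tokens into one token via an
    # attention-weighted average, so their information is not discarded.
    w = cls_attn[fuse_idx]
    w = w / w.sum()
    fused = (w[:, None] * tokens[fuse_idx]).sum(axis=0, keepdims=True)
    return np.concatenate([kept, fused], axis=0)
```

With `keep_ratio=0.5` and 8 input tokens, the output has 5 tokens (4 kept plus 1 fused), so the subsequent MHSA and FFN layers operate on a shorter sequence, which is where the computational savings come from.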