AAAI 2022 - ShiftViT: When Shift Operation Meets Vision Transformer
2022-06-11 04:54:00 【Shenlan Shenyan AI】

Paper: [AAAI 2022] When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
Code: https://github.com/microsoft/SPACH
Author's video walkthrough (Bilibili): https://www.bilibili.com/video/BV1a3411h7su
Research motivation
This work replaces attention with an extremely simple operation, yet achieves very good results. First, the motivation: the author argues that the key to the Transformer's success lies in two properties:
Global: fast global modeling capability; every token can interact with every other token.
Dynamic: a set of weights is learned dynamically for each sample.
The motivating question is: can attention be replaced by something simpler? Taken to the extreme: no globality, no dynamics, even no parameters and no arithmetic computation.
To this end, the author proposes the shift block. It is extremely simple: in essence, a plain shift operation applied to part of the feature channels takes the place of self-attention.
Method
As shown in the figure below, a standard Transformer block first applies attention and then an FFN. The author proposes replacing attention with a shift block. The module is very simple: given an input feature of shape C×H×W, a portion of the channels is taken out along the C dimension and split evenly into 4 groups; these 4 groups are shifted left, right, up, and down respectively, while the remaining channels are left unchanged.

In the author's implementation, the shift step is set to 1 pixel, and 1/3 of the channels are shifted: 1/12 of the channels move left by 1 pixel, 1/12 move right, 1/12 move up, and 1/12 move down. The module's computation is trivial and it introduces essentially no parameters; a PyTorch sketch follows.
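Below is a minimal PyTorch sketch of the shift operation as described above (the official code lives in the SPACH repository; the function name and the per-direction fraction argument `gamma` here are our own illustrative choices):

```python
import torch

def shift_feat(x: torch.Tensor, gamma: float = 1.0 / 12) -> torch.Tensor:
    """Shift a fraction of channels by 1 pixel in each of 4 directions.

    x: feature map of shape (B, C, H, W).
    gamma: fraction of channels shifted per direction; the paper's default
           of 1/12 per direction shifts 1/3 of the channels in total.
    """
    B, C, H, W = x.shape
    g = int(C * gamma)                                        # channels per direction
    out = torch.zeros_like(x)                                 # vacated borders stay zero
    out[:, 0 * g:1 * g, :, :-1] = x[:, 0 * g:1 * g, :, 1:]    # shift left
    out[:, 1 * g:2 * g, :, 1:] = x[:, 1 * g:2 * g, :, :-1]    # shift right
    out[:, 2 * g:3 * g, :-1, :] = x[:, 2 * g:3 * g, 1:, :]    # shift up
    out[:, 3 * g:4 * g, 1:, :] = x[:, 3 * g:4 * g, :-1, :]    # shift down
    out[:, 4 * g:, :, :] = x[:, 4 * g:, :, :]                 # rest unchanged
    return out
```

With C = 96 and the default gamma, each direction shifts 96/12 = 8 channels; the remaining 64 channels pass through untouched, and no learnable weight or multiplication is involved.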

In terms of network architecture, the method takes Swin Transformer as its reference: the attention module is replaced with the shift block, and every other part is kept exactly the same.
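To make the replacement concrete, here is a minimal sketch of such a block, reusing the `shift_feat` function above; the layer layout (parameter-free shift followed by a LayerNorm + MLP residual branch) is assumed from the paper's figure, and all names are illustrative:

```python
import torch
import torch.nn as nn

class ShiftBlock(nn.Module):
    """Transformer-style block with attention replaced by the shift op."""

    def __init__(self, dim: int, mlp_ratio: float = 4.0):
        super().__init__()
        hidden = int(dim * mlp_ratio)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); the shift sub-layer adds no parameters at all
        x = shift_feat(x, gamma=1.0 / 12)
        # FFN sub-layer with residual, applied per spatial position
        y = x.permute(0, 2, 3, 1)        # (B, H, W, C) for LayerNorm/Linear
        y = y + self.mlp(self.norm(y))
        return y.permute(0, 3, 1, 2)     # back to (B, C, H, W)
```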

The author first obtains a light model, ShiftViT-light, whose parameter count is markedly lower than Swin's. To keep the size roughly on par with Swin Transformer, the author then adds 6 extra blocks to stage 3 and 1 to stage 4, arriving at Shift-T, a model with essentially the same number of parameters as Swin-T, as shown in the table below.
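For a concrete picture (assuming Swin-T's usual depth layout of (2, 2, 6, 2) blocks across the four stages, which this post does not state explicitly): adding 6 blocks to stage 3 and 1 to stage 4 would yield a (2, 2, 12, 3) layout, i.e. 19 shift blocks in place of Swin-T's 12 attention blocks, compensating in depth for the parameters saved by dropping attention.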

Experimental results
The table below lists only the ImageNet image classification results. Direct replacement degrades performance, but with the added blocks the Shift-T model actually improves over Swin-T, while the S and B models drop slightly. The author also ran object detection and semantic segmentation experiments and concluded that performance is roughly on par with Swin, with ShiftViT holding a clearer advantage at smaller model sizes.

The author ran and analyzed many ablations; here we only mention the one concerning the shift block's single hyper-parameter, the proportion of shifted channels. When the proportion is too small, performance falls below Swin-T; it peaks when set to 1/3.
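In terms of the `shift_feat` sketch above, this proportion is simply 4 × gamma (gamma being the per-direction fraction); a hypothetical sweep over it could look like this, with illustrative values rather than the paper's exact grid:

```python
# sweep the total shifted-channel proportion (4 directions, gamma each);
# the paper reports 1/3 as the sweet spot
x = torch.randn(1, 96, 56, 56)
for total in (1 / 12, 1 / 6, 1 / 3, 2 / 3):
    y = shift_feat(x, gamma=total / 4)
```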

The author also ran an interesting experiment, the training scheme study, analyzing the training tricks that may underlie the Transformer's performance breakthrough: replacing SGD with Adam, ReLU with GELU, and BN with LN, and increasing the number of epochs, all improve performance. This suggests that these factors may also be key to ViT's success.
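To make the recipe contrast concrete, a minimal PyTorch sketch of the two optimizer setups follows; the stand-in model and the hyper-parameter values are illustrative, not the paper's exact settings:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.GELU(), nn.Linear(8, 8))  # stand-in model

# CNN-era recipe: SGD with momentum (paired with ReLU/BN models, shorter schedules)
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# ViT-era recipe probed by the paper: Adam-family optimizer, GELU/LN models,
# longer schedules (more epochs)
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
```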

Summary
The author draws two take-aways: 1) self-attention may not be the key to ViT's success; a simple channel shift operation can even outperform Swin Transformer at small model sizes. 2) ViT's training recipe (Adam, GELU, LN, etc.) is key to the performance gains.
Author: peak OUC
| About Shenyan Technology |

Founded in January 2018, Shenyan Technology is a Zhongguancun high-tech enterprise and an AI service provider built on world-leading artificial intelligence technology. With core technologies in computer vision, natural language processing, and data mining, the company has launched four platform products, the Shenyan intelligent data annotation platform, the Shenyan AI development platform, the Shenyan automatic machine learning platform, and the Shenyan AI open platform, offering enterprises one-stop AI platform services covering data processing, model building and training, privacy computing, and industry algorithms and solutions.