[NIPS2021]MLP-Mixer: An all-MLP Architecture for Vision
2022-06-11 04:54:00 【Shenlan Shenyan AI】
Convolutional neural networks (CNNs) are the go-to models for computer vision. Recently, attention-based networks (e.g., ViT) have also become very popular. This paper shows that, although convolution and attention are each sufficient for good performance, neither is necessary. It introduces MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one applies MLPs independently to image patches (i.e., "mixing" per-location features), the other applies MLPs across patches (i.e., "mixing" spatial information).
Paper: https://arxiv.org/pdf/2105.01601.pdf
Code: https://github.com/google-research/vision_transformer
Model structure
The MLP-Mixer architecture is based entirely on multi-layer perceptrons, applied repeatedly either across spatial positions or across feature channels. The overall structure is shown below:

[Figure: MLP-Mixer overall architecture]
The overall idea of MLP-Mixer is as follows. The input image is first split into S non-overlapping patches of size P×P. Each patch is then mapped to a C-dimensional embedding by a Per-patch Fully-connected layer, so the input becomes a real-valued table X of shape S×C. This table is fed through N Mixer layers. Finally, MLP-Mixer uses a standard classification head: a global average pooling layer followed by a fully-connected layer for classification.
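The patch-splitting and per-patch fully-connected projection described above can be sketched in a few lines of NumPy (a minimal illustration; the function and variable names here are my own, not from the paper's code):

```python
import numpy as np

def to_patch_table(image, patch_size, proj):
    """Split an (H, W, ch) image into non-overlapping P x P patches and
    project each flattened patch to a C-dim embedding.
    Returns the S x C input table X."""
    H, W, ch = image.shape
    P = patch_size
    S = (H // P) * (W // P)
    # (H//P, P, W//P, P, ch) -> (S, P*P*ch): one row per patch
    patches = image.reshape(H // P, P, W // P, P, ch)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(S, P * P * ch)
    return patches @ proj  # proj: (P*P*ch, C) per-patch fully-connected weights

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
proj = rng.standard_normal((8 * 8 * 3, 128))
X = to_patch_table(img, 8, proj)
print(X.shape)  # (16, 128): S = (32/8)*(32/8) = 16 patches, C = 128 channels
```

Each row of X is one patch's embedding; the Mixer layers below operate on this table.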
The MLP-Mixer architecture uses two types of MLP layers: token-mixing MLPs and channel-mixing MLPs; every Mixer layer consists of one of each. The token-mixing MLP allows different spatial locations (tokens) to communicate: it acts on the columns of X, i.e., it is an MLP applied across patches ("mixing" spatial information). The channel-mixing MLP allows different channels to communicate: it acts on the rows of X, i.e., it is an MLP applied independently to each image patch ("mixing" per-location features). The token-mixing MLP operates on each column of X: the table is first transposed, the same MLP1 (with parameters shared across all columns) is applied, and the output is transposed back. The channel-mixing MLP operates on each row, with MLP2 shared across all rows. The two layer types alternate, enabling information to flow along both input dimensions.

The token-mixing MLP fuses information between different spatial positions (tokens, i.e., the image patches). It takes each column of the input table as input (the vector formed by one channel's values across all patches; the input of MLP1 in the figure) and processes each channel independently.

The channel-mixing MLP fuses information between different channels. It takes each row of the input table as input (the vector of all channels of one token/patch; the input of MLP2 in the figure) and processes each token independently. The two layer types are interleaved, realizing interaction between the two input dimensions. The same channel-mixing MLP (token-mixing MLP) is applied to every row (column) of the input matrix X. The details are shown in the following figure:

Figure: Mixer layer schematic
A Mixer layer can be written as:

$$U_{*,i} = X_{*,i} + \mathbf{W}_2\,\sigma\big(\mathbf{W}_1\,\mathrm{LayerNorm}(X)_{*,i}\big), \quad i = 1,\ldots,C,$$

$$Y_{j,*} = U_{j,*} + \mathbf{W}_4\,\sigma\big(\mathbf{W}_3\,\mathrm{LayerNorm}(U)_{j,*}\big), \quad j = 1,\ldots,S.$$

Here $\sigma$ is an element-wise nonlinearity (GELU [1]), and $D_S$ and $D_C$ are the tunable hidden widths of the token-mixing MLP and the channel-mixing MLP, respectively. Note that $D_S$ is chosen independently of the number of input patches. The computational complexity of the network is therefore linear in the number of input patches, whereas ViT's complexity is quadratic.
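The two equations can be checked with a minimal NumPy sketch (this is an illustration, not the reference implementation; the weight shapes are my own assumption chosen to match the equations, and biases are omitted):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalize over the channel (last) dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mixer_layer(X, W1, W2, W3, W4):
    """One Mixer layer on the S x C table X.
    W1: (D_S, S), W2: (S, D_S) -- token-mixing MLP1, shared by all C channels.
    W3: (D_C, C), W4: (C, D_C) -- channel-mixing MLP2, shared by all S tokens."""
    # token mixing: the same MLP1 acts on every column of X
    U = X + W2 @ gelu(W1 @ layer_norm(X))       # (S, C)
    # channel mixing: the same MLP2 acts on every row of U
    Y = U + gelu(layer_norm(U) @ W3.T) @ W4.T   # (S, C)
    return Y

rng = np.random.default_rng(0)
S, C, D_S, D_C = 16, 8, 4, 32
X = rng.standard_normal((S, C))
W1 = rng.standard_normal((D_S, S)) * 0.1
W2 = rng.standard_normal((S, D_S)) * 0.1
W3 = rng.standard_normal((D_C, C)) * 0.1
W4 = rng.standard_normal((C, D_C)) * 0.1
Y = mixer_layer(X, W1, W2, W3, W4)
print(Y.shape)  # (16, 8): the output table has the same shape as the input
```

Note how the token-mixing step multiplies on the left (acting along the S dimension) while the channel-mixing step multiplies on the right (acting along C), matching the column/row distinction in the equations.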
Because the channel-mixing MLP and the token-mixing MLP are applied to every row (column) of the input, the parameters within each layer are naturally tied. This tying provides positional invariance, a prominent feature of convolutions. Tying parameters across channels, however, is far less common. For example, separable convolutions in CNNs convolve each channel separately, independently of the other channels; but there each channel uses a different kernel, unlike MLP-Mixer's token-mixing MLP, in which all channels share the same full-receptive-field kernel. Parameter tying keeps the model size from growing too quickly as C or S increases, and the authors report that this design does not hurt accuracy. (Note: tying parameters is effectively the same as sharing them.)
Every layer in MLP-Mixer (except the initial patch-projection layer) takes inputs of the same size. Apart from the MLP layers, MLP-Mixer uses only standard architectural components: skip connections [2] and layer normalization. Also, unlike ViT, MLP-Mixer uses no positional embeddings, because the token-mixing MLPs are already sensitive to the order of the input tokens and can therefore learn to represent position. Finally, MLP-Mixer uses a standard classification head, consisting of a global average pooling layer and a linear classifier.
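The classification head is simple enough to sketch directly (an illustrative NumPy snippet assuming Y is the S×C output table of the last Mixer layer; the names are my own):

```python
import numpy as np

def classify(Y, W_head, b_head):
    """Y: (S, C) output of the last Mixer layer.
    Global average pooling over the S tokens, then a linear classifier."""
    pooled = Y.mean(axis=0)          # (C,) average over all tokens
    return pooled @ W_head + b_head  # (num_classes,) logits

rng = np.random.default_rng(0)
Y = rng.standard_normal((196, 768))                    # e.g. S=196, C=768
W_head = rng.standard_normal((768, 1000)) * 0.01       # 1000-way classifier
logits = classify(Y, W_head, np.zeros(1000))
print(logits.shape)  # (1000,)
```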
References
[1] Gaussian Error Linear Units (GELUs). https://arxiv.org/pdf/1606.08415.pdf
[2] Deep Residual Learning for Image Recognition. https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
Author: 1435mm Distance