
[NIPS2021]MLP-Mixer: An all-MLP Architecture for Vision

2022-06-11 04:54:00 Shenlan Shenyan AI

Convolutional neural networks (CNNs) have long been the go-to model for computer vision. Recently, attention-based networks (e.g., ViT) have also become very popular. This paper shows that although convolutions and attention are each sufficient for good performance, neither is necessary. It introduces MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e., "mixing" the per-location features), and one with MLPs applied across patches (i.e., "mixing" spatial information).

Paper: https://arxiv.org/pdf/2105.01601.pdf

Code: https://github.com/google-research/vision_transformer

Model structure

The MLP-Mixer architecture is based entirely on multi-layer perceptrons, applied repeatedly either across spatial locations or across feature channels. Its overall structure is shown in the figure below:

Figure: the overall structure of MLP-Mixer

The overall idea of MLP-Mixer is as follows. The input image (of resolution H × W) is first split into S = HW/P² patches (the patches do not overlap; each has size P × P), and each patch is mapped to a C-dimensional vector, producing the input table X ∈ R^{S×C}. The per-patch fully-connected layer converts each patch into a feature embedding, which is then passed through N Mixer layers. Finally, MLP-Mixer uses a standard classification head: a global average pooling layer followed by a fully-connected classifier.
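As a rough illustration of this patch-splitting and per-patch projection step, here is a minimal NumPy sketch. The sizes H = W = 224, P = 16 and C = 512 are illustrative assumptions, not values fixed by this article:

```python
import numpy as np

# Illustrative sizes (assumptions): 224x224 RGB input, 16x16 patches, C = 512
H = W = 224
P = 16                       # patch resolution
C = 512                      # hidden (channel) dimension
S = (H // P) * (W // P)      # number of non-overlapping patches: 14 * 14 = 196

image = np.random.rand(H, W, 3)

# Cut the image into S non-overlapping P x P patches and flatten each patch
patches = image.reshape(H // P, P, W // P, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(S, P * P * 3)            # shape (196, 768)

# Per-patch fully-connected layer: one linear projection shared by all patches
W_proj = np.random.randn(P * P * 3, C) * 0.02
b_proj = np.zeros(C)
X = patches @ W_proj + b_proj                      # input table X, shape (S, C)

print(X.shape)  # (196, 512)
```

Because the same projection weights are applied to every patch, this step is equivalent to a convolution with kernel size P and stride P.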

The MLP-Mixer architecture uses two different types of MLP layers: token-mixing MLPs and channel-mixing MLPs. Every Mixer layer is composed of both. The token-mixing MLP lets different spatial locations (tokens) communicate with each other: it acts on the columns of X, i.e., it is an MLP applied across patches ("mixing" spatial information). The channel-mixing MLP lets different channels communicate: it acts on the rows of X, i.e., it is an MLP applied to each image patch independently ("mixing" the per-location features). Concretely, the token-mixing MLP operates on each column of the patch table: the table is first transposed, a single MLP (MLP1) whose parameters are shared across all columns is applied, and the result is transposed back. The channel-mixing MLP operates on each row of the table, with one MLP (MLP2) whose parameters are shared across all rows. The two types of layers alternate, enabling information to flow along both dimensions.

The token-mixing MLP fuses information across different spatial positions (tokens, i.e., the individual image patches). It takes each column of the input table (the vector formed by taking the same channel position across all patches; the input to MLP1 in the figure) and processes each channel independently.

The channel-mixing MLP fuses information across different channels. It takes each row of the input table (the feature vector of one token, i.e., one image patch; the input to MLP2 in the figure) and processes each token independently. The two kinds of layers are interleaved, allowing the two input dimensions to interact. The same channel-mixing MLP (token-mixing MLP) is applied to every row (column) of the input matrix X. The details are shown in the figure below:

Figure: schematic of a Mixer layer

A Mixer layer can be written as:

U_{*,i} = X_{*,i} + W_2 σ(W_1 LayerNorm(X)_{*,i}),   for i = 1, …, C,
Y_{j,*} = U_{j,*} + W_4 σ(W_3 LayerNorm(U)_{j,*}),   for j = 1, …, S,

where σ denotes the element-wise nonlinearity (GELU [1]), and D_S and D_C are the tunable hidden widths of the token-mixing and channel-mixing MLPs, respectively. Note that D_S and D_C are chosen independently of the number of input patches. As a result, the computational complexity of the network is linear in the number of input patches, whereas ViT's is quadratic.
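A single Mixer layer can be traced step by step in a small NumPy sketch. The shapes S = 196, C = 512 and the widths D_S = 256, D_C = 2048 are illustrative assumptions, and the MLP biases are omitted for brevity:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each row over the channel (last) dimension
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

S, C = 196, 512        # patches x channels (illustrative)
D_S, D_C = 256, 2048   # hidden widths of the token- and channel-mixing MLPs

rng = np.random.default_rng(0)
X = rng.standard_normal((S, C))
W1 = rng.standard_normal((D_S, S)) * 0.02   # token-mixing, shared by all C columns
W2 = rng.standard_normal((S, D_S)) * 0.02
W3 = rng.standard_normal((D_C, C)) * 0.02   # channel-mixing, shared by all S rows
W4 = rng.standard_normal((C, D_C)) * 0.02

# Token mixing with a skip connection: acts on columns of X
U = X + W2 @ gelu(W1 @ layer_norm(X))
# Channel mixing with a skip connection: acts on rows of U
Y = U + gelu(layer_norm(U) @ W3.T) @ W4.T

print(Y.shape)  # (196, 512)
```

Note that the output shape equals the input shape, which is what allows Mixer layers to be stacked N times.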

When the channel-mixing MLPs and token-mixing MLPs are applied to every row (column) of the input in this way, the parameters within each layer are naturally tied. This provides positional invariance, a prominent feature of convolutions. Tying parameters across channels, however, is much less common. For example, separable convolutions in CNNs convolve each channel separately, independently of the other channels; but there every channel uses a different convolution kernel, unlike the token-mixing MLP in MLP-Mixer, which shares the same (full-receptive-field) kernel across all channels. Parameter tying keeps the model size from growing too quickly when C or S increases, and this design does not hurt the experimental results. (Note: tied parameters are simply shared parameters.)
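To make the parameter-tying point concrete: with tying, one token-mixing MLP involves only S and D_S, not the number of channels C. A small arithmetic sketch, using the illustrative sizes S = 196, D_S = 256 (assumptions, not values fixed by the article):

```python
# Token-mixing MLP parameters with tying: the same MLP1 is shared by all C
# columns, so the count depends only on S and D_S.
S, D_S = 196, 256
params_tied = (S * D_S + D_S) + (D_S * S + S)   # W1 + b1, then W2 + b2
print(params_tied)  # 100804, regardless of whether C is 512 or 4096

# Without tying, each of the C channels would need its own copy of the MLP:
C = 512
params_untied = C * params_tied
print(params_untied)  # 51611648
```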

Every layer in MLP-Mixer (except the initial per-patch projection layer) takes an input of the same size. Apart from the MLP layers, MLP-Mixer uses standard architectural components such as skip connections [2] and Layer Normalization. Also, unlike ViT, MLP-Mixer does not use positional encodings, because the token-mixing MLPs are already sensitive to the order of the input tokens and can therefore learn to represent location. Finally, MLP-Mixer uses a standard classification head, consisting of a global average pooling layer followed by a linear classifier.
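The classification head amounts to a mean over the patch dimension followed by one linear layer. A minimal sketch, where S, C and the class count are illustrative assumptions:

```python
import numpy as np

S, C, num_classes = 196, 512, 1000   # illustrative sizes

rng = np.random.default_rng(0)
Y = rng.standard_normal((S, C))      # output of the last Mixer layer

pooled = Y.mean(axis=0)              # global average pooling over patches -> (C,)
W_head = rng.standard_normal((num_classes, C)) * 0.02
b_head = np.zeros(num_classes)
logits = W_head @ pooled + b_head    # linear classifier -> (num_classes,)

print(logits.shape)  # (1000,)
```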

References

[1] Gaussian Error Linear Units (GELUs). https://arxiv.org/pdf/1606.08415.pdf

[2] Deep residual learning for image recognition. https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf

Author: 1435mm

