[NIPS2021]MLP-Mixer: An all-MLP Architecture for Vision
2022-06-11 04:54:00 【Shenlan Shenyan AI】
Convolutional neural networks (CNNs) are the go-to models for computer vision. Recently, attention-based networks (e.g., ViT) have also become very popular. This paper shows that, although convolution and attention are each sufficient for good performance, neither is necessary. It introduces MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one applies MLPs independently to image patches (i.e., "mixing" per-location features), the other applies MLPs across patches (i.e., "mixing" spatial information).
Paper: https://arxiv.org/pdf/2105.01601.pdf
Code: https://github.com/google-research/vision_transformer
Model structure
The MLP-Mixer architecture is based entirely on multi-layer perceptrons, applied repeatedly either across spatial positions or across feature channels. The overall structure is shown below:

[Figure: MLP-Mixer overall architecture]
The overall idea of MLP-Mixer is as follows. The input image is first split into S non-overlapping patches of size P×P. Each patch is then mapped to a C-dimensional embedding by a Per-patch Fully-connected layer, so the input becomes a real-valued table X of shape S×C. This table is fed through N Mixer layers. Finally, MLP-Mixer uses a standard classification head: a global average pooling layer followed by a fully-connected layer for classification.
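The patch-splitting and per-patch fully-connected projection described above can be sketched in a few lines of NumPy (a minimal illustration; the function and variable names here are my own, not from the paper's code):

```python
import numpy as np

def to_patch_table(image, patch_size, proj):
    """Split an (H, W, ch) image into non-overlapping P x P patches and
    project each flattened patch to a C-dim embedding.
    Returns the S x C input table X."""
    H, W, ch = image.shape
    P = patch_size
    S = (H // P) * (W // P)
    # (H//P, P, W//P, P, ch) -> (S, P*P*ch): one row per patch
    patches = image.reshape(H // P, P, W // P, P, ch)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(S, P * P * ch)
    return patches @ proj  # proj: (P*P*ch, C) per-patch fully-connected weights

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
proj = rng.standard_normal((8 * 8 * 3, 128))
X = to_patch_table(img, 8, proj)
print(X.shape)  # (16, 128): S = (32/8)*(32/8) = 16 patches, C = 128 channels
```

Each row of X is one patch's embedding; the Mixer layers below operate on this table.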
The MLP-Mixer architecture uses two types of MLP layers: token-mixing MLPs and channel-mixing MLPs; every Mixer layer consists of one of each. The token-mixing MLP allows different spatial locations (tokens) to communicate: it acts on the columns of X, i.e., it is an MLP applied across patches ("mixing" spatial information). The channel-mixing MLP allows different channels to communicate: it acts on the rows of X, i.e., it is an MLP applied independently to each image patch ("mixing" per-location features). The token-mixing MLP operates on each column of X: the table is first transposed, the same MLP1 (with parameters shared across all columns) is applied, and the output is transposed back. The channel-mixing MLP operates on each row, with MLP2 shared across all rows. The two layer types alternate, enabling information to flow along both input dimensions.

The token-mixing MLP fuses information between different spatial positions (tokens, i.e., the image patches). It takes each column of the input table as input (the vector formed by one channel's values across all patches; the input of MLP1 in the figure) and processes each channel independently.

The channel-mixing MLP fuses information between different channels. It takes each row of the input table as input (the vector of all channels of one token/patch; the input of MLP2 in the figure) and processes each token independently. The two layer types are interleaved, realizing interaction between the two input dimensions. The same channel-mixing MLP (token-mixing MLP) is applied to every row (column) of the input matrix X. The details are shown in the following figure:

Figure: Mixer layer schematic
A Mixer layer can be written as:

$$U_{*,i} = X_{*,i} + \mathbf{W}_2\,\sigma\big(\mathbf{W}_1\,\mathrm{LayerNorm}(X)_{*,i}\big), \quad i = 1,\ldots,C,$$

$$Y_{j,*} = U_{j,*} + \mathbf{W}_4\,\sigma\big(\mathbf{W}_3\,\mathrm{LayerNorm}(U)_{j,*}\big), \quad j = 1,\ldots,S.$$

Here $\sigma$ is an element-wise nonlinearity (GELU [1]), and $D_S$ and $D_C$ are the tunable hidden widths of the token-mixing MLP and the channel-mixing MLP, respectively. Note that $D_S$ is chosen independently of the number of input patches. The computational complexity of the network is therefore linear in the number of input patches, whereas ViT's complexity is quadratic.
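The two equations can be checked with a minimal NumPy sketch (this is an illustration, not the reference implementation; the weight shapes are my own assumption chosen to match the equations, and biases are omitted):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalize over the channel (last) dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mixer_layer(X, W1, W2, W3, W4):
    """One Mixer layer on the S x C table X.
    W1: (D_S, S), W2: (S, D_S) -- token-mixing MLP1, shared by all C channels.
    W3: (D_C, C), W4: (C, D_C) -- channel-mixing MLP2, shared by all S tokens."""
    # token mixing: the same MLP1 acts on every column of X
    U = X + W2 @ gelu(W1 @ layer_norm(X))       # (S, C)
    # channel mixing: the same MLP2 acts on every row of U
    Y = U + gelu(layer_norm(U) @ W3.T) @ W4.T   # (S, C)
    return Y

rng = np.random.default_rng(0)
S, C, D_S, D_C = 16, 8, 4, 32
X = rng.standard_normal((S, C))
W1 = rng.standard_normal((D_S, S)) * 0.1
W2 = rng.standard_normal((S, D_S)) * 0.1
W3 = rng.standard_normal((D_C, C)) * 0.1
W4 = rng.standard_normal((C, D_C)) * 0.1
Y = mixer_layer(X, W1, W2, W3, W4)
print(Y.shape)  # (16, 8): the output table has the same shape as the input
```

Note how the token-mixing step multiplies on the left (acting along the S dimension) while the channel-mixing step multiplies on the right (acting along C), matching the column/row distinction in the equations.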
Because the channel-mixing MLP and the token-mixing MLP are applied to every row (column) of the input, the parameters within each layer are naturally tied. This tying provides positional invariance, a prominent feature of convolutions. Tying parameters across channels, however, is far less common. For example, separable convolutions in CNNs convolve each channel separately, independently of the other channels; but there each channel uses a different kernel, unlike MLP-Mixer's token-mixing MLP, in which all channels share the same full-receptive-field kernel. Parameter tying keeps the model size from growing too quickly as C or S increases, and the authors report that this design does not hurt accuracy. (Note: tying parameters is effectively the same as sharing them.)
Every layer in MLP-Mixer (except the initial patch-projection layer) takes inputs of the same size. Apart from the MLP layers, MLP-Mixer uses only standard architectural components: skip connections [2] and layer normalization. Also, unlike ViT, MLP-Mixer uses no positional embeddings, because the token-mixing MLPs are already sensitive to the order of the input tokens and can therefore learn to represent position. Finally, MLP-Mixer uses a standard classification head, consisting of a global average pooling layer and a linear classifier.
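The classification head is simple enough to sketch directly (an illustrative NumPy snippet assuming Y is the S×C output table of the last Mixer layer; the names are my own):

```python
import numpy as np

def classify(Y, W_head, b_head):
    """Y: (S, C) output of the last Mixer layer.
    Global average pooling over the S tokens, then a linear classifier."""
    pooled = Y.mean(axis=0)          # (C,) average over all tokens
    return pooled @ W_head + b_head  # (num_classes,) logits

rng = np.random.default_rng(0)
Y = rng.standard_normal((196, 768))                    # e.g. S=196, C=768
W_head = rng.standard_normal((768, 1000)) * 0.01       # 1000-way classifier
logits = classify(Y, W_head, np.zeros(1000))
print(logits.shape)  # (1000,)
```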
References
[1] Gaussian Error Linear Units (GELUs). https://arxiv.org/pdf/1606.08415.pdf
[2] Deep Residual Learning for Image Recognition. https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
Author: 1435mm Distance