Deep Learning: Patches Are All You Need?
2022-07-28 06:09:00 【Food to doubt life】
Preface
The paper was submitted to ICLR 2022; at the time of writing it has not been accepted for publication.
Since ViT appeared in 2020, transformer papers have kept coming. Does ViT perform so well because of its network architecture, or because of its distinctive input form? This paper designs the ConvMixer network and uses it to argue that ViT's strong performance may come mainly from its input form: patches.
This post briefly introduces the ConvMixer architecture, summarizes several network-design ideas that ViT has inspired, and closes with my own thoughts on the paper.
ConvMixer Structure

Like ViT, ConvMixer cuts the input image (side length $n$) into non-overlapping patches of size $p \times p$. This is done with a convolution that has $h$ kernels of size $p \times p$ and stride $p$, producing a feature map of size $h \times n/p \times n/p$. This step is ViT's patch embedding (ViT's patch embedding is itself equivalent to this convolution).
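To make the equivalence concrete, here is a minimal NumPy sketch. It assumes a single channels-first image and no padding, and the helper names `patch_embed_conv` and `patch_embed_linear` are hypothetical. The first helper computes the stride-$p$ convolution directly. The second cuts the image into patches, flattens each one and applies one shared linear projection, as ViT does. The two results agree.

```python
import numpy as np

def patch_embed_conv(x, w, b, p):
    """Convolution with h kernels of size p x p and stride p (no padding).
    x: (C, n, n) image, w: (h, C, p, p) kernels, b: (h,) bias."""
    C, n, _ = x.shape
    h = w.shape[0]
    out = np.zeros((h, n // p, n // p))
    for i in range(n // p):
        for j in range(n // p):
            patch = x[:, i * p:(i + 1) * p, j * p:(j + 1) * p]  # non-overlapping (C, p, p)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def patch_embed_linear(x, w, b, p):
    """ViT-style: flatten every patch, then apply the same linear map to each one."""
    C, n, _ = x.shape
    h, g = w.shape[0], n // p
    # (C, g, p, g, p) -> (g, g, C, p, p) -> (g*g tokens, C*p*p features)
    patches = x.reshape(C, g, p, g, p).transpose(1, 3, 0, 2, 4).reshape(g * g, C * p * p)
    tokens = patches @ w.reshape(h, -1).T + b   # (g*g, h)
    return tokens.T.reshape(h, g, g)            # back to (h, n/p, n/p)

rng = np.random.default_rng(0)
C, n, p, h = 3, 32, 8, 16
x = rng.standard_normal((C, n, n))
w = rng.standard_normal((h, C, p, p))
b = rng.standard_normal(h)
a = patch_embed_conv(x, w, b, p)
c = patch_embed_linear(x, w, b, p)
print(a.shape)            # (16, 4, 4)
print(np.allclose(a, c))  # True
```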
The feature map then passes through $d$ ConvMixer layers. In each layer, a depthwise convolution plays the role of MLP-Mixer's spatial (token) mixing and mixes information across spatial locations. A pointwise ($1 \times 1$) convolution plays the role of MLP-Mixer's channel-wise mixing and mixes information across channels. ConvMixer layers perform no downsampling, so the feature maps output by every layer have the same resolution and the same number of channels.
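Below is a minimal NumPy sketch of one such layer for a single feature map of shape $h \times H \times W$. The residual connection around the depthwise part follows the ConvMixer paper. BatchNorm is left out for brevity, GELU uses the tanh approximation, and all helper names are hypothetical. Stacking layers never changes the shape.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (an assumption; the exact form uses erf)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(z, w):
    """z: (h, H, W); w: (h, k, k), one k x k filter per channel, k odd.
    Zero 'same' padding and stride 1, so the spatial size is preserved."""
    h, H, W = z.shape
    k = w.shape[-1]
    r = k // 2
    zp = np.pad(z, ((0, 0), (r, r), (r, r)))
    out = np.zeros_like(z)
    for a in range(k):
        for b in range(k):
            # each channel is combined only with itself: spatial mixing only
            out += w[:, a, b][:, None, None] * zp[:, a:a + H, b:b + W]
    return out

def pointwise_conv(z, v):
    """1x1 convolution: v (h_out, h_in) mixes channels independently at every pixel."""
    return np.einsum('oc,chw->ohw', v, z)

def convmixer_layer(z, w_dw, v_pw):
    z = z + gelu(depthwise_conv(z, w_dw))   # residual around spatial mixing
    return gelu(pointwise_conv(z, v_pw))    # channel mixing

rng = np.random.default_rng(0)
h, g, k, depth = 8, 6, 3, 4
z = rng.standard_normal((h, g, g))
for _ in range(depth):
    z = convmixer_layer(z, 0.1 * rng.standard_normal((h, k, k)),
                        0.1 * rng.standard_normal((h, h)))
print(z.shape)  # (8, 6, 6): resolution and channel count unchanged
```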
The feature map output by the last layer goes through global pooling and is then fed to a classifier.
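The head is small. This NumPy sketch assumes global average pooling followed by a single linear layer. The helper `classify` and its argument names are hypothetical.

```python
import numpy as np

def classify(z, W_fc, b_fc):
    """z: (h, H, W) final feature map; W_fc: (num_classes, h); b_fc: (num_classes,)."""
    pooled = z.mean(axis=(1, 2))   # global average pooling -> (h,)
    return W_fc @ pooled + b_fc    # linear classifier -> (num_classes,) logits

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 7, 7))
logits = classify(z, rng.standard_normal((10, 16)), np.zeros(10))
print(logits.shape)  # (10,)
```

Because pooling collapses the spatial dimensions, the same head works for any input resolution.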
The paper contains few experiments; it reports results on ImageNet-1k. ConvMixer-1536/20 denotes a model whose feature maps have 1536 channels and whose depth is 20, and the other model names follow the same pattern.
As the results show, ConvMixer-1536/20 outperforms the plain Mixer-B/16 while using fewer parameters.
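The name ConvMixer-h/d leaves out the patch size $p$ and the depthwise kernel size $k$. The sketch below counts learnable parameters under these assumptions:

- convolutions have biases;
- one BatchNorm (scale and shift) follows every convolution;
- the input has 3 channels and the classifier has 1000 classes;
- ConvMixer-1536/20 uses $p = 7$ and $k = 9$.

`convmixer_params` is a hypothetical helper.

```python
def convmixer_params(h, d, p, k, in_ch=3, num_classes=1000):
    """Learnable parameters of ConvMixer-h/d under the assumptions stated above."""
    stem = in_ch * p * p * h + h + 2 * h   # p x p stride-p patch-embedding conv + BN
    depthwise = h * k * k + h + 2 * h      # k x k depthwise conv + BN
    pointwise = h * h + h + 2 * h          # 1x1 conv + BN
    head = h * num_classes + num_classes   # linear classifier
    return stem + d * (depthwise + pointwise) + head

n = convmixer_params(h=1536, d=20, p=7, k=9)
print(f"{n:,} parameters (~{n / 1e6:.1f}M)")  # 51,625,960 parameters (~51.6M)
```

Under these assumptions the $1 \times 1$ convolution weights alone contribute $20 \times 1536^2 \approx 47.2$M parameters, so the channel width $h$ dominates model size.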
Some ideas for network design
After ViT, a class of isotropic architectures appeared. They first apply a patch embedding, after which every layer outputs feature maps with exactly the same number of channels and the same resolution, much like the ConvMixer proposed in this paper.
There is also existing work that combines ViT with CNNs.
Reflections
This paper is very similar to MLP-Mixer. Many operations in MLP-Mixer are themselves equivalent to depthwise and pointwise convolutions, so the two model structures differ little.
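For the channel side, MLP-Mixer's channel-mixing MLP applies the same linear map to every token, which is exactly a $1 \times 1$ (pointwise) convolution. The NumPy check below uses random, hypothetical weights and a single linear layer in place of the full two-layer MLP.

The token side matches less closely. Token mixing shares one matrix over all spatial positions and uses it for every channel. A depthwise convolution is local and has a separate filter per channel.

```python
import numpy as np

rng = np.random.default_rng(0)
h, g = 8, 5
z = rng.standard_normal((h, g, g))   # feature map: channels x height x width
W = rng.standard_normal((h, h))      # hypothetical channel-mixing weights

# MLP-Mixer view: a token table (tokens x channels), one Linear applied per token
tokens = z.reshape(h, g * g).T       # (g*g, h)
mixer_out = (tokens @ W.T).T.reshape(h, g, g)

# ConvMixer view: a 1x1 (pointwise) convolution with the same weights
conv_out = np.einsum('oc,chw->ohw', W, z)

print(np.allclose(mixer_out, conv_out))  # True
```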
The authors' ConvMixer inherits two ingredients from ViT: an isotropic architecture and patch embedding. The paper does not examine which of the two matters more for performance. Patch embedding is essentially a convolution of the input image with a large kernel, so one could try introducing patch embedding into ResNet. If that improved ResNet's performance, ViT's strength would likely come from its distinctive input form, which would also fit the paper's title better.
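One way to set up the ResNet experiment suggested above is to swap the stem for a patch-embedding convolution. The shape arithmetic below uses pure Python and a hypothetical helper `conv_out`. It shows that, on a 224 × 224 input, a patch-embedding stem with $p = 4$ reaches the same 56 × 56 resolution as ResNet's standard stem. That stem is a 7 × 7 stride-2 convolution followed by 3 × 3 stride-2 max-pooling. The patch stem could therefore replace it without changing later stages, provided it outputs the 64 channels the first stage expects. The choice $p = 4$ is an assumption.

```python
def conv_out(n, k, s, pad=0):
    """Output side length of a convolution or pooling window (floor convention)."""
    return (n + 2 * pad - k) // s + 1

n = 224
# Standard ResNet stem: 7x7 conv (stride 2, padding 3), then 3x3 max-pool (stride 2, padding 1)
resnet_stem = conv_out(conv_out(n, 7, 2, 3), 3, 2, 1)
# Hypothetical patch-embedding stem: p x p conv with stride p, p = 4
patch_stem = conv_out(n, 4, 4)
print(resnet_stem, patch_stem)  # 56 56
```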