Deep learning - Patches Are All You Need?
2022-07-28 06:09:00 【Food to doubt life】
Preface
The paper discussed here has been submitted to ICLR 2022; there is no record of publication yet.
Since ViT appeared in 2020, transformer papers have been emerging one after another. Does ViT perform so well because of its network structure, or because of its input form? The paper designs the ConvMixer network to argue that ViT's good performance may come from its input form.
This post briefly introduces the structure of ConvMixer, summarizes several network design ideas that ViT has inspired, and ends with my own views on the paper.
ConvMixer Structure

Like ViT, ConvMixer cuts the input image (of size $n \times n$) into non-overlapping patches, each of size $p \times p$. Applying $h$ convolution kernels of size $p \times p$ with stride $p$ produces a feature map of size $h \times n/p \times n/p$. This process is the patch embedding of ViT (patch embedding is itself equivalent to the convolution above).
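The equivalence between the strided convolution and ViT-style patch embedding can be checked on a toy input. The following is a minimal sketch in plain Python, not the authors' code; the helper names `patch_embed_conv` and `patch_embed_linear` are hypothetical, and integer values are used so that the two results can be compared exactly.

```python
def patch_embed_conv(image, kernels, p):
    """Strided convolution: kernel size p, stride p, no padding.

    image:   C x n x n nested lists (n divisible by p)
    kernels: h x C x p x p nested lists
    returns: h x (n/p) x (n/p) feature map
    """
    n = len(image[0])
    m = n // p  # number of patches per side
    out = []
    for kernel in kernels:
        fmap = [[0] * m for _ in range(m)]
        for i in range(m):
            for j in range(m):
                fmap[i][j] = sum(
                    kernel[c][di][dj] * image[c][i * p + di][j * p + dj]
                    for c in range(len(image))
                    for di in range(p)
                    for dj in range(p)
                )
        out.append(fmap)
    return out


def patch_embed_linear(image, kernels, p):
    """ViT-style view: flatten each patch, then apply one shared linear map."""
    n = len(image[0])
    m = n // p
    # Each row of the weight matrix is one flattened kernel.
    weight = [[v for ch in k for row in ch for v in row] for k in kernels]
    out = [[[0] * m for _ in range(m)] for _ in kernels]
    for i in range(m):
        for j in range(m):
            patch = [
                image[c][i * p + di][j * p + dj]
                for c in range(len(image))
                for di in range(p)
                for dj in range(p)
            ]
            for o, w in enumerate(weight):
                out[o][i][j] = sum(a * b for a, b in zip(w, patch))
    return out


# Toy input: 1 channel, 4 x 4 image, p = 2, h = 2 kernels.
image = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
kernels = [[[[1, 0], [0, 1]]], [[[1, 1], [1, 1]]]]
conv_out = patch_embed_conv(image, kernels, p=2)
linear_out = patch_embed_linear(image, kernels, p=2)
print(conv_out == linear_out)
```

Both functions return an $h \times n/p \times n/p$ result, here $2 \times 2 \times 2$, because each output position sees exactly one patch.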
The data then passes through $d$ ConvMixer layers. In each layer, the depthwise convolution plays the role of spatial mixing in MLP-Mixer and mixes spatial information, while the pointwise convolution plays the role of channel-wise mixing in MLP-Mixer and mixes channel information. ConvMixer layers do no downsampling, so the feature maps output by different layers have the same resolution and number of channels.
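The two mixing steps can be sketched on nested lists. This is a simplified illustration under stated assumptions, not the paper's implementation: the activation, normalization and residual connection of the real layer are left out, zero padding is assumed so that the resolution is preserved, and all function names are hypothetical.

```python
def depthwise_conv(x, kernels):
    """Spatial mixing: each channel is convolved with its own k x k kernel.

    x:       C x H x W nested lists
    kernels: C x k x k nested lists (k odd); zero padding keeps H and W
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                total = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - r, j + dj - r
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding
                            total += kernels[c][di][dj] * x[c][ii][jj]
                out[c][i][j] = total
    return out


def pointwise_conv(x, weight):
    """Channel mixing: a 1 x 1 convolution, i.e. the same C_out x C_in
    matrix applied at every spatial position."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w_row[c] * x[c][i][j] for c in range(C)) for j in range(W)]
         for i in range(H)]
        for w_row in weight
    ]


def convmixer_layer(x, dw_kernels, pw_weight):
    """Depthwise then pointwise; activation, norm and residual omitted."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weight)


def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))


# 2 channels at 3 x 3 resolution.
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
     [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
box = [[1.0] * 3 for _ in range(3)]      # sums a 3 x 3 neighbourhood
dw_kernels = [box, box]
pw_weight = [[1.0, 1.0], [1.0, -1.0]]    # mixes the two channels
y = convmixer_layer(x, dw_kernels, pw_weight)
print(shape(x), shape(y))
```

The input and output shapes are identical, which is the "no downsampling" property described above.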
The feature map output by the last layer goes through global pooling and is then fed into a classifier.
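As a rough illustration of this final stage, the sketch below assumes average pooling and a single linear classifier; the function names, weights and toy feature map are made up for the example.

```python
def global_average_pool(fmap):
    """Average each channel over all spatial positions: C x H x W -> C."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def linear_classifier(features, weight, bias):
    """One logit per class: weight is num_classes x C."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weight, bias)]


fmap = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0, mean 4.0
        [[2.0, 2.0], [2.0, 2.0]]]   # channel 1, mean 2.0
pooled = global_average_pool(fmap)
logits = linear_classifier(
    pooled,
    weight=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 3 hypothetical classes
    bias=[0.0, 0.5, -1.0],
)
predicted_class = max(range(len(logits)), key=logits.__getitem__)
print(pooled, logits, predicted_class)
```

Because pooling removes the spatial dimensions, the classifier input size depends only on the number of channels, not on the image resolution.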
The paper does not contain many experiments; it reports results on ImageNet-1k. ConvMixer-1536/20 denotes a network whose feature maps have 1536 channels and whose depth is 20; other variants are named the same way.
As the results show, ConvMixer-1536/20 outperforms the plain Mixer-B/16 while using fewer parameters.
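The naming convention and the way the parameter count scales with width and depth can be sketched as follows. This is an estimate under assumptions that are not stated in the text above: a normalization layer with $2h$ parameters after every convolution, biases on all convolutions, a 1000-class linear head, and caller-supplied patch and kernel sizes. The function names are hypothetical.

```python
def parse_convmixer_name(name):
    """'ConvMixer-1536/20' -> (1536, 20): channel count h and depth d."""
    width, depth = name.split("-", 1)[1].split("/")
    return int(width), int(depth)


def convmixer_param_count(h, d, patch_size, kernel_size,
                          in_channels=3, num_classes=1000):
    """Count learnable parameters under the assumptions in the lead-in."""
    norm = 2 * h                                    # scale and shift per channel
    patch_embed = in_channels * patch_size ** 2 * h + h + norm
    depthwise = kernel_size ** 2 * h + h + norm     # one k x k filter per channel
    pointwise = h * h + h + norm                    # 1 x 1 conv across channels
    classifier = h * num_classes + num_classes
    return patch_embed + d * (depthwise + pointwise) + classifier


h, d = parse_convmixer_name("ConvMixer-1536/20")
# patch_size=7 and kernel_size=9 are assumed values for illustration.
total = convmixer_param_count(h, d, patch_size=7, kernel_size=9)
print(f"h={h}, d={d}, parameters={total / 1e6:.1f}M")
```

Under these assumptions the pointwise convolutions dominate the count, since they grow with $h^2$ while the depthwise convolutions grow only linearly in $h$.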
Some ideas for designing network structure
After ViT came out, a family of isotropic architectures emerged: they start with patch embeddings, and the feature maps output by different layers have exactly the same number of channels and resolution. The ConvMixer proposed in this paper is one example.
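What "isotropic" means for feature-map shapes can be shown with a small shape trace. The sketch below is illustrative only: the image size, patch size and widths are assumed values, and the pyramid variant is a hypothetical contrast case, not a design described in this post.

```python
def isotropic_shapes(n, p, h, depth):
    """Shapes (channels, height, width) after patch embedding and each layer."""
    embedded = (h, n // p, n // p)
    # No layer downsamples or changes the channel count.
    return [embedded for _ in range(depth + 1)]


def pyramid_shapes(n, p, h, depth, every=2):
    """Contrast case: halve the resolution and double the channels
    every `every` layers (hypothetical, for comparison only)."""
    shapes = [(h, n // p, n // p)]
    for layer in range(1, depth + 1):
        c, height, width = shapes[-1]
        if layer % every == 0:
            c, height, width = c * 2, height // 2, width // 2
        shapes.append((c, height, width))
    return shapes


iso = isotropic_shapes(n=224, p=16, h=768, depth=4)
pyr = pyramid_shapes(n=224, p=16, h=96, depth=4)
print(iso)
print(pyr)
```

In the isotropic trace every entry is the same tuple, while the pyramid trace shrinks spatially and widens in channels as depth increases.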
There is also work that combines ViT with CNNs.
Reflections
This paper is very similar to MLP-Mixer. Many operations in MLP-Mixer are themselves equivalent to depthwise and pointwise convolutions, so the two models differ little in structure.
The ConvMixer designed by the authors contains two factors from ViT: the isotropic design and patch embedding. The paper does not discuss which of the two has the greater impact on performance. Patch embedding is essentially a convolution of the input image with a large kernel, so could we introduce patch embedding into ResNet? If that improved the model's performance, then ViT's performance would likely come from its input form, which would fit the title of the paper better.