Deep learning - MetaFormer Is Actually What You Need for Vision
2022-07-28 06:10:00 【Food to doubt life】
Preface
This post summarizes the CVPR 2022 oral paper "MetaFormer Is Actually What You Need for Vision". The paper studies ViT-style and MLP-style models, abstracts the part they have in common into what it calls the MetaFormer structure, and argues that the performance of both families largely comes from this structure. On that basis it proposes the PoolFormer architecture.
MetaFormer structure

The left side of the figure shows the MetaFormer structure. The Token Mixer module in MetaFormer is responsible for mixing information across tokens. In ViT-style models this module is instantiated as Attention (e.g. DeiT), while in MLP-style models it is instantiated as a Spatial MLP (e.g. gMLP, ResMLP).
The authors argue that the MetaFormer structure is the main source of performance for both ViT-style and MLP-style models. To test this, they replace the Token Mixer module in MetaFormer with an identity mapping (in effect this amounts to a large convolutional model, with a single-channel large convolution processing one feature map (token)); even so, the model still reaches 74.3% top-1 accuracy on ImageNet.
In addition, the authors found that removing any of the Norm, Channel MLP, or Shortcut modules makes the model hard to converge, which verifies that every component of the MetaFormer structure is indispensable.
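To make the description above concrete, here is a minimal PyTorch sketch of a generic MetaFormer block. It is my own illustration rather than the authors' released code: the names MetaFormerBlock and token_mixer, the LayerNorm choice, and the MLP ratio of 4 are assumptions.

```python
import torch
import torch.nn as nn

class MetaFormerBlock(nn.Module):
    """Generic MetaFormer block: Norm -> Token Mixer -> shortcut,
    then Norm -> Channel MLP -> shortcut."""
    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer           # Attention / Spatial MLP / Pooling / nn.Identity()
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.channel_mlp = nn.Sequential(        # two-layer MLP applied to each token independently
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):                        # x: (batch, num_tokens, dim)
        x = x + self.token_mixer(self.norm1(x))  # mix information across tokens, with shortcut
        x = x + self.channel_mlp(self.norm2(x))  # mix information across channels, with shortcut
        return x

# Passing nn.Identity() as the token mixer corresponds to the identity-mapping ablation above.
block = MetaFormerBlock(dim=64, token_mixer=nn.Identity())
out = block(torch.randn(2, 196, 64))
```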
PoolFormer structure
Building on MetaFormer, the authors introduce the PoolFormer structure. PoolFormer mixes information across tokens with average pooling; compared with Attention and Spatial MLP, pooling introduces no extra learnable parameters and requires less computation. The PoolFormer structure is shown in the figure below:
The PyTorch code for the Pooling operation is roughly as follows.
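This is a sketch reconstructed from the paper's description (stride-1 average pooling whose output has the input subtracted); the default pool_size of 3 and the count_include_pad=False flag are assumptions on my part.

```python
import torch
import torch.nn as nn

class Pooling(nn.Module):
    """PoolFormer token mixer: stride-1 average pooling minus the input."""
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False)

    def forward(self, x):   # x: (batch, channels, height, width)
        # The paper's stated reason for subtracting the input: the surrounding
        # block already has a residual (shortcut) connection.
        return self.pool(x) - x

y = Pooling()(torch.randn(1, 64, 14, 14))  # output has the same shape as the input
```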
Note the subtraction after the pooling operation. I do not entirely agree with the authors' explanation for it (see the comment in the code above); the operation feels more like a trick, and I could not find in the paper any result showing how performance changes when this subtraction is removed.
PoolFormer's accuracy on ImageNet is shown in the figure below; none of the models use pretrained weights.
For personal thoughts, see my previous blog post. The paper also reports experiments on object detection and other downstream tasks; backbone research is getting more and more competitive these days.