Deep learning: MetaFormer Is Actually What You Need for Vision
2022-07-28 06:10:00 【Food to doubt life】
Preface
This post summarizes the CVPR 2022 oral paper "MetaFormer Is Actually What You Need for Vision". The paper studies ViT-like models and MLP-like models, abstracts the part they share into a general structure called MetaFormer, argues that both families owe their performance to this structure, and on that basis proposes PoolFormer.
MetaFormer structure

The left side of the figure shows the MetaFormer structure. The Token Mixer module in MetaFormer mixes information across tokens. In ViT-like models (for example DeiT) this module is the attention module, while in MLP-like models (for example gMLP and ResMLP) it is the spatial MLP module.
The authors argue that the MetaFormer structure itself, rather than any particular token mixer, is the main source of performance for ViT-like and MLP-like models. To test this, they replace the Token Mixer module with an identity mapping (the network is then effectively one large convolutional model, in which a single-channel large convolution processes each feature map, i.e. token). The model still reaches 74.3% top-1 accuracy on ImageNet.
In addition, the authors found that removing any one of Norm, Channel MLP, or Shortcut makes the model hard to converge, which confirms that every component of the MetaFormer structure is indispensable.
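To make the structure concrete, here is a minimal sketch of one MetaFormer block as described above: Norm, Token Mixer, and Channel MLP, each sub-block wrapped in a shortcut. It is an illustration, not the authors' implementation. The class name `MetaFormerBlock`, the choice of `nn.LayerNorm`, the `[batch, tokens, channels]` layout, and `mlp_ratio=4` are all assumptions. The token mixer defaults to `nn.Identity()`, which corresponds to the identity-mapping experiment.

```python
import torch
import torch.nn as nn


class MetaFormerBlock(nn.Module):
    """Hypothetical sketch of one MetaFormer block (names and defaults are assumptions)."""

    def __init__(self, dim, token_mixer=None, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Any module that mixes information across tokens can be plugged in here:
        # attention, a spatial MLP, pooling, or (as in the ablation) an identity mapping.
        self.token_mixer = token_mixer if token_mixer is not None else nn.Identity()
        self.norm2 = nn.LayerNorm(dim)
        # Channel MLP: acts on each token independently.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        # x has shape [batch, tokens, channels]
        x = x + self.token_mixer(self.norm1(x))   # shortcut around the token mixer
        x = x + self.channel_mlp(self.norm2(x))   # shortcut around the channel MLP
        return x


block = MetaFormerBlock(dim=64)       # identity token mixer by default
x = torch.randn(2, 196, 64)           # 2 images, 196 tokens, 64 channels
y = block(x)
```

Swapping the `token_mixer` argument is the only change needed to move between model families in this sketch, which is the point the paper makes about the shared structure.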
PoolFormer structure
Building on MetaFormer, the authors propose the PoolFormer structure. PoolFormer mixes information across tokens with average pooling. Compared with attention and spatial MLP, pooling introduces no additional parameters and requires less computation. The PoolFormer structure is shown in the figure below:
The PyTorch code for the pooling operation is as follows:
Note that the pooling operation is followed by a subtraction. I do not really agree with the authors' explanation (see the comment in the code above); the operation looks more like a trick, and I could not find in the paper how the model's performance changes when the subtraction is removed.
PoolFormer's accuracy on ImageNet is shown in the figure below. None of the models use pretrained weights.
For my personal thoughts, see my previous blog post. The paper also includes experimental results on object detection and other tasks. Backbone research is getting more and more competitive.
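The original listing is not reproduced here, so the following is a sketch consistent with the description in this post: average pooling over a `[batch, channels, height, width]` feature map, followed by subtracting the input. The class name `Pooling`, the default `pool_size=3`, and the `stride=1` / `count_include_pad=False` settings are assumptions chosen so that the output keeps the input's spatial size.

```python
import torch
import torch.nn as nn


class Pooling(nn.Module):
    """Sketch of a pooling token mixer (defaults are assumptions, not confirmed by this post)."""

    def __init__(self, pool_size=3):
        super().__init__()
        # stride=1 with padding=pool_size // 2 keeps height and width unchanged;
        # count_include_pad=False averages only over real (non-padded) positions.
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False
        )

    def forward(self, x):
        # x has shape [batch, channels, height, width].
        # Subtract the input itself: the enclosing block already has a
        # shortcut that adds x back after the token mixer.
        return self.pool(x) - x


mixer = Pooling(pool_size=3)
ones = torch.ones(1, 4, 8, 8)
out = mixer(ones)   # averaging a constant map returns the same map, so out is all zeros
```

On a constant input the mixer outputs zeros, so inside a block with a shortcut the input would pass through unchanged; the mixer only contributes where a token differs from its local average.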