What is the best way to combine convolution and Transformer?
2022-06-29 19:16:00 【Zhiyuan community】
Recent research shows that Transformers are strong at modeling long-range dependencies but poor at capturing high-frequency local information. To address this, the paper proposes the Inception Transformer, abbreviated iFormer, which effectively learns comprehensive features of visual data containing both high-frequency and low-frequency information.
Specifically, the paper designs an Inception mixer that brings the strengths of convolution and max pooling in capturing high-frequency information to the Transformer. Unlike other recent hybrid mixers, the Inception mixer gains efficiency through a channel splitting mechanism: a parallel convolution/max-pooling path serves as the high-frequency mixer and a self-attention path serves as the low-frequency mixer, so the block can flexibly model discriminative information scattered across a wide frequency range.
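The channel split and the parallel paths can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: it works on a 1-D token sequence in pure Python, uses a fixed high-pass kernel in place of learned convolution weights, uses identity Q/K/V projections for attention, and omits the linear layers and the final fusion step. All function names here are hypothetical.

```python
import math

def split_channels(tokens, c_high):
    """Split each token's channel vector into (high, low) parts."""
    return [t[:c_high] for t in tokens], [t[c_high:] for t in tokens]

def max_pool_1d(tokens, window=3):
    """Per-channel max over a sliding window (stride 1, clipped at the edges)."""
    n, r = len(tokens), window // 2
    out = []
    for i in range(n):
        neigh = tokens[max(0, i - r): min(n, i + r + 1)]
        out.append([max(col) for col in zip(*neigh)])
    return out

def depthwise_conv_1d(tokens, kernel=(-1.0, 2.0, -1.0)):
    """Per-channel 1-D convolution with zero padding.
    Assumption: the fixed kernel stands in for learned weights."""
    n, r, c = len(tokens), len(kernel) // 2, len(tokens[0])
    out = []
    for i in range(n):
        row = []
        for ch in range(c):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = i + k - r
                if 0 <= j < n:
                    acc += w * tokens[j][ch]
            row.append(acc)
        out.append(row)
    return out

def self_attention(tokens):
    """Single-head attention. Assumption: identity Q/K/V projections."""
    c = len(tokens[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * v[ch] for e, v in zip(exps, tokens))
                    for ch in range(c)])
    return out

def inception_mixer(tokens, c_high):
    """Route channel groups through parallel paths, then concatenate."""
    high, low = split_channels(tokens, c_high)
    half = c_high // 2
    y_pool = max_pool_1d([t[:half] for t in high])        # high-frequency path 1
    y_conv = depthwise_conv_1d([t[half:] for t in high])  # high-frequency path 2
    y_attn = self_attention(low)                          # low-frequency path
    return [p + c + a for p, c, a in zip(y_pool, y_conv, y_attn)]

# Toy input: 5 tokens with 8 channels each; 4 channels go to the high-frequency paths.
tokens = [[float(i + ch) for ch in range(8)] for i in range(5)]
mixed = inception_mixer(tokens, c_high=4)
```

Because each path only sees its own slice of the channels, the attention cost applies to a fraction of the features, which is the intuition behind the efficiency claim.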
Considering that lower layers play a larger role in capturing high-frequency details while higher layers play a larger role in modeling low-frequency global information, the authors further introduce a frequency ramp structure: the dimensions fed to the high-frequency mixer are gradually reduced and those fed to the low-frequency mixer are increased from layer to layer (in short, the same idea as ResNet's hierarchical design). This lets the network trade off high-frequency and low-frequency components effectively across layers.
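The frequency ramp can be pictured as a per-stage channel budget. The sketch below assumes a linear ramp; the stage widths and the start and end ratios are hypothetical values chosen for illustration, not the paper's configuration.

```python
def frequency_ramp(stage_dims, high_ratio_start=0.75, high_ratio_end=0.25):
    """Return (high_channels, low_channels) per stage.
    Assumption: the high-frequency share falls linearly with depth."""
    n = len(stage_dims)
    splits = []
    for i, dim in enumerate(stage_dims):
        t = i / (n - 1) if n > 1 else 0.0
        ratio = high_ratio_start + t * (high_ratio_end - high_ratio_start)
        c_high = round(dim * ratio)
        splits.append((c_high, dim - c_high))
    return splits

# Hypothetical four-stage widths; shallow stages favor the high-frequency mixer.
splits = frequency_ramp([96, 192, 320, 384])
for stage, (c_high, c_low) in enumerate(splits, start=1):
    print(f"stage {stage}: high={c_high} low={c_low}")
```

The high-frequency share shrinks from stage to stage while the low-frequency share grows, matching the direction of the ramp described above.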
iFormer is benchmarked on a series of vision tasks and shows excellent performance on image classification, COCO detection, and ADE20K segmentation. For example, iFormer-S reaches 83.4% top-1 accuracy on ImageNet-1K, 3.6% higher than DeiT-S, and with only 1/4 of the parameters and 1/3 of the FLOPs it is even slightly better than the much larger Swin-B (83.3%).

Paper link:
https://arxiv.org/abs/2205.12956