What Is the Optimal Way to Combine Convolution and Transformer?
2022-06-29 19:16:00 【Zhiyuan community】
Recent research shows that Transformers are strong at modeling long-range dependencies but struggle to capture high-frequency local information. To address this, the paper proposes the Inception Transformer (iFormer), which effectively learns comprehensive features of visual data containing both high-frequency and low-frequency information.
Specifically, the paper designs an Inception mixer that transplants the strengths of convolution and max pooling for capturing high-frequency information into the Transformer. Unlike recent mixers, the Inception mixer gains efficiency through a channel-splitting mechanism: parallel convolution/max-pooling paths and a self-attention path act as the high-frequency and low-frequency mixers respectively, giving it the flexibility to model discriminative information scattered across a wide frequency range.
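To make the channel-splitting idea concrete, here is a minimal pure-Python sketch on a 1D sequence of tokens. It is not the authors' implementation. Assumptions: the hypothetical function `inception_mixer` sends the first `c_high` channels to two high-frequency paths (half to max pooling, half to a depthwise convolution) and the remaining channels to single-head self-attention with identity Q/K/V projections. The fixed kernel `(-0.5, 1.0, -0.5)` is a hypothetical stand-in for learned weights, and the paper's fully connected layers and output fusion are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # N tokens x C channels (row i = token i)


def take(x: Matrix, lo: int, hi: int) -> Matrix:
    """Select channels [lo, hi) for every token."""
    return [row[lo:hi] for row in x]


def concat(*parts: Matrix) -> Matrix:
    """Concatenate channel groups back together, token by token."""
    return [sum((p[i] for p in parts), []) for i in range(len(parts[0]))]


def max_pool_1d(x: Matrix, k: int = 3) -> Matrix:
    """Max over a window of k neighbouring tokens (stride 1, edges truncated)."""
    n, r = len(x), k // 2
    return [[max(x[j][c] for j in range(max(0, i - r), min(n, i + r + 1)))
             for c in range(len(x[i]))] for i in range(n)]


def depthwise_conv_1d(x: Matrix, kernel=(-0.5, 1.0, -0.5)) -> Matrix:
    """Per-channel convolution with zero padding.
    Hypothetical fixed high-pass kernel shared by all channels; a real
    depthwise conv learns a separate kernel for each channel."""
    n, r = len(x), len(kernel) // 2
    out = []
    for i in range(n):
        row = []
        for c in range(len(x[i])):
            acc = 0.0
            for t, w in enumerate(kernel):
                j = i + t - r
                if 0 <= j < n:
                    acc += w * x[j][c]
            row.append(acc)
        out.append(row)
    return out


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with Q = K = V = x
    (identity projections, a simplification)."""
    n = len(x)
    d = len(x[0]) if n else 0
    scale = 1.0 / math.sqrt(d) if d else 1.0
    out = []
    for i in range(n):
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(w)
        out.append([sum(w[j] * x[j][c] for j in range(n)) / z for c in range(d)])
    return out


def inception_mixer(x: Matrix, c_high: int) -> Matrix:
    """Hypothetical channel-split mixer: channels [0, c_high) go to the
    high-frequency paths, the rest to the low-frequency attention path."""
    c = len(x[0])
    half = c_high // 2
    y_pool = max_pool_1d(take(x, 0, half))             # high-frequency path 1
    y_conv = depthwise_conv_1d(take(x, half, c_high))  # high-frequency path 2
    y_attn = self_attention(take(x, c_high, c))        # low-frequency path
    return concat(y_pool, y_conv, y_attn)


x = [[float(i + c) for c in range(8)] for i in range(5)]  # 5 tokens, 8 channels
y = inception_mixer(x, c_high=4)
print(len(y), len(y[0]))  # → 5 8
```

Attention's cost grows with the number of channels it processes (and quadratically with the number of tokens), so routing only `C - c_high` channels through it while cheap local operators handle the rest is where a channel split of this kind saves compute.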
Since lower layers play a larger role in capturing high-frequency details while higher layers play a larger role in modeling low-frequency global information, the authors further introduce a frequency ramp structure: the dimensions fed to the high-frequency mixer are gradually reduced and those fed to the low-frequency mixer are increased (in short, this follows the hierarchical design idea of ResNet). This lets each layer strike an effective balance between high- and low-frequency components.
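The frequency ramp can be sketched as a per-stage channel split. The function `frequency_ramp` and the stage widths and ratios below are hypothetical values chosen for illustration, not the paper's configuration; the only property the sketch enforces is that the high-frequency share C_high / C never increases from shallow to deep stages.

```python
from typing import List, Tuple


def frequency_ramp(stage_channels: List[int],
                   high_ratios: List[float]) -> List[Tuple[int, int]]:
    """Return (C_high, C_low) per stage, where C_high is the number of
    channels routed to the high-frequency mixer. Ratios must not
    increase with depth, so deeper stages favour the low-frequency mixer."""
    if len(stage_channels) != len(high_ratios):
        raise ValueError("need one ratio per stage")
    splits = []
    prev = 1.0
    for c, r in zip(stage_channels, high_ratios):
        if not 0.0 <= r <= prev:
            raise ValueError("high-frequency ratios must lie in [0, 1] and not increase")
        prev = r
        c_high = round(c * r)
        splits.append((c_high, c - c_high))
    return splits


# Hypothetical 4-stage configuration (not the paper's numbers).
print(frequency_ramp([64, 128, 256, 512], [0.75, 0.5, 0.25, 0.125]))
# → [(48, 16), (64, 64), (64, 192), (64, 448)]
```

With these numbers the low-frequency share grows from 25% of the channels in the first stage to 87.5% in the last, which is the shift from local detail toward global modeling that the ramp is meant to express.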
The authors benchmark iFormer on a series of vision tasks and show that it performs strongly on image classification, COCO detection, and ADE20K segmentation. For example, iFormer-S reaches 83.4% top-1 accuracy on ImageNet-1K, 3.6% higher than DeiT-S, and even slightly outperforms the much larger Swin-B (83.3%) with only 1/4 of its parameters and 1/3 of its FLOPs.

Paper link:
https://arxiv.org/abs/2205.12956