Swin Transformer [Backbone]
2022-07-26 02:59:00 【太简单了】
Background
Swin Transformer won the best paper award at ICCV 2021.
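ViT carries the transformer straight from NLP into CV, and that raises two immediate problems. The first is scale: pedestrians, cars, and other objects appear at many different sizes, which has no counterpart in NLP. The second is sequence length: if individual pixels were the basic unit, the sequence would be far too long. ViT's 16×16 patches (a 16× downsampling, so a low-resolution feature map) may make it a poor fit for dense prediction tasks, and its global self-attention makes the computational cost grow quadratically with the number of tokens.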
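Swin Transformer lets a transformer extract features hierarchically as well, so the extracted features are multi-scale. Computing self-attention inside windows shortens the sequence (the cost grows linearly with image size rather than quadratically), and shifting the windows lets neighboring windows interact. Semantically related content is very likely to sit in adjacent regions, so a local design is entirely sufficient, and ViT's global design is largely redundant.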
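To make the linear-versus-quadratic claim concrete, here is a minimal sketch that evaluates the complexity formulas given in the Swin Transformer paper, Ω(MSA) = 4hwC² + 2(hw)²C and Ω(W-MSA) = 4hwC² + 2M²hwC, where h×w is the feature-map size in tokens, C the channel width, and M the window side. The function names and the feature-map sizes in the loop are illustrative assumptions, and the counts ignore softmax and other small terms.

```python
def msa_flops(h, w, C):
    """Approximate cost of global self-attention over h*w tokens of width C."""
    # 4hwC^2: Q, K, V and output projections; 2(hw)^2 C: attention scores and weighted sum
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def wmsa_flops(h, w, C, M=7):
    """Approximate cost of window self-attention with M x M windows."""
    # Each token attends to only M*M tokens, so the second term is linear in h*w
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    C = 96  # stage-1 channel width from the text
    for side in (56, 112, 224):  # hypothetical feature-map sizes
        g = msa_flops(side, side, C)
        l = wmsa_flops(side, side, C)
        print(f"{side}x{side} tokens: global={g:.3e}  window={l:.3e}  ratio={g / l:.1f}")
```

Doubling the side of the feature map multiplies the window-attention cost by exactly 4 (it is linear in the token count), while the global-attention cost grows by more than 4 because of the (hw)² term.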
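In short, it combines the sliding-window idea borrowed from convolution with the transformer's own strength at capturing global context.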
Model architecture

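The first steps are easiest to understand by comparison with ViT. Suppose the input image is [224, 224, 3]. Patch Partition turns it into [56, 56, 48] (as in ViT, except that the patch size here is 4×4, so each patch holds 4×4×3 = 48 values). Linear Embedding projects this to [56, 56, 96], which is flattened to [3136, 96] and fed into the Swin Transformer Blocks; their output is still [3136, 96]. Next comes Patch Merging, which halves the spatial size and doubles the number of channels, giving [28, 28, 192]. It mirrors the downsampling stages of a convolutional network and can be read as trading spatial resolution for channel depth; the video listed in the references explains it in detail. Repeating this block-then-merge pattern builds the whole Swin Transformer.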
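The shape changes described above can be traced with a short NumPy sketch. It is not the official implementation: the `linear` helper uses random weights as a stand-in for learned projections, LayerNorm is omitted, and the Swin Transformer Blocks are skipped because they leave the shape unchanged. The 2×2 neighbor concatenation followed by a 4C → 2C projection follows the paper's description of Patch Merging; all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def patch_partition(img, p=4):
    """Split an [H, W, C] image into non-overlapping p x p patches, flattened per patch."""
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)            # [H/p, W/p, p, p, C]
    return x.reshape(H // p, W // p, p * p * C)


def linear(x, out_dim):
    """Stand-in for a learned projection: random weights, no bias (assumption)."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.02
    return x @ w


def patch_merging(x):
    """Concatenate each 2x2 neighborhood (C -> 4C), then project 4C -> 2C."""
    C = x.shape[-1]
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )                                          # [H/2, W/2, 4C]
    return linear(merged, 2 * C)               # [H/2, W/2, 2C]


img = rng.standard_normal((224, 224, 3))
patches = patch_partition(img)                 # (56, 56, 48)
tokens = linear(patches, 96)                   # (56, 56, 96)
seq = tokens.reshape(-1, 96)                   # (3136, 96), the sequence fed to the blocks
stage2 = patch_merging(seq.reshape(56, 56, 96))  # (28, 28, 192)

for name, arr in [("patches", patches), ("tokens", tokens), ("seq", seq), ("stage2", stage2)]:
    print(name, arr.shape)
```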
Swin Transformer Block

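As noted above, once the [56, 56, 96] tensor enters a block, self-attention is computed inside 7×7 windows, so the 56×56 map is split into 8×8 = 64 windows of 49 tokens each. Panel (b) of the first figure shows the basic computational unit of Swin Transformer, which contains two blocks: the first computes self-attention within the regular windows, and the second computes it within the shifted windows.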
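A minimal NumPy sketch of the two window layouts follows, under several simplifying assumptions: attention is single-head with no learned Q/K/V projections, the shift is implemented as a cyclic roll by half the window size, and the attention mask that the official implementation applies after the cyclic shift (to keep tokens wrapped around from opposite edges from attending to each other) is left out. Function names are hypothetical.

```python
import numpy as np


def window_partition(x, M=7):
    """Split an [H, W, C] map into non-overlapping M x M windows: [num_windows, M*M, C]."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    x = x.transpose(0, 2, 1, 3, 4)             # [H/M, W/M, M, M, C]
    return x.reshape(-1, M * M, C)


def window_self_attention(windows):
    """Simplified attention inside each window: q = k = v = the tokens themselves."""
    scale = windows.shape[-1] ** -0.5
    scores = windows @ windows.transpose(0, 2, 1) * scale   # [nW, M*M, M*M]
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows                                 # [nW, M*M, C]


M = 7
x = np.random.default_rng(0).standard_normal((56, 56, 96))

# First block: regular windows
regular = window_partition(x, M)               # (64, 49, 96)

# Second block: cyclically shift the map by M // 2, then partition again
shifted_map = np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))
shifted = window_partition(shifted_map, M)     # (64, 49, 96)

out = window_self_attention(regular)
print(regular.shape, shifted.shape, out.shape)
```

Each attention matrix is only 49×49 regardless of image size, which is where the linear cost comes from; the shift moves the window boundaries so that tokens separated in the first block share a window in the second.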
For the details of how self-attention is computed after the windows are shifted, see Mu Li's walkthrough on Bilibili.
References
Mu Li (沐神), Bilibili: Swin Transformer paper walkthrough [Paper Close Reading series]