Swin Transformer【Backbone】
2022-07-26 02:59:00 【太简单了】
Background
Swin Transformer received the best paper award at ICCV 2021.
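ViT takes the transformer straight from NLP to CV, which raises two immediate problems: scale (objects such as pedestrians and cars appear at widely varying sizes, which has no counterpart in NLP) and sequence length (with individual pixels as tokens, the sequence is far too long). ViT's 16×16-pixel patches yield a low-resolution feature map, which may make it ill-suited to dense prediction tasks, and its global self-attention makes computational cost grow quadratically with image size.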
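Swin Transformer lets a transformer extract features hierarchically as well, so the extracted features are multi-scale. Computing self-attention inside local windows shortens the sequence (cost grows linearly with image size rather than quadratically), and shifting the windows lets neighboring windows interact. Because semantically related content most likely sits in adjacent regions, this local design is sufficient, and ViT's global attention is largely redundant.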
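In short, it combines the sliding-window idea borrowed from convolution with the transformer's own strength at capturing global context.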
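The linear-versus-quadratic claim can be checked with a quick calculation. The sketch below uses the cost estimates given in the Swin Transformer paper for a feature map of h×w tokens with C channels and window size M. The function names are illustrative, and the counts ignore softmax and other lower-order terms.

```python
def msa_flops(h, w, C):
    """Global self-attention over all h*w tokens: quadratic in h*w."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def wmsa_flops(h, w, C, M=7):
    """Self-attention restricted to M x M windows: linear in h*w for fixed M."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    C = 96
    for side in (56, 112, 224):
        g = msa_flops(side, side, C)
        l = wmsa_flops(side, side, C)
        print(f"{side}x{side} tokens: global={g:.3e}  windowed={l:.3e}  ratio={g / l:.1f}")
```

Doubling the side length multiplies the windowed cost by exactly 4 (the number of tokens), while the global cost grows faster than that because of the `(h * w) ** 2` term.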
Model architecture

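The first steps are easiest to follow by comparison with ViT. Suppose the input image is [224, 224, 3]. Patch Partition turns it into [56, 56, 48] (as in ViT, except that the patch size here is 4×4, so each patch holds 4×4×3 = 48 values). Linear Embedding projects this to [56, 56, 96], which is flattened to [3136, 96] and fed to the Blocks, whose output is again [3136, 96]. Patch Merging then halves each spatial dimension and doubles the channel count, giving [28, 28, 192]. This step mirrors the downsampling in convolutional networks and can be read as trading spatial resolution for channel depth; the video listed under References explains it in detail. Repeating this Block and Patch Merging pattern builds the whole Swin Transformer.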
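The shape changes in this walkthrough can be reproduced with a few array operations. This is a minimal NumPy sketch, not the reference implementation: the weights are random placeholders, the helper names are made up for illustration, and layer normalization and biases are left out for brevity.

```python
import numpy as np


def patch_partition(img, patch=4):
    """[H, W, C] -> [H/patch, W/patch, patch*patch*C]: one flattened patch per token."""
    H, W, C = img.shape
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)  # [H/p, W/p, p, p, C]
    return x.reshape(H // patch, W // patch, patch * patch * C)


def patch_merging(x, weight):
    """[H, W, C] -> [H/2, W/2, 2C]: stack each 2x2 neighborhood, then project 4C -> 2C."""
    x0 = x[0::2, 0::2]
    x1 = x[1::2, 0::2]
    x2 = x[0::2, 1::2]
    x3 = x[1::2, 1::2]
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # [H/2, W/2, 4C]
    return merged @ weight  # [H/2, W/2, 2C]


rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))

tokens = patch_partition(img)                      # (56, 56, 48)
w_embed = rng.standard_normal((48, 96)) * 0.02     # placeholder for Linear Embedding
embedded = tokens @ w_embed                        # (56, 56, 96)
sequence = embedded.reshape(-1, 96)                # (3136, 96), the input to the Blocks
w_merge = rng.standard_normal((4 * 96, 2 * 96)) * 0.02  # placeholder for the 4C -> 2C projection
merged = patch_merging(embedded, w_merge)          # (28, 28, 192)

if __name__ == "__main__":
    for name, arr in [("tokens", tokens), ("embedded", embedded),
                      ("sequence", sequence), ("merged", merged)]:
        print(name, arr.shape)
```

Patch Merging moves the four tokens of every 2×2 neighborhood into the channel axis before projecting, which is why the spatial size halves while the channel count only doubles rather than quadruples.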
Swin Transformer Block

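As noted above, once the [56, 56, 96] tensor enters a block, self-attention is computed inside 7×7 windows. Panel (b) of the first figure shows the basic computational unit of Swin Transformer, which consists of two Blocks: the first computes self-attention within regular windows, and the second computes it within the shifted windows.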
For the details of how self-attention is computed after the windows are shifted, see 沐神's walkthrough on Bilibili.
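The two window layouts can be illustrated without any attention code. The NumPy sketch below splits a 56×56 map into 7×7 windows, then does the same after a cyclic shift of half a window (3 positions, an assumption taken from the paper's ⌊M/2⌋ displacement). The function names are illustrative, and the attention mask that the shifted step needs is deliberately not covered here.

```python
import numpy as np


def window_partition(x, M=7):
    """[H, W, C] -> [num_windows, M*M, C]; self-attention runs inside each window."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)


def shifted_window_partition(x, M=7):
    """Cyclically shift the map by M // 2 along both axes, then partition as usual."""
    shift = M // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, M)


def window_index(windows, token_id):
    """Index of the window that contains the given token id."""
    return int(np.argwhere(windows[:, :, 0] == token_id)[0, 0])


# Label every token with a unique id so window membership can be inspected.
ids = np.arange(56 * 56).reshape(56, 56, 1)
regular = window_partition(ids)          # (64, 49, 1)
shifted = shifted_window_partition(ids)  # (64, 49, 1)

# Two diagonal neighbors that sit on opposite sides of a window border.
a = 6 * 56 + 6   # token at row 6, column 6
b = 7 * 56 + 7   # token at row 7, column 7

if __name__ == "__main__":
    print("regular:", window_index(regular, a), window_index(regular, b))
    print("shifted:", window_index(shifted, a), window_index(shifted, b))
```

Tokens `a` and `b` fall into different windows in the regular layout but share a window after the shift, which is how the second Block lets information cross window borders.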
References
Bilibili, 沐神: Swin Transformer paper close reading [论文精读]