Swin Transformer【Backbone】
2022-07-26 03:00:00 【It's too simple】
Background
Swin Transformer won the best paper award at ICCV 2021.
ViT brought the transformer from NLP directly into CV, which raises two immediate problems: a scale problem (objects such as pedestrians and cars appear at widely varying sizes, something that has no counterpart in NLP) and a sequence-length problem (if the basic unit is an image pixel, the sequence is too long). The 16*16 patches (low resolution) that ViT works on may not suit dense prediction tasks, and global self-attention makes the computational complexity grow quadratically with image size.
Swin Transformer lets the transformer do hierarchical feature extraction as well, so the extracted features are multi-scale. Computing self-attention inside windows shortens the sequence (computational complexity grows linearly with image size rather than quadratically), and shifting the windows lets adjacent windows interact. Semantically similar parts most likely appear in neighboring regions, so this local design is fully adequate, and ViT's global design is redundant by comparison.
In short, it borrows the sliding-window idea from convolution and combines it with the transformer's own strength in capturing global context.
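The linear versus quadratic growth mentioned above can be checked with a small calculation. The sketch below uses the complexity formulas given in the Swin Transformer paper for global multi-head self-attention and window-based self-attention on a feature map of h*w tokens with C channels and window size M. The function names and the choice of C=96, M=7 are illustrative assumptions, not taken from any library.

```python
def msa_flops(height, width, channels):
    """Global self-attention cost: 4*h*w*C^2 + 2*(h*w)^2*C (formula from the paper)."""
    tokens = height * width
    return 4 * tokens * channels ** 2 + 2 * tokens ** 2 * channels


def window_msa_flops(height, width, channels, window_size):
    """Window self-attention cost: 4*h*w*C^2 + 2*M^2*h*w*C (formula from the paper)."""
    tokens = height * width
    return 4 * tokens * channels ** 2 + 2 * window_size ** 2 * tokens * channels


if __name__ == "__main__":
    # Assumed settings for illustration: C=96 channels, 7*7 windows.
    for side in (56, 112, 224):
        global_cost = msa_flops(side, side, 96)
        window_cost = window_msa_flops(side, side, 96, 7)
        print(side, global_cost, window_cost, round(global_cost / window_cost, 1))
```

Doubling the side length quadruples the token count. The window cost scales by exactly that factor, while the second term of the global cost scales by sixteen, which is why the gap widens as resolution grows.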
Model structure

The initial steps can be understood by analogy with ViT. Suppose the input image is [224,224,3]. Patch Partition turns it into [56,56,48] (similar to ViT, but with a patch size of 4*4, so each patch holds 4*4*3=48 values). Linear Embedding then produces [56,56,96], which is flattened to [3136,96] and fed into the Blocks; their output is still [3136,96]. Next comes Patch Merging, which halves the spatial size and doubles the number of channels to give [28,28,192]. This operation exists to mirror the downsampling stages of convolutional neural networks and can be understood as trading spatial resolution for channel dimension; the referenced video explains it in detail. Repeating these stages in turn forms the whole Swin Transformer.
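The shape bookkeeping above can be traced with a few lines of arithmetic. This is a minimal sketch, not the official implementation: the function name `swin_shapes` is hypothetical, and the default of four stages is an assumption based on the Swin-T configuration in the paper, since the text only says the stages repeat.

```python
def swin_shapes(height, width, in_channels=3, patch_size=4, embed_dim=96, num_stages=4):
    """Return (name, shape) pairs for each step of the Swin Transformer front end.

    num_stages=4 is an assumed default (Swin-T); the shapes are computed, not measured.
    """
    h, w = height // patch_size, width // patch_size
    steps = [
        # Each 4*4 patch is flattened: patch_size * patch_size * in_channels values.
        ("patch_partition", (h, w, patch_size * patch_size * in_channels)),
        # A linear layer projects every patch to embed_dim channels.
        ("linear_embedding", (h, w, embed_dim)),
        # Tokens are flattened into a sequence before entering the blocks.
        ("flattened_tokens", (h * w, embed_dim)),
        # The blocks of stage 1 keep the shape unchanged.
        ("stage_1", (h, w, embed_dim)),
    ]
    channels = embed_dim
    for stage in range(2, num_stages + 1):
        # Patch Merging: spatial size divided by 2, channel count multiplied by 2.
        h, w, channels = h // 2, w // 2, channels * 2
        steps.append((f"stage_{stage}", (h, w, channels)))
    return steps


if __name__ == "__main__":
    for name, shape in swin_shapes(224, 224):
        print(f"{name:>18}: {shape}")
```

For a [224,224,3] input this reproduces the [56,56,48], [56,56,96], [3136,96] and [28,28,192] shapes quoted in the text.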
Swin Transformer Block

As mentioned earlier, once the [56,56,96] tensor enters a block, self-attention is computed inside 7*7 windows. Panel (b) of the first figure shows the basic computing unit of Swin Transformer, which consists of two Blocks: the first computes self-attention within the regular windows, and the second computes self-attention within the shifted windows.
For how self-attention is computed after the windows are shifted, see Mu Shen's explanation on Bilibili.
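To make the two window layouts concrete, the following pure-Python sketch partitions a 56*56 grid of token ids into 7*7 windows, then does the same after a cyclic shift. It is an illustration under assumptions: the helper names are hypothetical, the shift of `WINDOW // 2` follows the paper rather than this post, and the attention computation and the masking needed after the shift are left out entirely.

```python
def window_partition(grid, window_size):
    """Split a 2D grid (list of rows) into non-overlapping square windows."""
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("grid size must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in grid[top:top + window_size]])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    rolled_rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


def source_windows(window, grid_width, window_size):
    """Return the set of regular windows that the tokens in `window` came from."""
    origins = set()
    for row in window:
        for token_id in row:
            r, c = divmod(token_id, grid_width)
            origins.add((r // window_size, c // window_size))
    return origins


SIDE, WINDOW = 56, 7
SHIFT = WINDOW // 2  # assumed shift size, following the paper

# Each token is labelled with its flat index so its origin can be recovered later.
grid = [[r * SIDE + c for c in range(SIDE)] for r in range(SIDE)]
regular = window_partition(grid, WINDOW)
shifted = window_partition(cyclic_shift(grid, SHIFT), WINDOW)

if __name__ == "__main__":
    print(len(regular), "windows of", WINDOW * WINDOW, "tokens each")
    print("regular window 0 draws from", len(source_windows(regular[0], SIDE, WINDOW)), "window(s)")
    print("shifted window 0 draws from", len(source_windows(shifted[0], SIDE, WINDOW)), "window(s)")
```

A shifted window gathers tokens from several of the regular windows, which is how information crosses window boundaries in the second Block. Windows at the wrapped edges also mix tokens that were not neighbors in the original image, which is the situation the masking step has to handle.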
References
Bilibili, Mu Shen: Swin Transformer paper close reading [Paper Close Reading]