Swin Transformer Code Explanation
2022-06-12 16:45:00 【QT-Smile】
1. Downsampling is by a factor of 4, therefore patch_size=4.

3. embed_dim=96 is the C in the figure below: the number of channels after the first Linear Embedding.

4. After the first fully connected layer, the number of channels is doubled.

6. Whether to use a bias for q, k, v in Multi-Head Attention (qkv_bias); the default here is to use it.
7. The first drop rate is applied right after PatchEmbed.
8. The second drop rate is used inside Multi-Head Attention.
9. The third drop rate is used in every Swin Transformer block; it grows gradually from 0 up to 0.1.
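To keep the three rates apart, here is a small illustration of where each one acts; the module names are illustrative, not the repo's exact attributes:

```python
import torch.nn as nn

drop_rate, attn_drop_rate, drop_path_rate = 0.0, 0.0, 0.1

pos_drop = nn.Dropout(p=drop_rate)        # point 7: applied right after PatchEmbed
attn_drop = nn.Dropout(p=attn_drop_rate)  # point 8: applied to the attention weights
# Point 9: drop_path_rate is a stochastic-depth rate that ramps from 0 to 0.1
# across the blocks; point 20 below shows how the per-block values are built.
```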

10. patch_norm: by default, a normalization layer is appended after PatchEmbed.
11. Not used by default; if enabled, it only saves memory (checkpointing).
12. Corresponds to the number of Swin Transformer blocks in each stage.
13. Corresponds to the number of output channels of stage 4.

14. PatchEmbed divides the image into non-overlapping patches.
15. PatchEmbed corresponds to the Patch Partition and Linear Embedding in the architecture diagram.

16. Patch Partition is actually implemented by a convolution.
17. Pad on the right in the width direction and at the bottom in the height direction.
18. Flatten starting from dimension 2.
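Putting points 16–18 together, here is a minimal sketch of a PatchEmbed forward pass (a simplified re-implementation, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """Patch Partition + Linear Embedding as one strided conv (point 16)."""
    def __init__(self, patch_size=4, in_c=3, embed_dim=96):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Conv2d(in_c, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        _, _, H, W = x.shape
        # Point 17: pad on the right (width) and at the bottom (height) so that
        # H and W become multiples of patch_size. F.pad pairs: (left, right, top, bottom).
        pad_w = (self.patch_size - W % self.patch_size) % self.patch_size
        pad_h = (self.patch_size - H % self.patch_size) % self.patch_size
        if pad_w or pad_h:
            x = F.pad(x, (0, pad_w, 0, pad_h))
        x = self.proj(x)                  # [B, C, H/4, W/4]
        x = x.flatten(2).transpose(1, 2)  # point 18: flatten from dim 2 -> [B, H*W, C]
        return self.norm(x)

out = PatchEmbed()(torch.randn(1, 3, 224, 224))  # -> [1, 3136, 96]
```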
19. This is the first drop rate mentioned above, connected directly after PatchEmbed.
20. A drop-path rate is set for each Swin Transformer block used, growing from 0 all the way up to drop_path_rate.
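A minimal sketch of how such a schedule is computed (the variable names depths and dpr follow common implementations and are assumptions here):

```python
import torch

drop_path_rate = 0.1
depths = (2, 2, 6, 2)  # Swin-T: number of blocks per stage (point 12)
# One stochastic-depth rate per block, growing linearly from 0 to drop_path_rate.
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]
print(dpr)  # 12 values from 0.0 up to 0.1
```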
21. Loop to generate each stage; note that a stage in the code differs slightly from a stage in the paper. Stage 4 has no Patch Merging, only Swin Transformer blocks.
22. The number of Swin Transformer blocks that need to be stacked in this stage.
23.
drop_rate: applied directly after PatchEmbed
attn_drop: the attention dropout used in this stage
dpr: the drop-path rates used by the different Swin Transformer blocks in this stage
24. The first 3 stages built have a PatchMerging layer, but the last stage does not.
Here self.num_layers=4.
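The per-stage settings implied by points 22–24 can be sketched like this (a runnable illustration, not the repo's code):

```python
import torch

embed_dim, drop_path_rate = 96, 0.1
depths = (2, 2, 6, 2)  # Swin-T
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]

num_layers = len(depths)  # self.num_layers = 4
for i in range(num_layers):
    print(dict(
        dim=int(embed_dim * 2 ** i),                         # 96, 192, 384, 768
        depth=depths[i],                                     # blocks stacked in this stage (point 22)
        drop_path=dpr[sum(depths[:i]):sum(depths[:i + 1])],  # this stage's slice of dpr (point 23)
        has_patch_merging=(i < num_layers - 1),              # point 24: not in the last stage
    ))
```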
27. The feature matrix passed into the Patch Merging below has shape x: [B, H*W, C].


28. When the H and W of x are not integer multiples of 2, padding is needed: pad a column of zeros on the right and a row of zeros at the bottom.
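Points 4, 27 and 28 come together in Patch Merging; below is a simplified re-implementation for reference (a sketch, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMerging(nn.Module):
    """2x downsampling: H and W halve; the linear layer maps 4C -> 2C (point 4)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x, H, W):
        B, L, C = x.shape        # point 27: x is [B, H*W, C]
        x = x.view(B, H, W, C)
        # Point 28: if H or W is odd, pad a column of zeros on the right and a
        # row of zeros at the bottom. Pad pairs run last-dim-first: (C, W, H).
        if H % 2 == 1 or W % 2 == 1:
            x = F.pad(x, (0, 0, 0, W % 2, 0, H % 2))
        x0 = x[:, 0::2, 0::2, :]  # top-left pixel of every 2x2 neighborhood
        x1 = x[:, 1::2, 0::2, :]  # bottom-left
        x2 = x[:, 0::2, 1::2, :]  # top-right
        x3 = x[:, 1::2, 1::2, :]  # bottom-right
        x = torch.cat([x0, x1, x2, x3], -1)  # [B, H/2, W/2, 4C]
        x = x.view(B, -1, 4 * C)             # [B, (H/2)*(W/2), 4C]
        return self.reduction(self.norm(x))  # [B, (H/2)*(W/2), 2C]

out = PatchMerging(96)(torch.randn(1, 56 * 56, 96), 56, 56)  # -> [1, 784, 192]
```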

31. For the classification model, the following code needs to be added.
32. Initialize the weights of the whole model.
33. Perform the 4× downsampling.
34. L = H*W.
35. Drop activations at a certain rate (dropout).
36. Pass through each stage.
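Points 31–36 can be summarized in a toy skeleton of the forward pass (real stages are replaced by Identity; all names here are illustrative, not the repo's):

```python
import torch
import torch.nn as nn

class ToySwin(nn.Module):
    """Toy skeleton mirroring points 31-36; not the repo's implementation."""
    def __init__(self, embed_dim=96, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, embed_dim, 4, 4)  # point 33: 4x downsampling
        self.pos_drop = nn.Dropout(0.0)                   # point 35: drop at a given rate
        self.layers = nn.ModuleList([nn.Identity()])      # point 36: the stages
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)     # point 31: classification head
        # Point 32: the real model initializes all weights here, e.g. via self.apply(...).

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # [B, L, C], L = H*W (point 34)
        x = self.pos_drop(x)
        for layer in self.layers:  # point 36: go through each stage
            x = layer(x)
        x = self.norm(x)
        x = x.mean(dim=1)          # global average pool over L
        return self.head(x)

logits = ToySwin()(torch.randn(1, 3, 224, 224))  # -> [1, 1000]
```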
The following code builds the Swin-T model.

Swin-T's pre-trained weights on ImageNet-1k.
39. Swin_B(window7_224)
40. Swin_B(window7_384)
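For reference, the variant settings from the Swin Transformer paper can be listed as plain data (note that the official 384-resolution Swin-B checkpoint uses a 12×12 window; patch_size=4 throughout):

```python
# Known configurations from the Swin Transformer paper, shown as plain dicts:
configs = {
    "swin_t_window7_224":  dict(embed_dim=96,  depths=(2, 2, 6, 2),  num_heads=(3, 6, 12, 24), window_size=7),
    "swin_b_window7_224":  dict(embed_dim=128, depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32), window_size=7),
    "swin_b_window12_384": dict(embed_dim=128, depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32), window_size=12),
}
for name, cfg in configs.items():
    print(name, cfg)
```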


41. The original feature matrix needs to be shifted to the right and down; the specific shift distance equals the window size divided by 2, rounded down.
self.shift_size is this rightward/downward shift distance.
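The cyclic shift itself is done with torch.roll in the official implementation; a minimal runnable illustration:

```python
import torch

window_size = 7
shift_size = window_size // 2  # floor(7 / 2) = 3 (point 41)

x = torch.arange(16.).view(1, 4, 4, 1)  # toy [B, H, W, C] feature map
# Rolling shifts the map so the shifted windows line up; rows and columns
# wrap around (cyclic shift), and the inverse roll restores the original map.
shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
restored = torch.roll(shifted, shifts=(shift_size, shift_size), dims=(1, 2))
assert torch.equal(x, restored)
```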
42. The number of Swin Transformer blocks in one stage.
43. When shift_size=0, W-MSA is used; when shift_size is not 0, SW-MSA is used with shift_size = self.shift_size.
44. A Swin Transformer block can be either W-MSA or SW-MSA; here a W-MSA and an SW-MSA pair is not treated together as a single Swin Transformer block.
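How the blocks alternate between the two attention types (points 42–44) can be seen from this small runnable sketch:

```python
window_size, depth = 7, 6  # e.g. stage 3 of Swin-T has 6 blocks
for i in range(depth):
    shift_size = 0 if i % 2 == 0 else window_size // 2
    kind = "W-MSA" if shift_size == 0 else "SW-MSA"  # point 43
    print(f"block {i}: shift_size={shift_size} -> {kind}")
```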
45. depth represents the numbers circled in the figure below.

46. The downsampling here is implemented with Patch Merging.
47. This is where SW-MSA is actually used.
48. Swin Transformer blocks do not change the height and width of the feature matrix, so the situation is the same for every SW-MSA; the size of attn_mask therefore never changes, and attn_mask only needs to be created once.
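A simplified version of that one-time mask construction (condensed from the official create_mask logic):

```python
import torch

def create_mask(H, W, window_size=7, shift_size=3):
    """SW-MSA attention mask: after the cyclic shift, pixels that came from
    different image regions must not attend to each other (-100 before softmax)."""
    img_mask = torch.zeros((1, H, W, 1))
    slices = (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None))
    cnt = 0
    for h in slices:
        for w in slices:
            img_mask[:, h, w, :] = cnt  # label each region with its own id
            cnt += 1
    # Partition into non-overlapping windows -> [num_windows, window_size**2].
    m = img_mask.view(1, H // window_size, window_size, W // window_size, window_size, 1)
    m = m.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size)
    # Pairs with different region ids get -100; same-region pairs stay 0.
    attn_mask = m.unsqueeze(1) - m.unsqueeze(2)
    return attn_mask.masked_fill(attn_mask != 0, -100.0)

mask = create_mask(56, 56)  # H and W never change inside a stage, so build once (point 48)
```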
50. At this point, all the Swin Transformer blocks of one stage have been created.
51. This is the Patch Merging downsampling; it halves the height and width.
52. The +1 here is to guard against the incoming H and W being odd, which requires padding.
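The +1 is just integer division that rounds up, so odd inputs map to the padded size:

```python
H, W = 7, 7                        # odd spatial size entering Patch Merging
H, W = (H + 1) // 2, (W + 1) // 2  # -> 4, 4: the size after padding and halving
```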

59. The following two x tensors have the same shape.
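Those two x tensors are the shortcut and the attention output inside a block; a toy sketch of the residual structure (attn and mlp are stand-ins, not the real modules):

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Toy stand-in for a Swin Transformer block, showing only the residuals."""
    def __init__(self, dim=96):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.Identity()  # stands in for W-MSA / SW-MSA
        self.mlp = nn.Identity()   # stands in for the 2-layer MLP

    def forward(self, x):
        shortcut = x                        # [B, L, C]
        x = self.attn(self.norm1(x))        # same shape as shortcut (point 59)
        x = shortcut + x                    # first residual connection
        return x + self.mlp(self.norm2(x))  # second residual connection

y = ToyBlock()(torch.randn(1, 3136, 96))  # shape preserved: [1, 3136, 96]
```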
