
Swin Transformer code explanation

2022-06-12 16:45:00 QT-Smile


1. The initial downsampling factor is 4, therefore patch_size=4.

3. embed_dim=96 is the C in the architecture diagram: the number of channels after the first Linear Embedding layer.

5. mlp_ratio: after the first fully connected layer of the MLP block, the number of channels is multiplied by this ratio (4 by default).
6. qkv_bias: whether to use a bias for q, k, v in Multi-Head Attention; the default here is True.
7. The first rate, drop_rate, is applied right after PatchEmbed.

8. The second rate, attn_drop_rate, is used inside the Multi-Head Attention computation.
9. The third rate, drop_path_rate, is used in every Swin Transformer block; it grows gradually from 0 up to 0.1.
10. patch_norm: by default, a normalization layer is connected right after PatchEmbed.
11. use_checkpoint: not used by default; enabling it just saves memory (gradient checkpointing).
12. depths: the number of Swin Transformer blocks in each stage.
13. num_features corresponds to the number of output channels of stage 4.
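Taken together, steps 1–13 are just the default arguments of the model's constructor. A condensed sketch following the reference implementation (only the lines relevant to the steps above are shown; exact defaults may differ between versions):

```python
import torch.nn as nn

class SwinTransformer(nn.Module):
    def __init__(self, patch_size=4, in_chans=3, num_classes=1000,
                 embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24),
                 window_size=7, mlp_ratio=4., qkv_bias=True,
                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1,
                 norm_layer=nn.LayerNorm, patch_norm=True,
                 use_checkpoint=False, **kwargs):
        super().__init__()
        self.num_layers = len(depths)          # 4 stages
        # step 13: channels after stage 4 = embed_dim * 2^(num_layers-1) = 96 * 8 = 768
        self.num_features = int(embed_dim * 2 ** (self.num_layers - 1))
```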
14. PatchEmbed divides the image into non-overlapping patches.
15. PatchEmbed corresponds to the Patch Partition and Linear Embedding in the architecture diagram.
16. Patch Partition is actually realized with a single convolution.
17. Padding is applied on the right in the width direction and at the bottom in the height direction.
18. Flattening starts from dimension 2, as in the sketch below.
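A condensed PatchEmbed sketch combining steps 14–18 (close to the reference code, slightly simplified):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """Patch Partition + Linear Embedding, realized as one strided conv (steps 14-18)."""
    def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=nn.LayerNorm):
        super().__init__()
        self.patch_size = patch_size
        # 4x4 conv with stride 4: each non-overlapping 4x4 patch -> one embed_dim-dim token
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        _, _, H, W = x.shape
        # if H or W is not a multiple of patch_size, pad on the right and at the bottom
        pad_w = (self.patch_size - W % self.patch_size) % self.patch_size
        pad_h = (self.patch_size - H % self.patch_size) % self.patch_size
        if pad_w or pad_h:
            x = F.pad(x, (0, pad_w, 0, pad_h))  # (left, right, top, bottom)
        x = self.proj(x)                         # [B, C, H/4, W/4]
        H, W = x.shape[2], x.shape[3]
        x = x.flatten(2).transpose(1, 2)         # flatten from dim 2 -> [B, H*W, C]
        x = self.norm(x)
        return x, H, W
```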
19. Here is the first rate mentioned above, drop_rate, connected directly after PatchEmbed.
20. A drop_path rate is set for every Swin Transformer block used, increasing linearly from 0 up to drop_path_rate.
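In the reference code this is a single linspace call; a minimal sketch:

```python
import torch

depths = (2, 2, 6, 2)
drop_path_rate = 0.1
# one rate per block, growing linearly from 0 to 0.1 across all 12 blocks
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]
print(dpr[0], dpr[-1])  # 0.0 ... 0.1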
21. Each stage is generated in a loop (see the sketch after step 24). Note that a stage in the code differs slightly from a stage in the paper: in the code, a stage consists of the Swin Transformer blocks followed by the next stage's Patch Merging, which is why stage 4 has no Patch Merging, only Swin Transformer blocks.

22. depth is the number of Swin Transformer blocks that have to be stacked in the stage.
23. drop_rate: connected directly after PatchEmbed.
attn_drop: the rate used inside Multi-Head Attention in this stage.
dpr: the drop-path rates used by the different Swin Transformer blocks of this stage.
24. Only the first 3 stages built have a PatchMerging layer; the last stage has none. Here self.num_layers=4. A sketch of the stage-building loop follows.
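The sketch stays close to the reference implementation (BasicLayer and PatchMerging are the classes discussed in this walkthrough; the exact argument names are assumptions):

```python
# excerpt from SwinTransformer.__init__ in the reference code (sketch)
self.layers = nn.ModuleList()
for i_layer in range(self.num_layers):                 # self.num_layers = len(depths) = 4
    layer = BasicLayer(
        dim=int(embed_dim * 2 ** i_layer),             # channels double at every stage
        depth=depths[i_layer],                         # blocks stacked in this stage
        num_heads=num_heads[i_layer],
        window_size=window_size,
        drop=drop_rate,                                # after PatchEmbed / in MLP
        attn_drop=attn_drop_rate,                      # inside Multi-Head Attention
        drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],
        # only the first num_layers-1 stages get a PatchMerging downsample layer
        downsample=PatchMerging if i_layer < self.num_layers - 1 else None,
    )
    self.layers.append(layer)
```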

27. The feature matrix passed into the Patch Merging below has shape x: [B, H*W, C].
28. When the H and W of x are not integer multiples of 2, padding is needed: a column of zeros on the right and a row of zeros at the bottom. A sketch of PatchMerging follows.
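A self-contained sketch, following the reference code (steps 27–28, and the halving discussed in steps 51–52, all happen here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMerging(nn.Module):
    """2x downsampling: H and W are halved, channels go C -> 2C."""
    def __init__(self, dim, norm_layer=nn.LayerNorm):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)
        self.norm = norm_layer(4 * dim)

    def forward(self, x, H, W):
        B, L, C = x.shape                      # x: [B, H*W, C]
        x = x.view(B, H, W, C)
        # if H or W is odd, pad a column of zeros on the right / a row at the bottom
        if H % 2 == 1 or W % 2 == 1:
            x = F.pad(x, (0, 0, 0, W % 2, 0, H % 2))
        x0 = x[:, 0::2, 0::2, :]               # top-left of each 2x2 neighborhood
        x1 = x[:, 1::2, 0::2, :]               # bottom-left
        x2 = x[:, 0::2, 1::2, :]               # top-right
        x3 = x[:, 1::2, 1::2, :]               # bottom-right
        x = torch.cat([x0, x1, x2, x3], -1)    # [B, H/2, W/2, 4C]
        x = x.view(B, -1, 4 * C)               # [B, H/2*W/2, 4C]
        x = self.norm(x)
        x = self.reduction(x)                  # [B, H/2*W/2, 2C]
        return x
```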
31. For a classification model, the following layers need to be added.
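A sketch of those lines, following the reference code:

```python
# excerpt from SwinTransformer.__init__ (sketch)
self.norm = norm_layer(self.num_features)       # LayerNorm over the stage-4 channels
self.avgpool = nn.AdaptiveAvgPool1d(1)          # global average pooling over all tokens
self.head = (nn.Linear(self.num_features, num_classes)
             if num_classes > 0 else nn.Identity())
```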
32. Initialize the weights of the whole model.
33. Perform the 4× downsampling (patch_embed).

34. Here L = H*W.

35. Activations are randomly dropped at the given rate.

36. Pass through each stage in turn. Steps 33–36 make up the forward pass; a sketch follows.
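A sketch of the forward pass, following the reference code:

```python
# excerpt from SwinTransformer (sketch)
def forward(self, x):
    x, H, W = self.patch_embed(x)        # step 33: 4x downsampling -> [B, L, C], L = H*W
    x = self.pos_drop(x)                 # step 35: dropout right after PatchEmbed
    for layer in self.layers:            # step 36: go through each stage
        x, H, W = layer(x, H, W)
    x = self.norm(x)                     # [B, L, num_features]
    x = self.avgpool(x.transpose(1, 2))  # [B, num_features, 1]
    x = torch.flatten(x, 1)
    return self.head(x)                  # [B, num_classes]
```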

37. The following code builds the Swin-T model.

38. Swin-T pre-trained weights on ImageNet-1k.
39. Swin_B (window7_224).

40. Swin_B (window12_384). Sketches of these factory functions follow.
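The variant configurations come from the paper; a sketch of the factory functions (pre-trained weight URLs are omitted here rather than guessed):

```python
def swin_tiny_patch4_window7_224(num_classes: int = 1000):
    # Swin-T: embed_dim=96, depths=(2, 2, 6, 2), heads=(3, 6, 12, 24)
    return SwinTransformer(in_chans=3, patch_size=4, window_size=7,
                           embed_dim=96, depths=(2, 2, 6, 2),
                           num_heads=(3, 6, 12, 24), num_classes=num_classes)

def swin_base_patch4_window7_224(num_classes: int = 1000):
    # Swin-B: embed_dim=128, depths=(2, 2, 18, 2), heads=(4, 8, 16, 32)
    return SwinTransformer(in_chans=3, patch_size=4, window_size=7,
                           embed_dim=128, depths=(2, 2, 18, 2),
                           num_heads=(4, 8, 16, 32), num_classes=num_classes)

def swin_base_patch4_window12_384(num_classes: int = 1000):
    # same Swin-B configuration at 384x384 input, with window_size=12
    return SwinTransformer(in_chans=3, patch_size=4, window_size=12,
                           embed_dim=128, depths=(2, 2, 18, 2),
                           num_heads=(4, 8, 16, 32), num_classes=num_classes)
```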
41. The original feature matrix needs to be moved to the right and down; the distance moved right and down equals the window size divided by 2, rounded down. self.shift_size is that distance. A sketch follows.
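A minimal sketch; in the reference code the move is implemented with torch.roll, rolling by -shift_size toward the top-left, which is equivalent to shifting the window partition right and down:

```python
import torch

window_size = 7
shift_size = window_size // 2          # 7 // 2 = 3: the shift distance

x = torch.randn(1, 56, 56, 96)         # [B, H, W, C] feature map
# cyclic shift used by SW-MSA
shifted_x = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
# after window attention the shift is undone:
x_back = torch.roll(shifted_x, shifts=(shift_size, shift_size), dims=(1, 2))
assert torch.equal(x, x_back)
```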
42. The number of Swin Transformer blocks within one stage.

43. When shift_size = 0, W-MSA is used.
When shift_size != 0, SW-MSA is used, with shift_size = self.shift_size.
44. A Swin Transformer block uses either W-MSA or SW-MSA; a W-MSA block and the following SW-MSA block are not lumped together and counted as one Swin Transformer block. A sketch of how the blocks alternate follows.
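A sketch of how a stage's blocks are created in the reference code (argument list abbreviated):

```python
# excerpt from BasicLayer.__init__ (sketch)
self.blocks = nn.ModuleList([
    SwinTransformerBlock(
        dim=dim,
        num_heads=num_heads,
        window_size=window_size,
        # even-indexed blocks use W-MSA (shift_size=0), odd-indexed blocks use SW-MSA
        shift_size=0 if (i % 2 == 0) else window_size // 2,
        drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
    )
    for i in range(depth)
])
```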
45. depth corresponds to the circled numbers in the architecture figure (×2, ×2, ×6, ×2 for Swin-T).
46. The downsampling here is realized with Patch Merging.
47. This is where SW-MSA actually comes into use.
48. Swin Transformer blocks do not change the height and width of the feature matrix, so the layout seen by every SW-MSA is identical; the size of attn_mask therefore never changes, and attn_mask only needs to be created once. A sketch of the mask construction follows.
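A self-contained sketch of the mask construction (padding of H and W to multiples of window_size is omitted for brevity):

```python
import torch

def create_mask(H: int, W: int, window_size: int, shift_size: int) -> torch.Tensor:
    """Attention mask for SW-MSA; created once because H and W never change."""
    # label each pixel with the region it belongs to after the cyclic shift
    img_mask = torch.zeros((1, H, W, 1))
    slices = (slice(0, -window_size),
              slice(-window_size, -shift_size),
              slice(-shift_size, None))
    cnt = 0
    for h in slices:
        for w in slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    # partition into windows: [num_windows, window_size*window_size]
    mask_windows = (img_mask.view(1, H // window_size, window_size,
                                  W // window_size, window_size, 1)
                    .permute(0, 1, 3, 2, 4, 5)
                    .reshape(-1, window_size * window_size))
    # pixel pairs from different regions must not attend to each other
    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
    attn_mask = attn_mask.masked_fill(attn_mask != 0, -100.0).masked_fill(attn_mask == 0, 0.0)
    return attn_mask  # [num_windows, Mh*Mw, Mh*Mw], added to the attention logits

mask = create_mask(H=56, W=56, window_size=7, shift_size=3)
print(mask.shape)  # torch.Size([64, 49, 49])
```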
50. This finishes creating all the Swin Transformer blocks of one stage.
51. This is the Patch Merging downsampling; it halves the height and width.
52. The +1 here guards against an odd incoming H or W, which requires padding: for example, with H = 7 the padded height is 8 and the output height is (7 + 1) // 2 = 4.
59. The following two x's have the same shape.

Original site

Copyright notice
This article was written by [QT-Smile]; please keep the original link when reposting, thanks:
https://yzsam.com/2022/163/202206121634419138.html