Swin Transformer【Backbone】
2022-07-26 03:00:00 【It's too simple】
Background
Swin Transformer won the best paper award at ICCV 2021.
ViT carries the transformer over from NLP to CV directly, which raises two immediate problems: scale (pedestrians and cars, for example, appear at very different sizes, something that has no counterpart in NLP) and sequence length (if the basic unit were the image pixel, the sequence would be far too long). ViT's 16*16 patches give a low-resolution feature map, so it may not suit dense prediction tasks, and global modeling makes the computational complexity grow quadratically with image size.
Swin Transformer lets the transformer do hierarchical feature extraction as well, so the extracted features are multi-scale. Computing self-attention inside windows shortens the sequence (computational complexity grows linearly with image size rather than quadratically), and shifting the windows lets adjacent windows interact. Semantically similar parts most likely appear in neighboring regions, so this local design is fully adequate, and the global design of ViT is somewhat redundant.
Overall, it borrows the sliding-window idea from convolution while keeping the transformer's own strength in global modeling.
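The linear-versus-quadratic claim can be checked with a quick count. The sketch below uses the cost estimates given in the Swin Transformer paper for global attention (MSA) and window attention (W-MSA), which count only the matrix multiplications. The function names are illustrative, and the 96 channels and 7*7 windows are borrowed from the shapes used later in this post.

```python
def msa_cost(h, w, c):
    """Global self-attention over all h*w tokens: 4*hw*C^2 + 2*(hw)^2*C."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def wmsa_cost(h, w, c, m):
    """Self-attention inside m x m windows: 4*hw*C^2 + 2*M^2*hw*C."""
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


# Compare the two costs as the feature map side length doubles.
for side in (56, 112, 224):
    global_cost = msa_cost(side, side, 96)
    window_cost = wmsa_cost(side, side, 96, 7)
    print(f"{side}x{side}: global={global_cost:.3e} "
          f"window={window_cost:.3e} ratio={global_cost / window_cost:.1f}")
```

Doubling the side length multiplies the window cost by exactly 4, since every term is proportional to the number of tokens, while the attention term of the global cost grows by a factor of 16.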
Model structure

The first steps can be understood by analogy with ViT. Suppose the input image is [224,224,3]. Patch Partition turns it into [56,56,48] (as in ViT, but with a patch size of 4*4), and Linear Embedding then produces [56,56,96], which is flattened to [3136,96]. This is fed into the blocks, which leave the shape unchanged at [3136,96]. Next comes Patch Merging, which halves the spatial size and doubles the number of channels, giving [28,28,192]. The operation exists to mirror what convolutional neural networks do and can be understood as trading spatial resolution for channel dimension; the video listed in the references explains it in detail. Repeating these stages in turn forms the whole Swin Transformer.
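The shape bookkeeping above can be traced with a few lines of plain Python. This is a minimal sketch, not the reference implementation: the function names are made up for illustration, and the default of four stages is an assumption matching the Swin-T configuration in the paper. In the paper, Patch Merging concatenates each 2*2 group of neighboring patches (C becomes 4C) and then applies a linear layer down to 2C; only the concatenation is shown here, and the order in which the four neighbors are joined is an illustrative choice.

```python
def swin_shapes(height=224, width=224, in_chans=3, patch=4,
                embed_dim=96, num_stages=4):
    """Return (name, shape) pairs for the feature map at each step."""
    h, w = height // patch, width // patch
    shapes = [
        ("input", (height, width, in_chans)),
        ("patch partition", (h, w, patch * patch * in_chans)),
        ("linear embedding", (h, w, embed_dim)),
        ("stage 1 blocks", (h * w, embed_dim)),
    ]
    c = embed_dim
    for stage in range(2, num_stages + 1):
        # Patch Merging: spatial size halves, channel count doubles.
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((f"stage {stage} patch merging", (h, w, c)))
    return shapes


def patch_merge_gather(x):
    """Concatenate each 2x2 neighborhood along channels.

    x is a nested list of shape (H, W, C); the result has shape
    (H/2, W/2, 4C). The linear layer that maps 4C to 2C is omitted.
    """
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    return [
        [x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1]
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


for name, shape in swin_shapes():
    print(f"{name:28s} {shape}")

# A 4 x 4 map with 2 channels; each token stores its own (row, col).
tiny = [[[r, c] for c in range(4)] for r in range(4)]
merged = patch_merge_gather(tiny)
print(len(merged), len(merged[0]), len(merged[0][0]))
```

With the defaults, the trace reproduces the [56,56,48], [56,56,96], [3136,96] and [28,28,192] shapes quoted above, and the toy 4*4 map with 2 channels becomes a 2*2 map with 8 channels.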
Swin Transformer Block

As mentioned above, once the [56,56,96] tensor enters a block, self-attention is computed inside 7*7 windows. Part (b) of the first figure shows the basic computing unit of Swin Transformer, which consists of two blocks: the first computes self-attention inside regular windows, and the second computes self-attention inside the shifted windows.
For how self-attention is computed after the windows are shifted, see Mu Li's explanation on Bilibili.
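To make the window step concrete, here is a small sketch that splits a 56*56 grid of token indices into non-overlapping 7*7 windows. It works on plain nested lists rather than tensors, and `window_partition` is an illustrative helper written for this post, not a library function.

```python
def window_partition(grid, window):
    """Split an H x W grid (list of rows) into window x window tiles.

    Each tile is returned as a flat list of its tokens in row-major order.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    return [
        [grid[r][c]
         for r in range(top, top + window)
         for c in range(left, left + window)]
        for top in range(0, h, window)
        for left in range(0, w, window)
    ]


# Token (r, c) of the 56 x 56 feature map gets the index r * 56 + c.
tokens = [[r * 56 + c for c in range(56)] for r in range(56)]
windows = window_partition(tokens, 7)
print(len(windows), len(windows[0]))
```

The 3136 tokens end up in 64 windows of 49 tokens each, so every attention computation sees a sequence of length 49 instead of 3136.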
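As a rough illustration of the shift, the sketch below rolls a small grid up and to the left by half a window and then partitions it into windows again, which is the cyclic-shift trick the paper describes for computing shifted-window attention efficiently. The 8*8 grid with 4*4 windows is a toy size chosen for readability, the helper names are illustrative, and the attention mask that the paper applies afterwards is not implemented here.

```python
def cyclic_shift(grid, shift):
    """Roll rows up and columns left by `shift`, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split an H x W grid into flat window x window tiles (row-major)."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[r][c]
         for r in range(top, top + window)
         for c in range(left, left + window)]
        for top in range(0, h, window)
        for left in range(0, w, window)
    ]


size, window = 8, 4
shift = window // 2  # shift by half a window

# Every token remembers its original (row, col) position.
coords = [[(r, c) for c in range(size)] for r in range(size)]
shifted_windows = window_partition(cyclic_shift(coords, shift), window)

for idx, win in enumerate(shifted_windows):
    rows = sorted({r for r, _ in win})
    cols = sorted({c for _, c in win})
    print(idx, "rows", rows, "cols", cols)
```

The first shifted window covers original rows and columns 2 to 5, so it straddles four of the regular windows and lets them exchange information. The last window mixes rows and columns 0, 1, 6 and 7, which are not neighbors in the image; that is why attention between such tokens has to be masked out.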
References
Mu Li (Bilibili), Swin Transformer paper close reading [Paper Close Reading series]