【Attention】Visual Attention Network
2022-07-29 06:03:00 【Dull cat】

Paper: https://arxiv.org/abs/2202.09741
Code: https://github.com/Visual-Attention-Network
1. Background
Applying the self-attention mechanism from NLP to 2D images raises three problems:
- Treating a 2D image as a 1D sequence loses its 2D spatial structure.
- High-resolution images lead to very high computational cost (quadratic in the number of pixels).
- Standard self-attention captures only spatial correlations and ignores correlations across channels.
2. Motivation
To address these problems, this paper proposes large kernel attention (LKA) for visual tasks, which captures long-range relationships adaptively, as self-attention does.
3. Method
LKA absorbs the advantages of both convolution and self-attention: local structural information, long-range dependence, and adaptivity.
Based on LKA, the authors propose a new vision backbone, the Visual Attention Network (VAN).
3.1 Large Kernel Attention

As shown in Figure 2, a large-kernel convolution can be divided into three parts:
- a spatial local convolution (depth-wise convolution)
- a spatial long-range convolution (depth-wise dilation convolution)
- a channel convolution (1×1 convolution)
In other words, a $K \times K$ convolution can be split into:
- a $\lceil \frac{K}{d} \rceil \times \lceil \frac{K}{d} \rceil$ depth-wise dilation convolution, where $d$ is the dilation rate
- a $(2d-1) \times (2d-1)$ depth-wise convolution
- a $1 \times 1$ convolution
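As a quick sanity check on this split (my own sketch, not from the paper), the standard receptive-field recursion rf ← rf + (k − 1)·d shows that the stacked depth-wise pair covers the neighborhood of the original large kernel; the 1×1 convolution leaves the receptive field unchanged. The concrete kernel sizes below follow the K = 21, d = 3 setting used later in the paper:

```python
def receptive_field(convs):
    """Receptive field of a stack of convolutions.

    convs: list of (kernel_size, dilation) pairs, applied in sequence.
    Uses the standard recursion rf = rf + (k - 1) * d.
    """
    rf = 1
    for k, d in convs:
        rf += (k - 1) * d
    return rf

# K = 21, d = 3: a (2d-1)=5 depth-wise conv followed by a
# ceil(K/d)=7 depth-wise conv with dilation 3
rf = receptive_field([(5, 1), (7, 3)])
print(rf)  # 23 -- covers (and slightly exceeds) the 21x21 target
```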

The decomposed convolution can still capture long-range information while saving compute. With the long-range relationships in hand, the importance of each point can be estimated, producing an attention map.

As shown in Figure 3(a), LKA is formulated as follows, where $F$ is the input feature and $\otimes$ denotes element-wise multiplication:

$$\text{Attention} = \text{Conv}_{1\times1}(\text{DW-D-Conv}(\text{DW-Conv}(F)))$$

$$\text{Output} = \text{Attention} \otimes F$$
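The three-step pipeline can be sketched in plain NumPy (a minimal, unoptimized illustration; the function and variable names are my own, and a real implementation would use a framework's convolution ops):

```python
import numpy as np

def depthwise_conv2d(x, kernel, dilation=1):
    """Naive 'same'-padded depth-wise convolution.

    x: (C, H, W) feature map; kernel: (C, k, k) with odd k.
    """
    C, H, W = x.shape
    k = kernel.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            patch = xp[:, i * dilation:i * dilation + H,
                          j * dilation:j * dilation + W]
            out += kernel[:, i, j, None, None] * patch
    return out

def lka(x, w_dw, w_dwd, w_pw, d=3):
    """Large Kernel Attention: attention-weighted input feature."""
    a = depthwise_conv2d(x, w_dw)               # local depth-wise conv
    a = depthwise_conv2d(a, w_dwd, dilation=d)  # long-range dilated conv
    a = np.einsum('oc,chw->ohw', w_pw, a)       # 1x1 conv mixes channels
    return a * x                                # element-wise attention

# Sanity check: delta kernels plus an identity 1x1 reduce LKA to x * x
x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
w_dw = np.zeros((2, 5, 5)); w_dw[:, 2, 2] = 1.0
w_dwd = np.zeros((2, 7, 7)); w_dwd[:, 3, 3] = 1.0
out = lka(x, w_dw, w_dwd, np.eye(2), d=3)
```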
3.2 VAN
VAN has a simple four-stage structure; the configurations of the different model sizes are listed in Table 2.
As shown in Figure 3(d), each stage first down-samples the input and then stacks the attention blocks described above.
Complexity analysis :
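For the parameter side of the complexity analysis, the decomposition can be compared against a standard dense convolution with a few lines of arithmetic (my own sketch; bias terms are ignored and the channel count C = 64 is just an illustrative choice):

```python
import math

def lka_decomposition_params(C, K, d):
    """Parameters of the decomposed large-kernel convolution (no bias)."""
    dw  = C * (2 * d - 1) ** 2       # depth-wise conv
    dwd = C * math.ceil(K / d) ** 2  # depth-wise dilation conv
    pw  = C * C                      # 1x1 channel-mixing conv
    return dw + dwd + pw

def standard_conv_params(C, K):
    """Dense C -> C KxK convolution (no bias)."""
    return C * C * K * K

C, K, d = 64, 21, 3
print(lka_decomposition_params(C, K, d))  # 8832
print(standard_conv_params(C, K))         # 1806336
```

For this setting the decomposition uses roughly 200× fewer parameters than the dense 21×21 convolution.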

The implementation details are as follows: the following three convolutions are used to approximate a 21×21 convolution:
- one 5×5 depth-wise convolution
- one 7×7 depth-wise dilation convolution with dilation rate d = 3
- one 1×1 convolution
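These concrete sizes follow directly from the split given in Section 3.1, as a two-line check confirms (variable names are illustrative):

```python
import math

K, d = 21, 3
dw_kernel  = 2 * d - 1         # depth-wise conv kernel: 5x5
dwd_kernel = math.ceil(K / d)  # depth-wise dilation conv kernel: 7x7, dilation 3
print(dw_kernel, dwd_kernel)   # 5 7
```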
4. Results
4.1 Classification




4.2 Object Detection

4.3 Semantic Segmentation
