
【Attention】Visual Attention Network

2022-07-29 06:03:00 【Dull cat】


Paper link: https://arxiv.org/abs/2202.09741
Code link: https://github.com/Visual-Attention-Network

1. Background

When the self-attention mechanism from NLP is applied to 2D images, three problems arise:
Treating a 2D image as a 1D sequence loses its 2D spatial structure.
High-resolution images lead to very high computational cost, because the cost of self-attention grows quadratically with the number of tokens.
Standard self-attention captures only spatial correlation and ignores correlation along the channel dimension.

2. Motivation

To solve the problems above, the paper proposes large kernel attention (LKA), an attention mechanism for visual tasks that adaptively captures long-range relationships as self-attention does.

3. Method

LKA combines the advantages of convolution and self-attention: local structural information, long-range dependence, and adaptability.

Based on LKA, the authors propose a new vision backbone, the Visual Attention Network (VAN).

3.1 Large Kernel Attention


As shown in Figure 2, a large-kernel convolution can be decomposed into three parts:

A spatial local convolution (depth-wise convolution)
A spatial long-range convolution (depth-wise dilation convolution)
A channel convolution (1x1 convolution)

In other words, a $K \times K$ convolution can be decomposed into:

A $\lceil \frac{K}{d} \rceil \times \lceil \frac{K}{d} \rceil$ depth-wise dilation convolution, where $d$ is the dilation rate
A $(2d-1) \times (2d-1)$ depth-wise convolution
A 1x1 convolution
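
The kernel sizes listed above can be checked with a little arithmetic. The following is a minimal sketch, not code from the VAN repository: the function names are hypothetical, the values `K=21`, `d=3`, and `C=64` are illustrative choices, and bias terms are ignored when counting weights. It assumes the same number of input and output channels `C`.

```python
import math


def decompose_large_kernel(K, d):
    """Kernel sizes for the three-part decomposition of a K x K convolution."""
    dilated_k = math.ceil(K / d)  # depth-wise dilation convolution, dilation d
    local_k = 2 * d - 1           # depth-wise convolution
    # Spatial extent covered by stacking the two depth-wise convolutions.
    receptive_field = 1 + (local_k - 1) + (dilated_k - 1) * d
    return {
        "dilated_k": dilated_k,
        "local_k": local_k,
        "receptive_field": receptive_field,
    }


def weight_counts(K, d, C):
    """Weights (bias ignored) of a dense K x K conv vs. the decomposition."""
    sizes = decompose_large_kernel(K, d)
    dense = K * K * C * C                       # standard convolution, C -> C
    decomposed = (
        C * sizes["dilated_k"] ** 2             # depth-wise dilation convolution
        + C * sizes["local_k"] ** 2             # depth-wise convolution
        + C * C                                 # 1x1 convolution
    )
    return dense, decomposed


# Illustrative values only (assumptions, not taken from the text above).
sizes = decompose_large_kernel(K=21, d=3)
dense, decomposed = weight_counts(K=21, d=3, C=64)
print(sizes)
print(dense, decomposed)
```

For these values the two depth-wise kernels are 7x7 (dilation 3) and 5x5, together covering a 23x23 window, and the weight count drops from 1,806,336 to 8,832.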


The decomposed convolution captures long-range information while saving computation. Once the long-range relationships have been captured, the importance of each point can be estimated to generate an attention map.
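
The attention step described above can be written compactly. The notation below is a reconstruction under assumptions: $F$ denotes the input feature map, $\text{DW-Conv}$ the depth-wise convolution, $\text{DW-D-Conv}$ the depth-wise dilation convolution, and $\otimes$ the element-wise product. None of these symbols appear in the text above.

$$
\text{Attention} = \text{Conv}_{1 \times 1}(\text{DW-D-Conv}(\text{DW-Conv}(F)))
$$

$$
\text{Output} = \text{Attention} \otimes F
$$

Each value in the attention map acts as a weight on the corresponding position and channel of $F$, so the reweighting is adaptive in both the spatial and the channel dimension.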
