【Attention】Visual Attention Network
2022-07-29 06:03:00 【Dull cat】
Paper link: https://arxiv.org/abs/2202.09741
Code link: https://github.com/Visual-Attention-Network
1. Background
Applying the self-attention mechanism from NLP to 2D images raises three problems:
- Treating a 2D image as a 1D sequence discards its 2D spatial structure.
- High-resolution images lead to very high computational cost.
- Standard self-attention captures only spatial correlation and ignores correlation along the channel dimension.
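To make the second problem concrete, here is a rough sketch of how the self-attention matrix grows with image resolution. The patch size of 16 and the helper name `attention_matrix_entries` are illustrative assumptions, not taken from the paper.

```python
def attention_matrix_entries(height: int, width: int, patch: int = 16) -> int:
    """Entries in one self-attention matrix for an image split into patches.

    Assumes non-overlapping patch x patch tokens (a hypothetical setting).
    """
    tokens = (height // patch) * (width // patch)
    # Every token attends to every token, so the matrix has tokens ** 2 entries.
    return tokens * tokens


low = attention_matrix_entries(224, 224)
high = attention_matrix_entries(448, 448)
print(low, high, high // low)
```

Doubling each side of the image multiplies the token count by 4 and the attention matrix by 16.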
2. Motivation
To address these problems, the paper proposes large kernel attention (LKA) for vision tasks, which retains the ability of self-attention to adaptively capture long-range dependencies.
3. Method
LKA combines the advantages of convolution and self-attention: local structural information, long-range dependence, and adaptability.
Based on LKA, the authors propose a new vision backbone, the Visual Attention Network (VAN).
3.1 Large Kernel Attention

As shown in Figure 2, a large-kernel convolution can be decomposed into three parts:
- A spatial local convolution (depth-wise convolution)
- A spatial long-range convolution (depth-wise dilation convolution)
- A channel convolution (1x1 convolution)
In other words, a $K \times K$ convolution can be decomposed as follows:
- A $\lceil \frac{K}{d} \rceil \times \lceil \frac{K}{d} \rceil$ depth-wise dilation convolution, where $d$ is the dilation rate
- A $(2d-1) \times (2d-1)$ depth-wise convolution
- A 1x1 convolution
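The decomposition rule above can be sketched as a small helper. The function name `decompose_large_kernel` and the field names are hypothetical, and rounding K/d up when d does not divide K is an assumption of this sketch. The receptive-field arithmetic uses the standard span of a dilated kernel, d * (k - 1) + 1.

```python
from typing import NamedTuple


class Decomposition(NamedTuple):
    dw_kernel: int        # (2d - 1): depth-wise convolution
    dwd_kernel: int       # K / d, rounded up: depth-wise dilation convolution
    dilation: int
    receptive_field: int  # span covered by stacking the two spatial convolutions


def decompose_large_kernel(K: int, d: int) -> Decomposition:
    dw_kernel = 2 * d - 1
    dwd_kernel = (K + d - 1) // d  # integer ceil(K / d)
    # A dilated kernel with k taps spans d * (k - 1) + 1 positions.
    dwd_span = d * (dwd_kernel - 1) + 1
    # Stacking two stride-1 convolutions adds their spans minus one.
    receptive_field = dw_kernel + dwd_span - 1
    return Decomposition(dw_kernel, dwd_kernel, d, receptive_field)


result = decompose_large_kernel(K=21, d=3)
print(result)
```

For K = 21 and d = 3 this gives a 5x5 depth-wise kernel and a 7x7 dilated kernel, whose stacked receptive field is at least as large as the 21x21 window being approximated.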

The decomposed convolution captures long-range information while saving computation. Once the long-range relationships are captured, the importance of each point can be estimated to generate an attention map.

As shown in Figure 3(a), the LKA module is defined as follows:
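A sketch of the LKA formulation that is consistent with the three-part decomposition in this section. The notation is assumed here: $F$ is the input feature map, DW-Conv is the depth-wise convolution, DW-D-Conv is the depth-wise dilation convolution, and $\otimes$ is element-wise multiplication.

$$
\mathrm{Attention} = \mathrm{Conv}_{1 \times 1}(\text{DW-D-Conv}(\text{DW-Conv}(F)))
$$

$$
\mathrm{Output} = \mathrm{Attention} \otimes F
$$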
3.2 VAN
VAN has a simple hierarchical structure with four stages. The model configurations at different scales are shown in Table 2.
In each stage in Figure 3(d), the input is first downsampled and then passed through the subsequent operations.
Complexity analysis :
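As a rough illustration of the saving, the sketch below counts weights (biases ignored) for a dense K x K convolution and for the three-part decomposition on C channels. K = 21 and d = 3 match the setting in the implementation details; C = 64 and the function names are assumptions chosen for illustration. This is a first-principles count, not necessarily the exact expression used in the paper.

```python
def dense_conv_params(K: int, C: int) -> int:
    # Standard convolution: every output channel sees every input channel.
    return K * K * C * C


def decomposed_params(K: int, d: int, C: int) -> int:
    dwd_kernel = (K + d - 1) // d   # integer ceil(K / d)
    dw_kernel = 2 * d - 1
    depth_wise = dw_kernel * dw_kernel * C            # one filter per channel
    depth_wise_dilated = dwd_kernel * dwd_kernel * C  # one filter per channel
    point_wise = C * C                                # 1x1 convolution mixes channels
    return depth_wise + depth_wise_dilated + point_wise


C = 64  # hypothetical channel count
dense = dense_conv_params(21, C)
decomposed = decomposed_params(21, 3, C)
print(dense, decomposed, round(dense / decomposed))
```

With these assumed values the decomposition needs roughly 200 times fewer weights than the dense 21x21 convolution.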

The implementation details are as follows. Three convolutions are used to approximate a 21x21 convolution:
- One 5x5 depth-wise convolution
- One 7x7 depth-wise dilation convolution (d=3)
- One 1x1 convolution
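The three convolutions listed above can be chained in a dependency-free sketch. This is not the authors' code: the function names are hypothetical, the weights are all ones purely for demonstration, and applying the resulting attention map to the input by element-wise multiplication is an assumption of this sketch. A practical version would use a deep-learning framework.

```python
from typing import List

Map = List[List[float]]  # one channel, H x W
Tensor = List[Map]       # C x H x W


def depthwise_conv(x: Tensor, kernels: Tensor, dilation: int = 1) -> Tensor:
    """Per-channel 2D convolution with zero padding that preserves H x W."""
    out = []
    for chan, kern in zip(x, kernels):
        h, w, k = len(chan), len(chan[0]), len(kern)
        pad = dilation * (k - 1) // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(k):
                    for b in range(k):
                        y = i + a * dilation - pad
                        z = j + b * dilation - pad
                        if 0 <= y < h and 0 <= z < w:  # zero padding outside
                            acc += kern[a][b] * chan[y][z]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise_conv(x: Tensor, weight: List[List[float]]) -> Tensor:
    """1x1 convolution: weight[o][c] mixes input channel c into output o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)]
            for o in range(len(weight))]


def lka(x: Tensor, dw_kernels: Tensor, dwd_kernels: Tensor,
        pw_weight: List[List[float]]) -> Tensor:
    attn = depthwise_conv(x, dw_kernels, dilation=1)      # 5x5 depth-wise
    attn = depthwise_conv(attn, dwd_kernels, dilation=3)  # 7x7 depth-wise, d=3
    attn = pointwise_conv(attn, pw_weight)                # 1x1 convolution
    # Assumed here: the attention map reweights the input element-wise.
    return [[[a * v for a, v in zip(arow, xrow)]
             for arow, xrow in zip(achan, xchan)]
            for achan, xchan in zip(attn, x)]


C, H, W = 2, 25, 25
x = [[[1.0] * W for _ in range(H)] for _ in range(C)]
ones5 = [[[1.0] * 5 for _ in range(5)] for _ in range(C)]
ones7 = [[[1.0] * 7 for _ in range(7)] for _ in range(C)]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = lka(x, ones5, ones7, identity)
print(len(out), len(out[0]), len(out[0][0]), out[0][12][12])
```

The output keeps the input shape. With all-ones weights, the centre value simply counts the 25 x 49 = 1225 contributions that reach it, while positions near the border receive fewer because of zero padding.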
4. Results
4.1 Classification

4.2 Object detection

4.3 Semantic segmentation
