VAN(DWConv+DWDilationConv+PWConv)
2022-07-28 06:23:00 【A tavern on the mountain】
1. Introduction
The self-attention mechanism was originally proposed in the NLP field, but thanks to its ability to extract global features it quickly swept through computer vision. For 2D images, however, self-attention has the following shortcomings: (1) flattening the image into a 1D sequence loses 2D structural information; (2) its quadratic complexity makes high-resolution images (e.g. 800×800) too expensive to compute; (3) it only captures spatial adaptability and ignores adaptability along the channel dimension. Based on LKA (Large Kernel Attention), a new network architecture, VAN, is therefore proposed. Although the architecture is simple, essentially a stack of DWConv + DWDilationConv + PWConv, its performance is comparable to current SOTA vision Transformers and CNNs. As a general-purpose backbone, it achieves good results on downstream tasks such as image classification, object detection, semantic segmentation and instance segmentation.
2. Implementation method
There are two main ways to build an attention module. One is the self-attention used in Transformers, which has the three defects above for 2D images; the other is large-kernel convolution (similar to the SE module used in MobileNetV3), which brings too many parameters and too much computational overhead. The paper therefore proposes decomposing the large-kernel attention module.
2.1 Large Kernel Attention (LKA)

The large convolution kernel is decomposed into DWConv + DWDilationConv + PWConv:
(1) DWConv extracts local detail features;
(2) DWDilationConv captures long-range features;
(3) PWConv extracts features along the channel dimension C.
After these three stages, an attention weight is computed for each pixel.
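As a minimal sketch, the decomposition above can be written as a PyTorch module. The kernel sizes assume the K = 21, d = 3 setting discussed later in the article (a 5×5 DWConv, a 7×7 DWConv with dilation 3, then a 1×1 PWConv); the class name and layout are illustrative, not the authors' exact code:

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Sketch of Large Kernel Attention: a 21x21 convolution decomposed into
    DWConv (local detail) + DWDilationConv (long range) + PWConv (channel mixing)."""
    def __init__(self, dim):
        super().__init__()
        # (1) DWConv: 5x5 depth-wise conv for local detail (kernel 2d-1 = 5)
        self.dw_conv = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # (2) DWDilationConv: 7x7 depth-wise conv, dilation 3 (kernel ceil(K/d) = 7)
        self.dw_d_conv = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        # (3) PWConv: 1x1 conv mixing the channel dimension C
        self.pw_conv = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        # the three stages produce a per-pixel attention map ...
        attn = self.pw_conv(self.dw_d_conv(self.dw_conv(x)))
        # ... which weights the input features element-wise
        return attn * x
```

Note the output keeps the input's shape, so the attention map can gate the features directly.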
3. Overall Architecture



Parameter counts:
Ordinary convolution: kernel H × kernel W × input channels × output channels
DW convolution: kernel H × kernel W × output channels
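Plugging these two formulas into the K = 21, d = 3 decomposition gives a quick sense of the savings (C = 32 channels is an arbitrary choice for illustration):

```python
import math

def ordinary_conv_params(kh, kw, c_in, c_out):
    # ordinary convolution: kernel H * kernel W * input channels * output channels
    return kh * kw * c_in * c_out

def dw_conv_params(kh, kw, c_out):
    # depth-wise convolution: kernel H * kernel W * output channels
    return kh * kw * c_out

C, K, d = 32, 21, 3
full = ordinary_conv_params(K, K, C, C)  # one big 21x21 conv: 451,584 params
decomposed = (
    dw_conv_params(2 * d - 1, 2 * d - 1, C)              # 5x5 DWConv
    + dw_conv_params(math.ceil(K / d), math.ceil(K / d), C)  # 7x7 DWDilationConv
    + ordinary_conv_params(1, 1, C, C)                   # 1x1 PWConv
)
print(full, decomposed)  # the decomposition uses far fewer parameters
```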


Table 5 shows that with d = 3 and K = 21, accuracy is essentially saturated while the parameter count remains modest.
Table 1 shows that VAN combines local features, long-range features, spatial adaptability and channel-dimension adaptability.

VAN consists of 4 stages. Each stage is built from LKA modules and a convolutional feed-forward network (CFF), so the overall design closely mirrors the self-attention + two-layer MLP architecture of a Transformer. (ConvNeXt noted that the Transformer's good performance is inseparable from this architectural design.)
As in Swin-Transformer, the first block of each stage performs downsampling: the feature-map resolution is reduced while the number of channels increases, with the downsampling rate controlled by the convolution stride. Apart from this first downsampling block, all remaining blocks keep the input and output feature maps the same size.
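A hedged sketch of one non-downsampling block under this description (the class name, the use of BatchNorm, and the `mlp_ratio` default are assumptions, simplified from the paper; the attention branch reuses the DWConv + DWDilationConv + PWConv decomposition):

```python
import torch
import torch.nn as nn

class VANBlock(nn.Module):
    """Simplified VAN block: LKA-style attention branch plus a convolutional
    feed-forward network (CFF), each with a residual connection, so the
    input and output feature maps have identical shape."""
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        # LKA decomposition: DWConv + DWDilationConv + PWConv
        self.attn = nn.Sequential(
            nn.Conv2d(dim, dim, 5, padding=2, groups=dim),
            nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3),
            nn.Conv2d(dim, dim, 1),
        )
        self.norm2 = nn.BatchNorm2d(dim)
        # CFF plays the role of the Transformer's two-layer MLP
        hidden = dim * mlp_ratio
        self.cff = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h) * h          # attention map gates the features
        x = x + self.cff(self.norm2(x))   # feed-forward branch
        return x
```

Because every layer here preserves spatial size, stacking such blocks keeps the resolution fixed within a stage, matching the text above.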
4. Results
4.1 Training details
The model is trained on ImageNet. Data augmentation includes random cropping, random horizontal flipping, label smoothing [59], Mixup [102], CutMix [100] and random erasing [105]. Training uses the AdamW optimizer for 310 epochs with momentum = 0.9 and weight decay = 5 × 10⁻². The initial learning rate is 5 × 10⁻⁴ with a cosine decay schedule; Layer Scale, a batch size of 1024 and an exponential moving average (EMA) are used to improve training.
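The optimizer portion of this recipe might be set up as follows. This is a sketch only: `model` stands in for the real network, and in AdamW the reported momentum corresponds to β₁ = 0.9.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(3, 8, 3)  # placeholder for the actual VAN model

# lr 5e-4, weight decay 5e-2, momentum (beta1) 0.9, as reported
optimizer = AdamW(model.parameters(), lr=5e-4,
                  betas=(0.9, 0.999), weight_decay=5e-2)
# cosine decay over 310 epochs (scheduler.step() called once per epoch)
scheduler = CosineAnnealingLR(optimizer, T_max=310)
```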
4.2 Experimental results on ImageNet

Table 6 shows that VAN achieves the best top-1 accuracy. This is a structure the authors designed largely by intuition, so further tuning of the architectural details should improve it. VAN combines ViT and CNN: attention aggregates global information while convolution handles local details carefully.
Ablation experiments
DWConv, DWDilationConv and PWConv are all indispensable:
Removing DWConv drops accuracy by 0.5%;
Removing DWDilationConv drops accuracy by 1.3%;
Removing the attention mechanism drops accuracy by 1.1%;
Removing PWConv drops accuracy by 0.8%.
5. Visual analysis

Grad-CAM visualizations show that VAN outperforms both Swin-T and ConvNeXt.
6. Future outlook
(1) The framework was proposed intuitively, and its architectural parameters can still be optimized.
This paper only demonstrates an intuitive structure; there are many potential improvements, such as adopting different kernel sizes, introducing multi-scale structures and using multi-branch structures.
(2) Large-scale self-supervised learning and transfer learning.
VAN combines the advantages of the ViT and CNN architectures: it can extract 2D structural features and dynamically adjust its output according to the input. The authors believe VAN can achieve better performance in image self-supervised learning and transfer learning.