【ARXIV2204】Vision Transformers for Single Image Dehazing

2022-07-28 05:00:00 【AI frontier theory group @ouc】

Paper: https://arxiv.org/abs/2204.03883
Code: https://github.com/IDKiro/DehazeFormer

1. Research motivation

The authors propose DehazeFormer for image dehazing, taking inspiration from Swin Transformer. The most interesting parts of the paper are reflection padding and the way attention is computed.

2. Main method

The overall framework is shown in the figure below. It is a 5-stage U-Net structure in which the convolution blocks are replaced by DehazeFormer blocks.

[Figure: overall architecture of DehazeFormer]

Reflection padding

In Swin, shifted windows are used to exchange information between windows, but the authors argue that this operation is unfriendly to image edge regions. In classification tasks, the target usually lies near the center of the image, so shifted windows cause no problem. In image restoration, however, edge regions are just as important, so the operation is inappropriate. The authors therefore propose a reflection padding operation, as shown in the figure below.

[Figure: illustration of the reflection padding operation]

The input image is 8x8 and the window in the figure is 4x4, so the edge region is extended by 2 patches on each side, making the image 12x12. It can then be divided into 3x3 = 9 windows. Attention is computed locally within these 9 windows, and afterwards the central 8x8 region is cropped out.
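The pad, partition, and crop steps described above can be sketched in NumPy. This is only an illustrative sketch, not the authors' implementation: the helper names (`partition_windows`, `merge_windows`, `padded_window_op`) are hypothetical, the feature map has a single channel, and the `op` argument is a stand-in for the per-window attention (identity by default).

```python
import numpy as np

def partition_windows(x, window):
    """Split an (H, W) map into non-overlapping (window, window) tiles."""
    h, w = x.shape
    tiles = x.reshape(h // window, window, w // window, window)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, window, window)

def merge_windows(tiles, h, w):
    """Inverse of partition_windows: stitch the tiles back into an (H, W) map."""
    window = tiles.shape[-1]
    grid = tiles.reshape(h // window, w // window, window, window)
    return grid.transpose(0, 2, 1, 3).reshape(h, w)

def padded_window_op(x, window=4, pad=2, op=lambda tiles: tiles):
    """Reflect-pad, apply `op` per window, then crop back to the input size.

    `op` is a placeholder for window attention (an assumption of this sketch).
    """
    padded = np.pad(x, pad, mode="reflect")           # 8x8 -> 12x12
    ph, pw = padded.shape
    tiles = op(partition_windows(padded, window))     # 9 windows of 4x4
    merged = merge_windows(tiles, ph, pw)
    cropped = merged[pad:pad + x.shape[0], pad:pad + x.shape[1]]
    return padded.shape, tiles.shape[0], cropped

x = np.arange(64, dtype=np.float32).reshape(8, 8)
padded_shape, num_windows, out = padded_window_op(x)
print(padded_shape, num_windows, out.shape)  # (12, 12) 9 (8, 8)
```

With the identity placeholder, the cropped output equals the input, which confirms that the crop recovers exactly the original 8x8 region.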

The authors also point out that this operation incurs additional computation and memory cost.
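A rough way to gauge this overhead is to compare the number of positions before and after padding. The sketch below assumes that padding is half the window size on each side (generalizing from the 8x8 example above) and that cost scales with the number of positions when the window size is fixed. Both are assumptions of this sketch, and `padding_overhead` is a hypothetical helper.

```python
def padding_overhead(height, width, window):
    """Ratio of padded to original positions, assuming pad = window // 2 per side."""
    pad = window // 2
    padded_positions = (height + 2 * pad) * (width + 2 * pad)
    return padded_positions / (height * width)

print(padding_overhead(8, 8, 4))      # 2.25
print(padding_overhead(256, 256, 8))  # 1.0634765625
```

Under these assumptions the relative overhead is large for the toy 8x8 case but shrinks as the image grows relative to the window.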

W-MHSA with parallel convolution

The authors argue that the aggregation weights of MHSA are dynamic and normalized, and that static, learnable, and unconstrained aggregation weights can complement MHSA. They therefore apply an additional convolution to V. In the overall architecture diagram of the paper, a convolution layer follows V, and its output is added to the attention result.
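The idea of adding a convolution branch on V to the attention output can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses a single head and a single window, a depthwise 3x3 kernel, and hypothetical helper names (`softmax`, `depthwise_conv3x3`, `attention_with_parallel_conv`). The kernel type and the region over which the convolution is applied should be taken from the paper and its code.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(v_map, kernel):
    """v_map: (H, W, C); kernel: (3, 3, C). Reflect padding keeps the size."""
    h, w, _ = v_map.shape
    padded = np.pad(v_map, ((1, 1), (1, 1), (0, 0)), mode="reflect")
    out = np.zeros_like(v_map)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def attention_with_parallel_conv(q, k, v, kernel, window):
    """q, k, v: (window * window, C) tokens of one window (single head)."""
    n, c = q.shape
    # Dynamic, normalized weights: depend on the input, each row sums to 1.
    weights = softmax(q @ k.T / np.sqrt(c))
    attn_out = weights @ v
    # Static, unconstrained weights: a learnable kernel applied to V.
    conv_out = depthwise_conv3x3(v.reshape(window, window, c), kernel)
    return attn_out + conv_out.reshape(n, c), weights

rng = np.random.default_rng(0)
window, channels = 4, 8
q, k, v = (rng.standard_normal((window * window, channels)) for _ in range(3))
kernel = rng.standard_normal((3, 3, channels))
out, weights = attention_with_parallel_conv(q, k, v, kernel, window)
print(out.shape)                                # (16, 8)
print(np.allclose(weights.sum(axis=1), 1.0))    # True
```

The softmax weights are nonnegative and sum to one per query, while the convolution kernel can take any sign and scale, which is the contrast the authors draw between the two branches.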

For the experiments, refer to the original paper; they are not covered in detail here.

Copyright notice
This article was written by [AI frontier theory group @ouc]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/197/202207131733390237.html