【ARXIV2205】Inception Transformer
2022-07-28 05:01:00 【AI frontier theory group @ouc】

Paper: https://arxiv.org/abs/2205.12956
Code: https://github.com/sail-sg/iFormer
1. Research motivation
The core idea of this paper is still to combine attention with CNNs (in the style of Google's Inception), but the starting point is different. The authors plot the Fourier spectrum of ViT features and find that the energy is concentrated almost entirely in the low-frequency part, so high-frequency information such as edges and corners in the image is not extracted well. (This is easy to understand: attention is essentially a global weighted sum, so local information is lost.)
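To make the "global weighted sum loses local detail" intuition concrete, the sketch below takes the extreme case in which every attention score is equal, so each token is replaced by the mean of all tokens. This is a toy 1-D illustration written for this note, not the paper's spectrum analysis; the signal and the helper names `dft_magnitudes` and `uniform_attention` are assumptions made here.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return |X[k]| for k = 0..N-1 using a naive discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_attention(tokens):
    """Attention with identical scores: every output token is the global mean."""
    weights = softmax([0.0] * len(tokens))
    mixed = sum(w * x for w, x in zip(weights, tokens))
    return [mixed] * len(tokens)

# Toy signal: a constant offset (lowest frequency) plus an alternating +1/-1
# pattern (highest representable frequency, DFT bin N/2).
N = 16
signal = [1.0 + (-1.0) ** t for t in range(N)]

before = dft_magnitudes(signal)
after = dft_magnitudes(uniform_attention(signal))

print("bin 0   before/after:", round(before[0], 6), round(after[0], 6))
print("bin N/2 before/after:", round(before[N // 2], 6), round(after[N // 2], 6))
```

The constant component (bin 0) survives the mixing unchanged, while the alternating component (bin N/2) vanishes up to rounding. Real attention weights are not uniform, so the effect in a trained ViT is less extreme than in this sketch.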

2. Inception mixer
The main contribution of this paper is an improvement to attention in the form of a new module, the Inception mixer. The authors' idea is very direct: add a high-frequency branch to the existing ViT structure (see the architecture figure in the paper).

(1) High-frequency branch. The high-frequency branch is borrowed from the classic Inception module; its linear layer is essentially a 1x1 convolution.

The input feature has $C$ channels, which are split into $C_h$ and $C_l$ channels used to extract high-frequency and low-frequency features respectively. The high-frequency features are split evenly into two parts $X_{h1}$ and $X_{h2}$ (each with $C_h/2$ channels) and processed as follows:
$$Y_{h1}=\text{FC}(\text{MaxPool}(X_{h1}))$$
$$Y_{h2}=\text{DwConv}(\text{FC}(X_{h2}))$$
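The two high-frequency paths can be sketched in plain Python on nested lists shaped `[C][H][W]`. This is a minimal illustration, not the authors' implementation: 3x3 windows with stride 1 are assumed for both MaxPool and DwConv, and the identity FC weights and the Laplacian-style kernel are hypothetical values chosen so the output is easy to check by hand.

```python
def max_pool3x3(x):
    """3x3 max pooling, stride 1, borders handled by clipping the window."""
    h, w = len(x[0]), len(x[0][0])
    return [[[max(ch[i][j]
                  for i in range(max(r - 1, 0), min(r + 2, h))
                  for j in range(max(c - 1, 0), min(c + 2, w)))
              for c in range(w)] for r in range(h)] for ch in x]

def fc(x, weight):
    """Per-pixel linear layer (a 1x1 convolution). weight is [C_out][C_in]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wk * x[k][r][c] for k, wk in enumerate(row))
              for c in range(w)] for r in range(h)] for row in weight]

def dwconv3x3(x, kernels):
    """Depthwise 3x3 convolution, stride 1, zero padding. kernels is [C][3][3]."""
    h, w = len(x[0]), len(x[0][0])
    def at(ch, r, c):
        return ch[r][c] if 0 <= r < h and 0 <= c < w else 0.0
    return [[[sum(k[di + 1][dj + 1] * at(ch, r + di, c + dj)
                  for di in (-1, 0, 1) for dj in (-1, 0, 1))
              for c in range(w)] for r in range(h)]
            for ch, k in zip(x, kernels)]

def high_freq_branch(x_h, w1, w2, kernels):
    """Y_h1 = FC(MaxPool(X_h1)), Y_h2 = DwConv(FC(X_h2))."""
    half = len(x_h) // 2
    x_h1, x_h2 = x_h[:half], x_h[half:]        # each has C_h / 2 channels
    y_h1 = fc(max_pool3x3(x_h1), w1)
    y_h2 = dwconv3x3(fc(x_h2, w2), kernels)
    return y_h1, y_h2

# Toy input: C_h = 2 channels on a 3x3 map, one bright pixel in each channel.
spike = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
x_h = [spike, spike]
identity_fc = [[1.0]]                           # hypothetical: FC acts as a no-op
laplacian = [[[0.0, -1.0, 0.0],                 # hypothetical edge-detecting kernel
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 0.0]]]
y_h1, y_h2 = high_freq_branch(x_h, identity_fc, identity_fc, laplacian)
```

With these toy weights, the max-pooling path spreads the bright pixel over its 3x3 neighbourhood, while the depthwise-convolution path responds to the local contrast around it. Both operate on small local windows, which is what lets them keep the high-frequency detail that global attention tends to average away.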
(2) Low-frequency branch. The low-frequency branch is conventional MHSA. Because the other branches add extra computation, this branch first applies average pooling, then computes MHSA on the reduced token set, and finally upsamples the result back to the original resolution.
Finally, the high-frequency and low-frequency outputs are concatenated along the channel dimension: $Y_c=\text{Concat}(Y_l, Y_{h1}, Y_{h2})$.
However, the direct interpolation used for upsampling in the low-frequency branch makes adjacent tokens overly smooth and similar. To address this, the authors add a DwConv to the fusion step: $Y=\text{FC}(Y_c+\text{DwConv}(Y_c))$
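The pool, attend, upsample pipeline of the low-frequency branch, and the over-smoothing it causes, can be seen in a small pure-Python sketch. Several simplifications are assumed here and are not taken from the paper: a single channel, a single attention head with Q = K = V (no learned projections), 2x2 average pooling, and nearest-neighbour upsampling.

```python
import math

def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on a single-channel [H][W] map."""
    return [[(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
             for c in range(0, len(x[0]), 2)]
            for r in range(0, len(x), 2)]

def self_attention(tokens):
    """Single-head attention over scalar tokens with Q = K = V = tokens."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)                          # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append(sum(e / total * v for e, v in zip(exps, tokens)))
    return out

def upsample_nearest2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    return [[x[r // 2][c // 2] for c in range(2 * len(x[0]))]
            for r in range(2 * len(x))]

def low_freq_branch(x):
    pooled = avg_pool2x2(x)                      # fewer tokens -> cheaper attention
    hp, wp = len(pooled), len(pooled[0])
    tokens = [v for row in pooled for v in row]  # flatten to a token sequence
    mixed = self_attention(tokens)
    grid = [mixed[r * wp:(r + 1) * wp] for r in range(hp)]
    return upsample_nearest2x(grid), len(tokens)

x_l = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 map
y_l, num_tokens = low_freq_branch(x_l)

for row in y_l:
    print([round(v, 3) for v in row])
```

Attention runs over 4 tokens instead of 16, and the output is back at 4x4 resolution, but every 2x2 block of `y_l` holds one repeated value. That repetition is the "adjacent tokens are too similar" problem, and it is why the fusion step adds a depthwise convolution over $Y_c$ before the final FC.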
3. Overall framework

The authors adopt the current mainstream four-stage Transformer framework and build three models: small, base, and large. The details are listed in the paper's configuration table. The table shows that in the shallow stages of the network the high-frequency (conv) branch takes a large share of the channels and the low-frequency (MHSA) branch a small share, while in the deep stages it is the other way around. This is broadly consistent with the conclusions of current mainstream methods that combine convolution with Transformers. In the conclusion, the authors also acknowledge that the ratio between high-frequency and low-frequency channels has to be set empirically, which is a limitation of the method.
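The channel bookkeeping behind such a schedule can be sketched as follows. The stage widths and ratios below are hypothetical placeholders chosen only to show the trend (conv share shrinking with depth); they are not the values from the paper's table, and `split_channels` is a helper invented for this sketch.

```python
def split_channels(total, high_ratio):
    """Split `total` channels into (C_h, C_l). C_h is kept even so that it can
    be halved into X_h1 and X_h2."""
    c_h = int(round(total * high_ratio / 2)) * 2
    return c_h, total - c_h

# Hypothetical four-stage configuration (NOT the paper's actual numbers).
stage_widths = [96, 192, 320, 384]
high_ratios = [0.75, 0.5, 0.25, 0.125]   # share of channels given to the conv branch

plan = [split_channels(c, r) for c, r in zip(stage_widths, high_ratios)]
for stage, (c_h, c_l) in enumerate(plan, start=1):
    print(f"stage {stage}: C_h={c_h} (2 x {c_h // 2}), C_l={c_l}")
```

Because the ratios are hand-picked hyperparameters, changing the model width or depth means choosing them again, which is the limitation the authors point out.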

The method achieves very good performance on image classification. The authors also run experiments on object detection and semantic segmentation; see the paper for details.
