【ARXIV2203】CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
2022-07-28 05:00:00 【AI frontier theory group @ouc】

1. Research motivation
Semantic segmentation today relies mainly on RGB images. Adding complementary modalities (depth, thermal, etc.) as auxiliary input can effectively improve segmentation accuracy; in other words, fusing multimodal information helps. Existing methods fall into two main categories:
- Input fusion: as shown in figure (a), the RGB and depth (D) data are concatenated and a single network extracts the features.
- Feature fusion: as shown in figure (b), two networks extract the RGB and D features separately, and the features are then fused interactively in the middle of the network.
The authors propose CMX, which they characterize as follows: "comprehensive interactions are considered, including channel and spatial-wise cross-modal feature rectification from the feature map, as well as cross-attention from the sequence-to-sequence perspective."

2. Main method
The overall framework of CMX is shown in the figure below. Two parallel backbones extract features from the RGB and X modality inputs. Between stages, the features are fed into CM-FRM (cross-modal feature rectification module) for rectification, and the rectified features are passed on to the next layer. In addition, the features from the same layer are fed into FFM (feature fusion module) for fusion. CM-FRM and FFM are described in detail below.
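
The data flow described above can be sketched roughly as follows. This is only an illustration of the wiring: `stage`, `rectify` and `fuse` are hypothetical placeholders standing in for a backbone stage, CM-FRM and FFM, and the number of stages (4) and the assumption that both inputs share the same shape are choices made here, not taken from the post.

```python
import numpy as np

def stage(feat):
    """Placeholder backbone stage: 2x average-pool downsampling of a (C, H, W) map."""
    c, h, w = feat.shape
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def rectify(rgb, x):
    """Placeholder for CM-FRM: each branch is corrected using the other one."""
    return rgb + 0.5 * x, x + 0.5 * rgb

def fuse(rgb, x):
    """Placeholder for FFM: merge the two branches into one feature map."""
    return 0.5 * (rgb + x)

def cmx_forward(rgb, x, num_stages=4):
    fused = []
    for _ in range(num_stages):
        rgb, x = stage(rgb), stage(x)   # two parallel backbones
        rgb, x = rectify(rgb, x)        # rectified features go on to the next stage
        fused.append(fuse(rgb, x))      # same-stage features are also fused
    return fused                        # multi-scale fused features

features = cmx_forward(np.ones((3, 32, 32)), np.ones((3, 32, 32)))
print([f.shape for f in features])
```

The point of the sketch is that rectification sits inside the backbone path, while fusion branches off at every stage and produces a separate list of multi-scale features.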

CM-FRM: cross-modal feature rectification module. The structure is shown in the figure below. The two input features each have size C×H×W. Each is pooled with average pooling and max pooling into 1×1×C vectors, which are concatenated into a 1×1×4C vector and passed through an MLP and a sigmoid to rectify the features of the upper and lower branches respectively. A spatial-level attention computation is then applied to the features, and here the attention is applied crosswise between the two branches. The final output takes the form $X_{out}=X_{in}+\lambda_C X^{C}_{rec}+\lambda_S X^{S}_{rec}$. The fusion uses two hyperparameters, both set to 0.5 in the experiments.
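
A minimal NumPy sketch of this rectification step, under several assumptions: the weights are random stand-ins for learned parameters, the order in which the four pooled vectors are concatenated is arbitrary, the spatial gate is modeled as two 1x1 convolutions with a ReLU in between, and both the channel and the spatial gates are applied crosswise. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_weights(rgb, x, w1, w2):
    """Per-channel gates, one vector of shape (C,) for each branch."""
    # Global average and max pooling of each (C, H, W) map -> four length-C vectors.
    pooled = np.concatenate([
        rgb.mean(axis=(1, 2)), rgb.max(axis=(1, 2)),
        x.mean(axis=(1, 2)), x.max(axis=(1, 2)),
    ])                                     # shape (4C,)
    hidden = np.maximum(pooled @ w1, 0.0)  # first MLP layer + ReLU
    gates = sigmoid(hidden @ w2)           # shape (2C,)
    c = rgb.shape[0]
    return gates[:c], gates[c:]

def spatial_weights(rgb, x, v1, v2):
    """Per-pixel gates, one map of shape (H, W) for each branch."""
    stacked = np.concatenate([rgb, x], axis=0)                       # (2C, H, W)
    hidden = np.maximum(np.einsum("chw,ck->khw", stacked, v1), 0.0)  # 1x1 conv + ReLU
    gates = sigmoid(np.einsum("khw,kj->jhw", hidden, v2))            # (2, H, W)
    return gates[0], gates[1]

def cm_frm(rgb, x, params, lam_c=0.5, lam_s=0.5):
    w1, w2, v1, v2 = params
    wc_rgb, wc_x = channel_weights(rgb, x, w1, w2)
    ws_rgb, ws_x = spatial_weights(rgb, x, v1, v2)
    # X_out = X_in + lam_C * X_rec^C + lam_S * X_rec^S, where each branch's
    # rectification terms are built from the gated features of the other branch.
    rgb_out = rgb + lam_c * wc_x[:, None, None] * x + lam_s * ws_x[None] * x
    x_out = x + lam_c * wc_rgb[:, None, None] * rgb + lam_s * ws_rgb[None] * rgb
    return rgb_out, x_out

rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
rgb, x = rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W))
params = (rng.normal(size=(4 * C, C)), rng.normal(size=(C, 2 * C)),
          rng.normal(size=(2 * C, C)), rng.normal(size=(C, 2)))
rgb_out, x_out = cm_frm(rgb, x, params)
print(rgb_out.shape, x_out.shape)
```

Setting `lam_c` and `lam_s` to 0 returns the inputs unchanged, which makes the residual form of the output equation easy to check.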

FFM: feature fusion module. The structure is shown in the figure below; it is Transformer-based. Unlike other methods, the two modalities are treated equally here. The QKV computation uses the method from "Efficient Attention: Attention with Linear Complexities", which reduces the computational cost of attention. In the FFN part, a depth-wise convolution replaces the MLP, and adding a 1×1 convolution to the residual connection further improves the results.
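
To illustrate why this form of attention is cheaper, here is a small NumPy sketch of efficient attention in its softmax variant: the keys are normalized over the token axis and multiplied with the values first, so only a small d_k × d_v context matrix is built instead of an n × n attention map. Pairing the queries of one modality with the context of the other is an assumption about how the cross-attention is wired; the function names are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_context(k, v):
    """k: (n, d_k), v: (n, d_v). Softmax over the n tokens, then K^T V -> (d_k, d_v)."""
    return softmax(k, axis=0).T @ v

def efficient_cross_attention(q_a, k_b, v_b):
    """Queries from modality A attend to the global context of modality B."""
    # Cost is O(n * d_k * d_v); the (n, n) attention map is never formed.
    return softmax(q_a, axis=1) @ global_context(k_b, v_b)

rng = np.random.default_rng(0)
n, d_k, d_v = 1024, 16, 16                   # n = H * W tokens
q_rgb = rng.normal(size=(n, d_k))
k_x, v_x = rng.normal(size=(n, d_k)), rng.normal(size=(n, d_v))
out = efficient_cross_attention(q_rgb, k_x, v_x)
print(out.shape)
```

Because both softmaxes produce weights that sum to 1, each output row is still a convex combination of the value rows, just as in standard attention.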

For the experiments, please refer to the authors' paper; they are not covered here.