【ARXIV2203】CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
2022-07-28 05:00:00 【AI frontier theory group @ouc】

1. Research motivation
Semantic segmentation currently relies mainly on RGB images. Adding complementary information from other sources (depth, thermal, etc.) can effectively improve segmentation accuracy; in other words, fusing multimodal information helps. Existing methods fall into two main categories:
- Input fusion: as shown in figure (a), the RGB and D (depth) data are concatenated, and a single network extracts the features.
- Feature fusion: as shown in figure (b), two networks extract the RGB and D features separately, and the features then interact and are fused at intermediate stages of the network.
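The two strategies in the list above can be contrasted with a toy NumPy sketch. Everything here is illustrative: `toy_encoder` is a hypothetical stand-in for a real backbone (a single 1x1 convolution with ReLU), the weights are random, and the element-wise sum is just the simplest possible fusion, not what any particular method uses.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))    # RGB image, channels first
depth = rng.random((1, 8, 8))  # depth map, single channel


def toy_encoder(image, weight):
    """Hypothetical stand-in for a backbone: 1x1 convolution followed by ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weight, image), 0.0)


# Input fusion: concatenate the modalities first, then run ONE encoder.
stacked = np.concatenate([rgb, depth], axis=0)  # shape (4, 8, 8)
feat_input_fusion = toy_encoder(stacked, rng.standard_normal((16, 4)))

# Feature fusion: run TWO encoders, then merge their feature maps.
feat_rgb = toy_encoder(rgb, rng.standard_normal((16, 3)))
feat_depth = toy_encoder(depth, rng.standard_normal((16, 1)))
feat_feature_fusion = feat_rgb + feat_depth  # simplest possible merge
```

Both paths end with a feature map of the same shape; the difference is whether the modalities are mixed before or after feature extraction.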
The authors propose CMX, which they characterize as follows: comprehensive interactions are considered, including channel and spatial-wise cross-modal feature rectification from the feature map, as well as cross-attention from the sequence-to-sequence perspective.

2. Main method
The overall framework of CMX is shown in the figure below. Two parallel backbones extract features from the RGB and X modal inputs. Between stages, the features pass through a CM-FRM (cross-modal feature rectification module) for rectification, and the rectified features are passed on to the next stage. In addition, the features of the same stage are also fed into an FFM (feature fusion module) for fusion. CM-FRM and FFM are described in detail below.

CM-FRM (cross-modal feature rectification module): the structure is shown in the figure below. Both input features have size C×H×W. Each is pooled with average pooling and max pooling into 1×1×C vectors, which are concatenated into a 1×1×4C vector and passed through an MLP and a sigmoid to rectify the features of the upper and lower branches respectively. A spatial-level attention computation is then applied to the features, and here the attention is applied in a "cross" manner between the two branches. The final output takes the form $X_{out}=X_{in}+\lambda_C X^{C}_{rec}+\lambda_S X^{S}_{rec}$. The two hyperparameters $\lambda_C$ and $\lambda_S$ used in this combination are both set to 0.5 in the experiments.
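The rectification step can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cm_frm`, the `params` dictionary, the two-layer MLP, the pair of 1x1 convolutions for the spatial weights, and the hidden sizes are all hypothetical. It also assumes that the "cross" application means each branch is rectified with the other branch's features, for both the channel and the spatial term.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cm_frm(rgb, x, params, lam_c=0.5, lam_s=0.5):
    """Toy cross-modal rectification; rgb and x both have shape (C, H, W)."""
    C = rgb.shape[0]

    # Channel-wise weights: avg/max pool both inputs, concatenate to 4C values,
    # then MLP + sigmoid to get 2C weights (C per branch).
    pooled = np.concatenate([
        rgb.mean(axis=(1, 2)), rgb.max(axis=(1, 2)),
        x.mean(axis=(1, 2)), x.max(axis=(1, 2)),
    ])
    hidden = np.maximum(params["w1"] @ pooled, 0.0)
    w_c = sigmoid(params["w2"] @ hidden)
    w_c_rgb, w_c_x = w_c[:C, None, None], w_c[C:, None, None]

    # Spatial weights: two 1x1 convolutions over the concatenated features
    # (assumed design), giving one H x W weight map per branch.
    stacked = np.concatenate([rgb, x], axis=0)
    h = np.maximum(np.einsum("oc,chw->ohw", params["v1"], stacked), 0.0)
    w_s = sigmoid(np.einsum("oc,chw->ohw", params["v2"], h))
    w_s_rgb, w_s_x = w_s[0:1], w_s[1:2]

    # X_out = X_in + lam_c * X_rec^C + lam_s * X_rec^S, with the rectification
    # terms taken from the OTHER modality (assumed reading of "cross").
    rgb_out = rgb + lam_c * w_c_x * x + lam_s * w_s_x * x
    x_out = x + lam_c * w_c_rgb * rgb + lam_s * w_s_rgb * rgb
    return rgb_out, x_out


rng = np.random.default_rng(0)
C, H, W = 4, 6, 6
params = {
    "w1": rng.standard_normal((8, 4 * C)) * 0.1,   # MLP layer 1 (hypothetical size)
    "w2": rng.standard_normal((2 * C, 8)) * 0.1,   # MLP layer 2
    "v1": rng.standard_normal((C, 2 * C)) * 0.1,   # first 1x1 convolution
    "v2": rng.standard_normal((2, C)) * 0.1,       # second 1x1 convolution
}
rgb_feat = rng.standard_normal((C, H, W))
x_feat = rng.standard_normal((C, H, W))
rgb_out, x_out = cm_frm(rgb_feat, x_feat, params)
```

Setting both `lam_c` and `lam_s` to 0 returns the inputs unchanged, which makes the residual nature of the formula easy to see.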

FFM (feature fusion module): the structure is shown in the figure below. As can be seen, it is Transformer-based. Unlike other methods, the two modalities are treated equally here. The QKV computation uses the method from "Efficient Attention: Attention with Linear Complexities", which reduces the computational cost of attention. In the FFN part, a depth-wise convolution replaces the MLP, and adding a 1×1 convolution to the residual connection further improves the results.
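The cost saving from efficient attention can be illustrated with a short NumPy sketch. It follows the softmax-normalized form described in the cited Efficient Attention paper: the keys and values are first aggregated into a small context matrix, so the n×n attention map is never built. How this is wired into FFM is an assumption here: the sketch lets each modality's queries read the other modality's context and shares one set of random projection matrices (`w_q`, `w_k`, `w_v`, all hypothetical) across both modalities for brevity.

```python
import numpy as np


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def efficient_attention(q, k, v):
    """q, k: (n, d_k); v: (n, d_v). Cost grows linearly with n."""
    q_norm = softmax(q, axis=1)  # normalize each query over its features
    k_norm = softmax(k, axis=0)  # normalize each key feature over positions
    context = k_norm.T @ v       # (d_k, d_v) global context, independent of n
    return q_norm @ context      # (n, d_v)


rng = np.random.default_rng(0)
n, d = 64, 16  # number of tokens and embedding size (illustrative values)
tokens_rgb = rng.standard_normal((n, d))
tokens_x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

# Cross-modal use (assumed wiring): queries from one modality,
# keys and values from the other.
attended_rgb = efficient_attention(tokens_rgb @ w_q, tokens_x @ w_k, tokens_x @ w_v)
attended_x = efficient_attention(tokens_x @ w_q, tokens_rgb @ w_k, tokens_rgb @ w_v)
```

Because matrix multiplication is associative, the result equals what the explicit n×n product of the normalized queries and keys would give, while only a d×d context matrix is ever stored.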

For the experiments, please refer to the authors' paper; they are not covered here.