RGB-T tracking: 【Multimodal fusion】 Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
2022-07-28 09:22:00 【ZZ's big spike】
This paper presents a large-scale RGB-T tracking dataset and a corresponding baseline, which achieves the best performance on the existing datasets GTOT / RGBT210 / RGBT234.
For details on the dataset introduced in this paper, see the blog post: RGB-T tracking 【Dataset benchmarks】 GTOT / RGBT210 / RGBT234 / VOT-2019-2020 / LasHeR / VTUAV
Introduction to RGB-T tracking algorithms
An RGB-T tracker usually follows a pipeline similar to that of an RGB tracker and then focuses on designing a method for fusing the two modalities. Existing fusion methods fall into three main types: image fusion, feature fusion, and decision fusion.
- 【Image fusion】: A backbone network with shared weights learns image features from both the visible and the thermal infrared images. The shared weights capture the information that is common to both images and useful for locating the target. The drawback is that this method requires the visible and thermal infrared images to be well aligned.
- 【Feature fusion】: Most trackers fuse the features of the visible and thermal infrared images. There are two ways to do so: 1. use one modality as an auxiliary modality to refine the other; 2. first concatenate the features of the two modalities directly (usually channel-wise), then learn a new feature through a deep network after the two modalities interact. The advantage is high flexibility: the images do not need to be aligned.
- 【Decision fusion】: Each modality independently outputs an estimate of the target in the form of a response map; the decisions of the two modalities are then merged to output one final score.
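The three fusion levels can be contrasted in a few lines of NumPy. This is only a minimal sketch under stated assumptions, not code from the paper: `shared_backbone`, `feature_fusion`, `decision_fusion`, the single 1x1-convolution-like weight, and the fixed fusion weight `w_rgb` are all hypothetical names and simplifications chosen for illustration.

```python
import numpy as np

def shared_backbone(image, weight):
    """Image-level idea: the SAME weights process both modalities.

    image: (C_in, H, W); weight: (C_out, C_in), acting like a 1x1 convolution
    followed by ReLU (a toy stand-in for a real backbone).
    """
    return np.maximum(np.tensordot(weight, image, axes=([1], [0])), 0.0)

def feature_fusion(feat_rgb, feat_tir):
    """Feature-level fusion: concatenate (C, H, W) features channel-wise."""
    return np.concatenate([feat_rgb, feat_tir], axis=0)

def decision_fusion(resp_rgb, resp_tir, w_rgb=0.5):
    """Decision-level fusion: weighted sum of per-modality response maps."""
    return w_rgb * resp_rgb + (1.0 - w_rgb) * resp_tir

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))
tir = rng.random((3, 8, 8))            # thermal image as 3 channels (assumption)
weight = rng.standard_normal((4, 3))   # shared by both modalities

feat_rgb = shared_backbone(rgb, weight)
feat_tir = shared_backbone(tir, weight)

fused_feat = feature_fusion(feat_rgb, feat_tir)
print(fused_feat.shape)  # (8, 8, 8)

# Toy response maps: channel means stand in for a real tracking head.
resp_rgb = feat_rgb.mean(axis=0)
resp_tir = feat_tir.mean(axis=0)
final_score = decision_fusion(resp_rgb, resp_tir, w_rgb=0.6)
print(final_score.shape)  # (8, 8)
```

In a real tracker the decision-level weight would normally be predicted per frame rather than fixed; the constant here only keeps the sketch short.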
HMFT
This model incorporates all three fusion methods above. The model is shown below. The HMFT framework has two branches, the discriminative branch and the complementary branch, and consists of three main modules: CIF / DFF / ADF.
- Discriminative branch:
- Complementary branch:
Complementary image fusion 【CIF】
This module learns the target-related information that is consistent across the two modalities.
- Module input: $I_v$ and $I_t$, the RGB image and the thermal image respectively.
- The blue part is the network that extracts complementary information 【Comp. Backbone】, namely a ResNet50 with shared weights that extracts common features. $L_{div}$ is a KL-divergence loss function whose role is to keep the two modalities consistent by constraining the distributions of their features. During training, the objective is therefore to make the features that the backbone outputs for the two modalities as similar as possible, which amounts to focusing on consistent information. The objective function is as follows:

where $P_v^i$ and $P_t^i$ are the features of the visible image and the thermal image at the $i$-th layer of ResNet50. The objective thus minimizes the sum of the per-layer KL divergences.
- The output is the channel-wise concatenated feature $P_a \in \mathbb{R}^{2C \times H \times W}$; the original features have dimension $P_{v/t} \in \mathbb{R}^{C \times H \times W}$.
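The consistency constraint described above (a sum of per-layer KL divergences between the visible and thermal features) can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names are hypothetical, turning each feature map into a distribution with a softmax over all its elements is an assumption, and so is the direction KL(visible || thermal).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat vector."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def divergence_loss(feats_v, feats_t):
    """Sum of per-layer KL divergences between visible and thermal features.

    feats_v, feats_t: lists of (C, H, W) arrays, one entry per backbone layer.
    Flattening plus softmax is an assumed normalization step.
    """
    total = 0.0
    for f_v, f_t in zip(feats_v, feats_t):
        total += kl_divergence(softmax(f_v.ravel()), softmax(f_t.ravel()))
    return total

rng = np.random.default_rng(0)
feats_v = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
# Thermal features: slightly perturbed copies, for illustration only.
feats_t = [f + 0.1 * rng.standard_normal(f.shape) for f in feats_v]

loss = divergence_loss(feats_v, feats_t)

# Module output: channel-wise concatenation of the last-layer features.
p_a = np.concatenate([feats_v[-1], feats_t[-1]], axis=0)
print(p_a.shape)  # (8, 8, 8)
```

The loss is zero when the two modalities produce identical features and grows as they drift apart, which is the behaviour the consistency constraint relies on.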
Discriminative feature fusion 【DFF】
This module learns the discriminative information that differs between the two modalities. RGB images provide strong appearance information; infrared images provide information about the target contour. The two modalities are therefore modeled separately first, and the resulting features are then fused. The specific process is as follows:
Module input: the features $F_v$ and $F_t$ that the backbone network outputs independently for the two modalities.
Blue box: $F_v$ and $F_t$ are added element-wise (Elem. Sum), then passed through global average pooling (GAP) and a fully connected layer (FC) to obtain a global vector $d_g$ that contains information from both modalities. The formula is as follows (the $D_v$ and $D_t$ in it correspond to $F_v$ and $F_t$; this is presumably a typo):

Orange box: two independent modality-specific fully connected layers $\digamma_v$ and $\digamma_t$, followed by a softmax operation, generate the modality-specific channel-wise weights $w_v, w_t \in \mathbb{R}^{C \times 1 \times 1}$.

Red box: the computed weights $w_v$ and $w_t$ are multiplied channel-wise with the initial modality features $F_v$ and $F_t$, and the results are summed.
Module output: the fused feature $D_a^i$
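The blue, orange, and red boxes of the DFF module can be traced with a small NumPy sketch. Everything here is illustrative: the function `dff`, the weight matrices `w_g`, `w_fc_v`, `w_fc_t`, the ReLU after the first FC layer, and the choice to apply the softmax across the two modalities for each channel are assumptions, since the text above does not pin these details down.

```python
import numpy as np

def dff(f_v, f_t, w_g, w_fc_v, w_fc_t):
    """Sketch of discriminative feature fusion.

    f_v, f_t: (C, H, W) modality features.
    w_g: (D, C) weights of the shared FC layer producing d_g.
    w_fc_v, w_fc_t: (C, D) weights of the modality-specific FC layers.
    """
    # Blue box: element-wise sum -> global average pooling -> FC.
    pooled = (f_v + f_t).mean(axis=(1, 2))            # (C,)
    d_g = np.maximum(w_g @ pooled, 0.0)               # (D,), ReLU is assumed

    # Orange box: modality-specific FC layers, then softmax across modalities.
    logits = np.stack([w_fc_v @ d_g, w_fc_t @ d_g])   # (2, C)
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    w_v = weights[0][:, None, None]                   # (C, 1, 1)
    w_t = weights[1][:, None, None]

    # Red box: channel-wise reweighting and summation.
    return w_v * f_v + w_t * f_t, w_v, w_t

rng = np.random.default_rng(0)
C, H, W, D = 8, 6, 6, 4
f_v = rng.standard_normal((C, H, W))
f_t = rng.standard_normal((C, H, W))
w_g = rng.standard_normal((D, C))
w_fc_v = rng.standard_normal((C, D))
w_fc_t = rng.standard_normal((C, D))

d_a, w_v, w_t = dff(f_v, f_t, w_g, w_fc_v, w_fc_t)
print(d_a.shape, w_v.shape)  # (8, 6, 6) (8, 1, 1)
```

With the softmax taken across modalities, the two weights for each channel sum to one, so the fused feature is a per-channel convex combination of $F_v$ and $F_t$.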
Adaptive decision fusion 【ADF】
This module takes the feature maps output independently by the CIF and DFF branches, computes a confidence for each, derives weights from these confidences, applies the weights, and generates the final map.
- Module input: the feature maps $P_a$ and $D_a$ output independently by the CIF and DFF branches.
- The MAM module uses a self-attention mechanism to obtain the confidences $M_c$ and $M_d$ of the consistency branch and the discriminative branch respectively. Specifically, the input feature $X$ (that is, $P_a$ or $D_a$ above) first passes through a $1 \times 1$ convolution that reduces the feature dimension (to reduce computation). A reshape operation then changes the shape of $X$ from $C \times W \times H$ to $C \times WH$, which serves as the embedding for the self-attention mechanism and yields a feature of size $HW \times C$. Summing over the channel dimension and reshaping gives the modality confidence of size $H \times W \times 1$. The calculation is as follows:

- $M_c$ and $M_d$ are concatenated and fed into a two-layer encoder-decoder network to obtain the respective weights $E_c, E_d \in \mathbb{R}^{H \times W}$. These weights are multiplied element-wise (a weighting operation) with the response maps $R_c$ and $R_d$ output independently by the CIF and DFF branches to obtain $R_F$:
$$R_F = R_d \odot E_d + R_c \odot E_c$$
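A rough sketch of the confidence computation and the final weighting $R_F = R_d \odot E_d + R_c \odot E_c$ is given below. It makes several simplifying assumptions that are not from the paper: the self-attention uses the reduced features directly as queries, keys, and values (no learned projections), and the two-layer encoder-decoder is replaced by a plain per-pixel softmax over the two confidence maps. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modality_confidence(x, w_reduce):
    """MAM-style confidence map for a (C, H, W) feature.

    w_reduce: (C_r, C) weights acting as the 1x1 dimension-reducing convolution.
    """
    _, h, w = x.shape
    reduced = np.tensordot(w_reduce, x, axes=([1], [0]))   # (C_r, H, W)
    tokens = reduced.reshape(reduced.shape[0], h * w).T    # (HW, C_r)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])  # (HW, HW)
    attended = softmax(scores, axis=1) @ tokens            # (HW, C_r)
    return attended.sum(axis=1).reshape(h, w)              # sum over channels

def adaptive_decision_fusion(r_c, r_d, m_c, m_d):
    """Weight the two response maps; softmax stands in for the encoder-decoder."""
    weights = softmax(np.stack([m_c, m_d]), axis=0)        # (2, H, W)
    e_c, e_d = weights[0], weights[1]
    return r_d * e_d + r_c * e_c, e_c, e_d

rng = np.random.default_rng(0)
p_a = rng.standard_normal((8, 5, 5))   # complementary-branch feature (2C = 8)
d_a = rng.standard_normal((4, 5, 5))   # discriminative-branch feature (C = 4)
m_c = modality_confidence(p_a, rng.standard_normal((2, 8)))
m_d = modality_confidence(d_a, rng.standard_normal((2, 4)))

r_c = rng.random((5, 5))               # toy response maps
r_d = rng.random((5, 5))
r_f, e_c, e_d = adaptive_decision_fusion(r_c, r_d, m_c, m_d)
print(r_f.shape)  # (5, 5)
```

Because the stand-in weights sum to one at every pixel, the fused response stays within the range spanned by the two branch responses; the learned encoder-decoder described above is not bound by that constraint.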
Algorithm flow

For the current tracking frame:
- The two branches, the discriminative branch and the complementary branch, use the feature fusion method and the image fusion method respectively to obtain the target response maps;
- ADF fuses the response maps of the discriminative branch and the complementary branch to generate the final response map;
- The IoU prediction module from DiMP takes 10 proposals, predicts an IoU score for each, and averages the three highest-scoring proposals to output the final predicted bounding box.
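The last step (score 10 proposals, keep the three with the highest predicted IoU, average them) can be sketched as below. The function `select_box`, the `(x1, y1, x2, y2)` box format, and the hard-coded scores are hypothetical; in the actual tracker the scores would come from the IoU prediction module.

```python
import numpy as np

def select_box(proposals, iou_scores, top_k=3):
    """Average the top_k proposals ranked by predicted IoU.

    proposals: (N, 4) boxes, here assumed to be (x1, y1, x2, y2).
    iou_scores: (N,) predicted IoU per proposal.
    """
    proposals = np.asarray(proposals, dtype=float)
    iou_scores = np.asarray(iou_scores, dtype=float)
    top = np.argsort(iou_scores)[::-1][:top_k]   # indices of the best scores
    return proposals[top].mean(axis=0)

# Ten toy proposals: box i is [i, i, 10 + i, 10 + i].
proposals = [[i, i, 10 + i, 10 + i] for i in range(10)]
# Placeholder scores standing in for the IoU predictor's output.
iou_scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.95, 0.85, 0.3, 0.2, 0.1]

final_box = select_box(proposals, iou_scores)
print(final_box.tolist())  # [5.0, 5.0, 15.0, 15.0]
```

Averaging the best few boxes instead of taking only the top one trades a little sharpness for stability when the predicted scores are noisy.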