RGB-T tracking: 【Multimodal fusion】 Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
2022-07-28 09:22:00 【ZZ's big spike】
This paper presents a large-scale RGB-T tracking dataset together with a corresponding baseline, which achieves the best performance on the existing datasets GTOT / RGBT210 / RGBT234.
For information about the dataset in this paper, please see this blog post: RGB-T tracking: 【Dataset benchmark】 GTOT / RGBT210 / RGBT234 / VOT-2019-2020 / LasHeR / VTUAV
Introduction to RGB-T tracking algorithms
An RGB-T tracker usually follows a pipeline similar to that of an RGB tracker and concentrates on designing the fusion of the two modalities. Existing fusion methods fall into three types: image fusion, feature fusion, and decision fusion.
- 【Image fusion】: A backbone network with shared weights learns the image features of the visible and thermal infrared images. The shared weights effectively capture the information that is common to both images and useful for locating the target. The drawback of this method is that it requires the visible and thermal infrared images to be closely aligned.
- 【Feature fusion】: Most trackers fuse the features of the visible and thermal infrared images. There are two ways to do this: 1. use one modality as an auxiliary modality to refine the other; 2. first concatenate the features of the two modalities directly (usually channel-wise), then let a deep network learn a new feature after the two modalities interact. The advantage of this method is its flexibility: it does not require the images to be strictly aligned.
- 【Decision fusion】: Each modality independently outputs an estimate of the target in the form of a response map; the decisions of the two modalities are then merged into one final score.
HMFT
This model incorporates all three fusion methods above. The model is shown below. The HMFT framework has two branches, the discriminative branch and the complementary branch, and consists of three main modules: CIF / DFF / ADF.
- Discriminative branch:
- Complementary branch:
Complementary image information fusion 【CIF】
The function of this module is to learn the target-related information that is consistent across the two modalities.
- Module input: $I_v$ and $I_t$, the RGB image and the thermal image respectively.
- The blue part is the network that extracts complementary information 【Comp. Backbone】, a ResNet50 with shared weights that extracts common features. $L_{div}$ is a KL-divergence loss whose role is to keep the two modalities consistent by constraining the distribution of their features. During training, the objective therefore pushes the features output by the two backbones to be as similar as possible, which amounts to focusing on consistent information. The objective function is as follows:

where $P_v^i$ and $P_t^i$ are the features of the visible image and the thermal image at the $i$-th layer of ResNet50. The objective thus minimizes the sum of the per-layer KL divergences.
- The output is the channel-wise concatenated feature $P_a \in \mathbb{R}^{2C \times H \times W}$; the original features have dimension $P_{v/t} \in \mathbb{R}^{C \times H \times W}$.
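The per-layer consistency objective can be sketched in a few lines of plain Python. This is only an illustration of "sum the KL divergence over layers". The softmax normalization of the features, the direction KL(visible || thermal), and the helper names `softmax`, `kl_divergence` and `divergence_loss` are all assumptions, since the post does not spell these details out.

```python
import math


def softmax(values):
    """Turn a flat list of activations into a probability distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def divergence_loss(visible_layers, thermal_layers):
    """Sum the per-layer KL divergence between the two modalities.

    Each argument holds one flattened feature vector per backbone layer.
    Assumption: features are softmax-normalized and the divergence is
    taken as KL(visible || thermal).
    """
    return sum(
        kl_divergence(softmax(p_v), softmax(p_t))
        for p_v, p_t in zip(visible_layers, thermal_layers)
    )


# Toy features for two layers (hypothetical numbers).
visible = [[0.2, 1.5, -0.3], [0.9, 0.1, 0.4, 2.0]]
thermal = [[0.1, 1.0, 0.2], [0.5, 0.3, 0.4, 1.2]]

same = divergence_loss(visible, visible)       # identical features
different = divergence_loss(visible, thermal)  # differing features
print(same, different)
```

The loss is zero when both modalities produce identical features and grows as their distributions drift apart, which is the behaviour the consistency constraint relies on.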
Discriminative feature fusion 【DFF】
The function of this module is to learn the different discriminative information carried by the two modalities. RGB images provide strong appearance information, while infrared images provide information about the target contour. The two modalities are therefore modelled separately first, and the resulting features are then fused. The specific process is as follows:
Module input: the features $F_v$ and $F_t$ that the backbone network outputs independently for the two modalities.
Blue box: $F_v$ and $F_t$ are combined by element-wise summation (Elem. Sum), then passed through global average pooling (GAP) and a fully connected layer (FC) to obtain a global vector $d_g$ that contains information from both modalities. The formula is as follows (the $D_v$ and $D_t$ in it correspond to $F_v$ and $F_t$, which is probably a clerical error):

Orange box: two independent, modality-specific fully connected layers $\digamma_v$ and $\digamma_t$, followed by a softmax operation, generate the modality-specific channel-wise weights $w_v, w_t \in \mathbb{R}^{C \times 1 \times 1}$.

Red box: the computed weights $w_v$ and $w_t$ are multiplied channel-wise with the initial modality features $F_v$ and $F_t$, and the results are added together.
Module output: the fused features $D_a^i$.
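The three boxes above can be traced end to end with a small pure-Python sketch. It is not the authors' implementation: the fully connected layers are bias-free toy matrices, and the softmax is assumed to be taken across the two modalities for each channel (so that $w_v + w_t = 1$ per channel), which the post does not confirm. The function names are hypothetical.

```python
import math


def elementwise_sum(a, b):
    """Add two C x H x W feature maps stored as nested lists."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def global_average_pool(feat):
    """Collapse each H x W channel to its mean, giving a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def linear(weights, vec):
    """Bias-free fully connected layer; weights has shape (out x in)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def dff(f_v, f_t, fc_g, fc_v, fc_t):
    """Fuse two modality features with channel-wise attention weights."""
    # Blue box: Elem. Sum -> GAP -> FC gives the global vector d_g.
    d_g = linear(fc_g, global_average_pool(elementwise_sum(f_v, f_t)))
    # Orange box: modality-specific FC layers, then softmax.
    # Assumption: softmax runs over the two modalities, per channel.
    w_v, w_t = [], []
    for l_v, l_t in zip(linear(fc_v, d_g), linear(fc_t, d_g)):
        peak = max(l_v, l_t)
        e_v, e_t = math.exp(l_v - peak), math.exp(l_t - peak)
        w_v.append(e_v / (e_v + e_t))
        w_t.append(e_t / (e_v + e_t))
    # Red box: channel-wise multiplication, then addition.
    fused = [[[a * x + b * y for x, y in zip(rv, rt)] for rv, rt in zip(cv, ct)]
             for a, b, cv, ct in zip(w_v, w_t, f_v, f_t)]
    return fused, w_v, w_t


# Toy input with C=2, H=W=2 (hypothetical numbers).
f_v = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
f_t = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
fc_v = [[0.1, 0.0], [0.0, 0.1]]
fc_t = [[0.0, 0.1], [0.1, 0.0]]

fused, w_v, w_t = dff(f_v, f_t, identity, fc_v, fc_t)
print(w_v, w_t)
```

With all-zero `fc_v` and `fc_t` the two modalities receive equal weight and the output reduces to the plain average of $F_v$ and $F_t$, which is a convenient sanity check for the weighting step.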
Adaptive decision fusion 【ADF】
Based on the feature maps output independently by the CIF and DFF branches, this module calculates the confidence of each feature map, derives weights from those confidences, uses them to weight the maps, and then generates the final map.
- Module input: the feature maps $P_a$ and $D_a$ output independently by the CIF and DFF branches.
- The MAM module uses a self-attention mechanism to obtain the confidences $M_c$ and $M_d$ of the consistency branch and the discriminative branch respectively. The specific operation is as follows. The input feature $X$ (that is, $P_a$ or $D_a$ above) first passes through a 1×1 convolution that reduces the feature dimension (to reduce the amount of computation). A reshape operation then changes the shape of $X$ from $C \times W \times H$ to $C \times WH$, which serves as the embedding for the self-attention mechanism and yields an $HW \times C$ feature. This feature is summed over the channel dimension and reshaped to obtain the $H \times W \times 1$ modality confidence. The calculation is as follows:

- $M_c$ and $M_d$ are concatenated and fed into a two-layer encoder-decoder network, which produces the respective weights $E_c, E_d \in \mathbb{R}^{H \times W}$. These weights are applied to the response maps $R_c$ and $R_d$ output independently by the CIF and DFF branches through element-wise multiplication (a weighting operation) to obtain $R_F$.
$$R_F = R_d \odot E_d + R_c \odot E_c$$
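The final weighting step is just an element-wise weighted sum of the two response maps. The sketch below shows it on 2×2 maps in plain Python. The weight maps `e_d` and `e_c` are hand-picked hypothetical values standing in for the encoder-decoder output, which is not modelled here, and `fuse_responses` is an illustrative name.

```python
def fuse_responses(r_d, e_d, r_c, e_c):
    """Compute R_F = R_d * E_d + R_c * E_c element-wise on H x W nested lists."""
    return [
        [rd * ed + rc * ec for rd, ed, rc, ec in zip(row_rd, row_ed, row_rc, row_ec)]
        for row_rd, row_ed, row_rc, row_ec in zip(r_d, e_d, r_c, e_c)
    ]


# Response maps of the discriminative and complementary branches (toy values).
r_d = [[0.5, 0.25], [0.0, 1.0]]
r_c = [[1.0, 0.75], [0.5, 0.0]]
# Hypothetical per-location weights; in HMFT these come from the ADF network.
e_d = [[1.0, 0.5], [0.5, 0.0]]
e_c = [[0.0, 0.5], [0.5, 1.0]]

r_f = fuse_responses(r_d, e_d, r_c, e_c)
print(r_f)  # [[0.5, 0.5], [0.25, 0.0]]
```

Because the weights vary per location, each position of the fused map can lean on whichever branch is considered more reliable there.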
Algorithm flow

For the current tracking frame:
- The two branches, the discriminative branch and the complementary branch, use the feature fusion method and the image information fusion method respectively to obtain target response maps;
- ADF takes the response maps of the discriminative branch and the complementary branch and generates the final response map;
- The IoU prediction module from DiMP takes 10 proposals, predicts an IoU score for each proposal, averages the three proposals with the highest scores, and outputs the final predicted bounding box.
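The last step, averaging the best-scoring proposals, can be sketched as follows. This only illustrates the selection and averaging; the proposal generation and the IoU prediction network are not shown. The `(x, y, w, h)` box format, the function name `average_top_proposals`, and the toy list of four proposals (instead of 10) are assumptions for the example.

```python
def average_top_proposals(proposals, iou_scores, k=3):
    """Average the k boxes with the highest predicted IoU score.

    proposals: list of (x, y, w, h) tuples; iou_scores: one score per box.
    """
    ranked = sorted(zip(iou_scores, proposals), key=lambda pair: pair[0], reverse=True)
    top = [box for _, box in ranked[:k]]
    # zip(*top) groups the x, y, w and h coordinates of the selected boxes.
    return tuple(sum(coord) / len(top) for coord in zip(*top))


# Four toy proposals with hypothetical predicted IoU scores.
proposals = [(10, 10, 40, 20), (12, 8, 44, 22), (50, 50, 10, 10), (14, 12, 42, 24)]
scores = [0.9, 0.8, 0.1, 0.7]

box = average_top_proposals(proposals, scores)
print(box)  # (12.0, 10.0, 42.0, 22.0)
```

The low-scoring outlier at (50, 50) is discarded before averaging, so it does not pull the final box away from the target.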