LEARNING TARGET-ORIENTED DUAL ATTENTION FOR ROBUST RGB-T TRACKING
2022-06-11 06:50:00 【A Xuan is going to graduate~】
Rui Yang, Yabin Zhu, Xiao Wang, Chenglong Li, Jin Tang
Hefei, Anhui Province, China
2019 IEEE International Conference on Image Processing (ICIP)
1. Abstract
RGB-T tracking aims to locate a target using complementary visible and thermal infrared data. Existing RGB-T trackers fuse the two modalities through either robust feature representation learning or adaptive modality weighting. However, how to integrate a dual attention mechanism into visual tracking has not yet been studied. This paper proposes two visual attention mechanisms for robust RGB-T tracking. Specifically, local attention is implemented by exploiting the common visual attention of RGB and thermal data to train a deep classifier. Global attention is also introduced, in the form of a multi-modal target-driven attention estimation network. It provides the classifier with global proposals in addition to the local proposals extracted around the previous tracking result.
2. Introduction
This paper proposes a new RGB-T tracking algorithm guided by dual visual attention: local attention and global attention. Training the local attention consists of a forward step and a backward step. In the forward step, paired RGB and T samples are fed into the deep tracking-by-detection network, which estimates the corresponding classification scores. In the backward step, the partial derivatives of the classification score with respect to the paired RGB-T input samples are computed, propagating from the last fully connected layer back to the first convolutional layer. The partial derivatives at the first layer are output as the attention maps for the RGB and thermal inputs. Each pixel value of an attention map indicates how important the corresponding pixel of the input RGB-T sample is to the classification accuracy. During training, the attention maps are added to the loss function as a regularization term, so that the classifier pays more attention to the target region.
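The forward and backward steps can be illustrated with a toy example. This is only a minimal sketch: a single-layer logistic classifier stands in for the deep network, and the names `weights`, `pixels`, `forward`, and `attention_map` are hypothetical, not taken from the paper. The point is that the attention map is the partial derivative of the classification score with respect to each input pixel.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, pixels):
    """Forward step: classification score of one flattened input sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, pixels)))


def attention_map(weights, pixels):
    """Backward step: d(score)/d(pixel) for every input pixel.

    For score = sigmoid(w . x), the chain rule gives
    d(score)/d(x_k) = score * (1 - score) * w_k.
    """
    s = forward(weights, pixels)
    return [s * (1.0 - s) * w for w in weights]


# Hypothetical 4-pixel sample: the first two values play the role of the
# RGB patch, the last two the role of the thermal patch.
weights = [2.0, 0.0, -1.0, 0.5]
pixels = [0.0, 0.0, 0.0, 0.0]

score = forward(weights, pixels)
attn = attention_map(weights, pixels)
print(score, attn)
```

Pixels with a larger absolute derivative influence the score more. In a deep network the same quantity would be obtained by back-propagating the score to the input instead of using a closed form.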
Local search strategy
This paper takes the target-driven attention estimation network first proposed in Paper 1 and extends it to a global attention mechanism for RGB-T data, to address the problems caused by the local search strategy. Specifically, the RGB image, the T image, and the original target image are taken as input. The feature maps extracted by the convolutional network are concatenated and fed into an up-sampling network to generate the corresponding attention map. High-quality global proposals are extracted from the attention region and sent to the classifier together with the local proposals. The complementarity of the local and global attention maps therefore further improves the robustness and accuracy of the RGB-T tracker.
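How global proposals might be taken from the attention region and merged with local proposals can be sketched as follows. This is an illustration under assumptions: the threshold of 0.5, the use of a single bounding box around the above-threshold pixels, and the function names are all hypothetical, and the paper's actual proposal sampling may differ.

```python
def attention_region(attn, threshold):
    """Bounding box (x_min, y_min, x_max, y_max) around all pixels whose
    attention value exceeds `threshold`, or None if there are none."""
    coords = [
        (x, y)
        for y, row in enumerate(attn)
        for x, value in enumerate(row)
        if value > threshold
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def candidate_set(local_proposals, attn, threshold=0.5):
    """Local proposals plus one global proposal from the attention region."""
    region = attention_region(attn, threshold)
    global_proposals = [region] if region is not None else []
    return local_proposals + global_proposals


# Hypothetical 3x4 attention map and one local proposal near the last result.
attn = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.0, 0.0],
]
local = [(0, 0, 1, 1)]
candidates = candidate_set(local, attn)
print(candidates)
```

All candidates, local and global, would then be scored by the same classifier, so a target that has left the local search window can still be recovered.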
Contributions of this paper:
(1) A local attention mechanism based on visual attention is proposed for RGB-T tracking.
(2) To further improve the robustness of the RGB-T tracker, the target-driven global attention mechanism is extended to a multi-modal form.
3. Method
3.1 Network structure

The network consists mainly of two modules: the local attention network for RGB-T tracking and the multi-modal target-driven global attention estimation network.
3.1.1 Local attention network
A typical tracking-by-detection framework treats the target object as the positive class and the background as the negative class, and trains a binary classifier, as in MDNet. This paper uses MDNet as the core of the RGB-T tracker because of its strong feature representation ability. Specifically, for each input pair of RGB and T samples, three convolutional layers and two fully connected layers are used to extract features. To reduce the computational burden, the features of the two modalities are concatenated and fed into the domain-specific layer to obtain the score map. The network is optimized with a cross-entropy loss.
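Assuming the standard binary form, the cross-entropy loss over a mini-batch can be written as below, where N is the mini-batch size, y_i is the ground-truth label (0 or 1) of the i-th RGB-T sample pair, and P_i is the predicted probability for that pair. The symbol \mathcal{L}_{CE} is introduced here for illustration, and the exact form used in the paper may differ.

```latex
\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[\, y_i \log P_i + (1 - y_i)\log(1 - P_i) \,\bigr]
```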

In the cross-entropy loss, N is the mini-batch size, y_i is the ground-truth label of the i-th RGB-T sample pair, and P_i is the corresponding prediction for that pair. To make the classifier attend more to the target during tracking, a regularization term is added to the cross-entropy loss of MDNet. The motivation is that two attention maps can be obtained for each input pair: the positive attention map A_p and the negative attention map A_n. For each positive sample, the pixel values of A_p related to the target object should be large, while those of A_n should be small. The regularization term is built from statistics of these two maps.

The two operators in the regularization term denote the mean and the variance, respectively.
The final loss function combines the cross-entropy loss with the regularization term. A scalar parameter balances these two terms, and the influence of the parameters is also examined in the subsequent experiments.
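The way a scalar weight balances the two terms can be sketched in a few lines. This is a hedged illustration only: the regularization values are passed in as plain numbers because their exact definition is not reproduced here, averaging them over the batch is an assumption, and the names `weight`, `reg_terms`, and `total_loss` are hypothetical.

```python
import math


def cross_entropy(labels, probs):
    """Mean binary cross-entropy over a mini-batch (probs strictly in (0, 1))."""
    n = len(labels)
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    )
    return -total / n


def total_loss(labels, probs, reg_terms, weight):
    """Cross-entropy plus a weighted, batch-averaged regularization term.

    `reg_terms` holds one precomputed attention regularization value per
    sample; how each value is derived from the attention maps is left open.
    """
    reg = sum(reg_terms) / len(reg_terms)
    return cross_entropy(labels, probs) + weight * reg


labels = [1, 0]
probs = [0.5, 0.5]
reg_terms = [2.0, 4.0]  # hypothetical per-sample regularization values
loss = total_loss(labels, probs, reg_terms, weight=0.1)
print(loss)
```

Setting `weight` to zero recovers the plain classification loss, while larger values push the classifier harder toward attention maps that favor the target.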
Based on Eq. (4), interactive learning can be carried out with standard back-propagation and the chain rule. In each training iteration of the classifier, the attention map of every input training sample is obtained, so the classifier learns to focus on the target object rather than the background. In the tracking phase, the learned classifier then attends to the target in both the RGB and thermal images.
Although the local attention mechanism achieves better performance, the improved tracking-by-detection framework still relies on a local search strategy, which makes it sensitive to challenges such as heavy occlusion, out-of-view targets, and fast motion. This paper therefore introduces an RGB-T target-driven global attention network to handle this problem.
3.1.2 Global attention network
This section presents the RGB-T target-driven global attention network, which supplements the local proposals for robust visual tracking, as shown in the network diagram. The inputs to this module are the RGB image, the thermal infrared image, and the corresponding target objects. A truncated VGG network extracts feature representations of these inputs, which are concatenated into a single feature map. More precisely, all input images are first resized to 192x256x3, and the corresponding feature map of each is 12x16x512, so the concatenated feature map is 12x16x2048. It is then fed into the up-sampling network, which is a reversed VGG network, and the output has the same resolution as the input.
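The feature map sizes quoted above can be checked with simple shape arithmetic. The sketch below rests on two assumptions that the text does not state explicitly: the truncated VGG has an overall stride of 16 (which is what maps 192x256 to 12x16), and four input streams are concatenated (which is what 2048 / 512 implies, for example one scene image and one target image per modality). The function names are hypothetical.

```python
def feature_shape(height, width, stride=16, channels=512):
    """Output shape (H, W, C) of a backbone with the given overall stride."""
    return (height // stride, width // stride, channels)


def concat_channels(shapes):
    """Shape after concatenating feature maps along the channel axis."""
    h, w, _ = shapes[0]
    if any(s[:2] != (h, w) for s in shapes):
        raise ValueError("all feature maps must share the same spatial size")
    return (h, w, sum(s[2] for s in shapes))


per_stream = feature_shape(192, 256)
fused = concat_channels([per_stream] * 4)  # four streams is an assumption
print(per_stream, fused)
```

With only three streams the fused map would have 1536 channels, so the quoted 2048 is consistent with four 512-channel feature maps.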
Paper 1: Xiao Wang, Chenglong Li, Rui Yang, Tianzhu Zhang, Jin Tang, and Bin Luo, “Describe and attend to track: Learning natural language guided structural representation and visual attention for object tracking,” arXiv preprint arXiv:1811.10014, 2018.