当前位置:网站首页>[target tracking] |atom
[target tracking] |atom
2022-07-08 01:47:00 【rrr2】
Article title :《ATOM: Accurate Tracking by Overlap Maximization》
Article address :https://arxiv.org/pdf/1811.07628.pdf
github Address :https://github.com/visionml/pytracking
CVPR2019 oral
problem
People focus on developing powerful classifiers , However, the accurate target state estimation is seriously ignored (target state estimation)( That is, the regression problem of bounding box ).
In the actual , Many classifiers use simple multi-scale search methods ( for example SiamFC) To estimate the bounding box of the target .
chart 1. Compare our method with the most advanced tracker . Based on correlation filter UPDT[3] Lack of explicit target state estimation components , It is Perform brute force multiscale search . therefore , it Do not deal with aspect ratio changes , This may cause the trace to fail ( The second line ).DaSiamRPN[42] The bounding box regression strategy is used to estimate the target state , But out of plane rotate 、 deformation Under such circumstances, there are still difficulties . Our method uses overlapping prediction Networks , Successfully addressed these challenges , And provide accurate bounding box prediction
We believe that this method is essentially limited , Because target estimation is a very complicated thing , Need high-level information about the goal (high-level knowledge).
Tracking tasks can be broken down into categories (classification) Task and an estimate (estimation ) Mission . For the former , By dividing the image area into foreground and background , So as to robustly provide rough positioning of the target .
The latter is to estimate the target state , It is usually represented by a bounding box .
Online learning is to obtain the classifier weight in the first frame . Capture the characteristics of a specific target .
In this paper, we do
(1) Prediction of bounding box
In this paper , We balance the performance gap between target classification and target estimation , We introduced a novel tracking architecture , It consists of two parts: target estimation and target classification .
Received recent IoU-Net Inspired by the , We train a target estimation component , To predict “ The goal is ” and “ Estimated bounding box ” Between IoU. Because of the original IoU-Net Is a specific category (class-specific Of ), So it is not very suitable for general target tracking , We propose a novel architecture to put target-specific Information into IoU In the prediction of . We used to introduce a modulation based (modulation-based) Network components , hold reference Image ( That is to say Templates ) Target appearance information Combine , In order to obtain target-specific Of IoU It is estimated that . This allows us to train the target estimation component off-line on large-scale data sets . In the process of tracking , By putting the predicted IoU overlap Maximize , To find the target bounding box .
(2) Online training classifier ( Online training with conjugate gradient )
In order to develop a seamless and transparent tracking method , We also revisit the problem of target classification , To avoid unnecessary complexity . Our target classification component is simple and powerful , from 2 22 A network header with full connection layer . Different from the target estimation module , The classification component is trained online , Provide strong robustness in disturbed scenes . To ensure real-time performance , We solved the problem of online optimization , The problem of gradient descent cannot be effectively . contrary , The strategy based on conjugate gradient is adopted , It also demonstrates how to implement it in a deep learning architecture . The tracking process is simple , It mainly includes classification 、 It is estimated that 、 Model update .
Related work
Based on correlation filtering (correlation-based) The tracker of has been widely used . But it has been impossible to accurately estimate the target ( Bounding box ). Even to find a single parameter scaling factor , It's also a huge challenge . Most methods adopt the strategy of brute force multi-scale detection , The computational burden is heavy . therefore , The default method is to use a separate classifier to do state estimation . But the target classifier is not sensitive to all aspects of the target state , For example, width and height . actually , The target state is invariant in some ways , You can consider Use this feature to improve the robustness of the model . We don't rely on classifiers , Instead, learn a special goal estimation component .
Tracking should separate classification from estimation , Because classification is mainly used to judge whether a location target exists , And not sensitive to the state of the target , The target state is simplified as 2D Position and the length and width of the target box , This is what the goal evaluation does , Therefore, dividing the tracking framework into two task modules helps to improve the overall performance .
be based on CF Our tracker is a good classifier , It will output a response graph , Determine the most likely location of the target according to the maximum response , But this method can not completely estimate the state of the target , Such as scale , So scale estimation usually uses additional classifiers .
The accurate estimation of the target requires a lot of prior information , Because target changes such as deformation are difficult to estimate by tracking image information alone .
So the author thinks SiameseRPN The success of depends mainly on a lot of offline training , however Siamese Most methods are limited by the performance of classification , Because this kind of method has no online training process , and CF There is a way , So there is no online training
As a result, it cannot deal with the interference in tracking well , Or similar goals , Model updating can only partially solve this problem .
For the accurate estimation of the target bounding box , It's a complex task , Advanced prior knowledge is required . The bounding box depends on Attitude and perspective of the target , This cannot be modeled as a simple image transformation ( For example, unified image scaling ). therefore , Learning online from scratch to accurately estimate goals is very challenging . There are some methods recently , Integrate prior knowledge through a lot of offline training . for example SiamRPN, And its extended algorithm , All show the ability of bounding box regression .
However, these twins (siamese) Tracking methods usually struggle with the problem of target classification . Unlike those based on Correlation (correlation-based) Methods , Tracking due to No online updates , Interference factors are not explicitly considered . Although some use simple template update technology , Improved a little , but It has not reached the level of a powerful online learning model . Therefore, the author proposes an online training classifier and an offline training evaluation network , Work together to solve the problem of target tracking , In fact, it is very similar to testing , It is a two-stage tracking framework .
Compared with the twin tracking method , We Online learning classification model , At the same time, a lot of off-line training is used to estimate the target .
Target classification and target estimation Tasks share the same backbone network , This backbone is in ImageNet Pre trained ResNet-18, Then fine tune here .
Different from the detection task ,IoU Modules need to be trained for goals rather than categories , The author thinks that IoU The prediction module is a high-level semantic module , Therefore, it is unreasonable to use only the first frame of the video for training , So we need to train offline to get a person with strong generalization ability IoU Prediction module .
The difficulty lies in IoU How does the module make good use of the information of the reference frame of the given target , The author has made many attempts , Find out Siamese Its structure is good , So a similar network called modulation based network is used to Predict any target of a given reference frame Of IoU.
There are two parts , The upper half uses the reference frame to generate a modulation vector to modulate the network of the lower half test frame . The input characteristic networks of the two branches are consistent . The first half is the reference frame x 0 Reference target B 0 Characteristics of , Output a positive number D Modulation vector of dimension c,D Corresponding characteristic layers .
And in the test frame x when , The network part has changed , The proposed feature of the backbone network is followed by an additional convolution layer , Corresponding back pooling It's getting bigger , Then, each channel of the feature is weighted with the modulation vector , That is, the information of the reference frame is given , The modulated features are then sent to IoU Prediction module g, That is, output after three full connection layers IoU, Then the formula is right for a bb,B Of IoU namely I o U ( B ) = g ( c ( x 0 , B 0 ) ⋅ z ( x , B ) )
Training
The author used LaSOT and trackingNet Image pairs collected on the dataset , Also used. coco Data set for data amplification , Adopted and DaSiamese Similar approach
On the reference frame , Extracted around the target 5 Times the size of the square area as input , The test frame simulates the tracking scene by perturbing the position and scale of the image , Each picture pair generates 16 Candidates bb, This is from gt Plus Gaussian noise , And set bb and gt The smallest IoU by 0.1, Use image blur and color perturbation for data amplification ,IoU Finally normalized to -1 and 1 Between . The backbone network is not updated during training .
Classification of network
It consists of two convolution layers , It is mainly used for rough positioning, so there is no need for a deeper network , In fact, classification network is more like regression , Similar to the idea of deep regression tracking , A Gaussian centered on the target is regressed by convolution label, Write a function, that is
and CF The method is similar to writing the objective function, that is
Online tracking
The evaluation network part estimates the location and scale of the target , First, find the maximum confidence point according to the confidence graph of the classification network , That is, the rough candidate target area , Combine the scale of the previous frame to generate the initial bb, Theoretically, only one bb That's all right. , But more is better , So after adding noise, it generates 9 individual ,10 individual bb All give IoUNet The prediction module utilizes target estimation Modules calculate their IoU The number , For each bb,5 Subgradient descent iteration maximizes IoU Get the best bb, Finally take 3 The highest IoU It's worth it bb As the final prediction result ,
The modulation vector is only calculated in the first frame , That is to say, the reference frame branch is only used in the first frame , Not in the back , The reference frame is the initialization frame .
ref
https://www.bilibili.com/video/BV1H54y167rG?vd_source=74166d3ce4e663703f01426526c56fd1
https://blog.csdn.net/laizi_laizi/article/details/109455080
边栏推荐
猜你喜欢
ANSI / nema- mw- 1000-2020 magnetic iron wire standard Latest original
如何让导电滑环信号更好
Urban land use distribution data / urban functional zoning distribution data / urban POI points of interest / vegetation type distribution
LeetCode 练习——剑指 Offer 36. 二叉搜索树与双向链表
Usage of hydraulic rotary joint
从Starfish OS持续对SFO的通缩消耗,长远看SFO的价值
About snake equation (5)
快速熟知XML解析
Remote Sensing投稿经验分享
滑环在直驱电机转子的应用领域
随机推荐
Euler Lagrange equation
Understanding of maximum likelihood estimation
跨模态语义关联对齐检索-图像文本匹配(Image-Text Matching)
qt-使用自带的应用框架建立--hello world--使用min GW 32bit
第七章 行为级建模
保姆级教程:Azkaban执行jar包(带测试样例及结果)
【目标跟踪】|atom
MySQL查询为什么没走索引?这篇文章带你全面解析
如何制作企业招聘二维码?
Kafka connect synchronizes Kafka data to MySQL
Codeforces Round #649 (Div. 2)——A. XXXXX
快速熟知XML解析
AttributeError: ‘str‘ object has no attribute ‘strftime‘
npm 內部拆分模塊
Redis集群
Tapdata 的 2.0 版 ,開源的 Live Data Platform 現已發布
MySQL数据库(2)
After modifying the background of jupyter notebook and adding jupyterthemes, enter 'JT -l' and the error 'JT' is not an internal or external command, nor a runnable program
从Starfish OS持续对SFO的通缩消耗,长远看SFO的价值
用户之声 | 冬去春来,静待花开 ——浅谈GBase 8a学习感悟