Learning Deep Compact Image Representations for Visual Tracking
2022-08-05 11:38:00 【the way of code】
Abstract
In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with a possibly very complex background. In contrast to most existing trackers, which only learn the appearance of the tracked object online, we take a different approach, inspired by recent advances in deep learning architectures, by putting more emphasis on the (unsupervised) feature learning problem. Specifically, by using auxiliary natural images, we train a stacked denoising autoencoder offline to learn generic image features that are more robust against variations. This is followed by knowledge transfer from offline training to the online tracking process. Online tracking involves a classification neural network, which is constructed from the encoder part of the trained autoencoder as a feature extractor and an additional classification layer. Both the feature extractor and the classifier can be further tuned to adapt to appearance changes of the moving object. Comparison with state-of-the-art trackers on some challenging benchmark video sequences shows that our deep learning tracker is more accurate while maintaining low computational cost and real-time performance when our MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).
1 Introduction
Visual tracking, also called object tracking, refers to automatically estimating the trajectory of an object as it moves around in a video. It has numerous applications in many domains, including video surveillance for security, human-computer interaction, and sports video analysis. While some applications may need to track multiple moving objects, the typical setting is to treat each object separately. After the object to track is identified, either manually or automatically, in the first video frame, the goal of visual tracking is to automatically follow the trajectory of the object over the subsequent frames. Although existing computer vision techniques may offer satisfactory solutions to this problem in well-controlled environments, the problem can be very challenging in many practical applications due to factors such as partial occlusion, cluttered background, fast and abrupt motion, dramatic illumination changes, and large variations in viewpoint and pose.
From the learning perspective, visual tracking is challenging because it has only one labeled instance, in the form of the object identified in the first video frame. In the subsequent frames, the tracker has to learn variations of the tracked object with only unlabeled data available. With no prior knowledge about the object being tracked, the tracker can easily drift away from the target. To address this problem, some trackers that take a semi-supervised learning approach have been proposed. An alternative approach first learns a dictionary of image features (such as SIFT local descriptors) from auxiliary data and then transfers the learned knowledge to online tracking.
Another issue is that the image representations used by many existing trackers may not be good enough for robust tracking in complex environments. This is especially the case for discriminative trackers, which usually put more emphasis on improving the classifier than on the image features used. While many trackers simply use raw pixels as features, some attempts have used more informative features, such as Haar features, histogram features, and local binary patterns. However, these features are all handcrafted offline and are not tailored to the tracked object. Recently, deep learning architectures have been used successfully to give very promising results on some complicated tasks, including image classification and speech recognition. The key to their success is that deep architectures learn richer invariant features through multiple nonlinear transformations. We believe that visual tracking can also benefit from deep learning for the same reasons.
In this paper, we propose a novel deep learning tracker (DLT) for robust visual tracking. We attempt to combine the philosophies behind both generative and discriminative trackers by developing a robust discriminative tracker that uses an automatically learned, effective image representation. DLT has some key features that distinguish it from other existing trackers. First, it uses a stacked denoising autoencoder (SDAE) to learn generic image features from a large image dataset as auxiliary data, and then transfers the learned features to the online tracking task. Second, unlike some previous methods that also learn features from auxiliary data, the learned features in DLT can be further tuned to adapt to the specific object during online tracking. Because DLT makes use of multiple nonlinear transformations, the image representation obtained is more expressive than those of previous methods based on PCA. Moreover, since representing the tracked object does not require solving an optimization problem, as it does in previous trackers based on sparse coding, DLT is significantly more efficient and hence more suitable for real-time applications.
2 Particle filter approach to visual tracking
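The particle filter approach is commonly used for visual tracking. From the statistical perspective, it is a sequential Monte Carlo importance sampling method for estimating the latent state variables of a dynamical system from a sequence of observations. Suppose $s_t$ and $y_t$ denote the latent state and the observation variables, respectively, at time $t$. Mathematically, object tracking corresponds to the problem of finding the most probable state at each time step $t$ based on the observations up to the previous time step:

$$s_t = \arg\max_{s_t} p(s_t \mid y_{1:t-1}) = \arg\max_{s_t} \int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid y_{1:t-1})\, ds_{t-1} \qquad (1)$$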
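When a new observation $y_t$ arrives, the posterior distribution of the state variable is updated according to Bayes' rule:

$$p(s_t \mid y_{1:t}) = \frac{p(y_t \mid s_t)\, p(s_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \qquad (2)$$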
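What is specific to the particle filter approach is that it approximates the true posterior state distribution $p(s_t \mid y_{1:t})$ by a set of $n$ samples, called particles, $\{s_t^i\}_{i=1}^{n}$, with corresponding importance weights $\{w_t^i\}_{i=1}^{n}$ that sum to 1. The particles are drawn from an importance distribution $q(s_t \mid s_{1:t-1}, y_{1:t})$, and the weights are updated as follows:

$$w_t^i = w_{t-1}^i \cdot \frac{p(y_t \mid s_t^i)\, p(s_t^i \mid s_{t-1}^i)}{q(s_t^i \mid s_{1:t-1}^i, y_{1:t})} \qquad (3)$$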
The importance distribution $q(s_t \mid s_{1:t-1}, y_{1:t})$ is often simplified to a first-order Markov process $q(s_t \mid s_{t-1})$, in which the state transition is independent of the observation. Consequently, the weights are updated as $w_t^i = w_{t-1}^i\, p(y_t \mid s_t^i)$. Note that the sum of the weights may no longer be equal to 1 after each weight update step. If the sum is smaller than a threshold, resampling is applied to draw $n$ particles from the current particle set in proportion to their weights, and their weights are then reset to $1/n$. If the weight sum is above the threshold, linear normalization is applied to ensure that the weights sum to 1.
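The weight update and the resample-or-normalize rule described above can be sketched in plain Python as follows. This is a minimal illustration and not the authors' MATLAB implementation: the function name `update_weights`, the `likelihood` callback standing in for $p(y_t \mid s_t^i)$, and the default threshold of 0.5 are all assumptions, since the text does not give the threshold value.

```python
import random


def update_weights(particles, weights, likelihood, threshold=0.5, rng=random):
    """One update step under q(s_t | s_{t-1}): w_t = w_{t-1} * p(y_t | s_t).

    `likelihood` is a hypothetical stand-in for the observation model,
    and `threshold` is an assumed value (not given in the text).
    """
    new_weights = [w * likelihood(s) for s, w in zip(particles, weights)]
    total = sum(new_weights)
    n = len(particles)

    if total < threshold:
        # Weight sum too small: draw n particles in proportion to their
        # weights, then reset every weight to 1/n.
        if total > 0:
            particles = rng.choices(particles, weights=new_weights, k=n)
        # If every weight is zero there is nothing to sample from, so the
        # current particles are kept and only the weights are reset.
        return particles, [1.0 / n] * n

    # Weight sum above the threshold: linear normalization to sum to 1.
    return particles, [w / total for w in new_weights]
```

Resampling concentrates the particle set on states that the observation model still supports, while plain normalization leaves the set unchanged when the weights are still informative.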
For object tracking, the state variable $s_t$ usually represents the six affine transformation parameters, which correspond to translation, scale, aspect ratio, rotation, and skewness. In particular, each dimension of $q(s_t \mid s_{t-1})$ is modeled independently by a normal distribution. For each frame, the tracking result is simply the particle with the largest weight. While many trackers use the same particle filter approach, the main difference lies in the formulation of the observation model $p(y_t \mid s_t^i)$. A good model should distinguish the tracked object well from the background while still being robust against various types of object variation. For discriminative trackers, the usual formulation sets the probability to be exponentially related to the confidence of the classifier output.
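As a rough sketch of the state transition and the per-frame result selection, the snippet below perturbs each of the six affine parameters with an independent normal distribution and then picks the particle with the largest weight. The function names, the ordering of the six parameters, and the standard deviations used in the usage example are illustrative assumptions; the text only says that such parameters were chosen by grid search.

```python
import random


def propagate(particles, sigmas, rng=random):
    """Draw s_t ~ q(s_t | s_{t-1}) with an independent normal per dimension.

    Each particle is a list of six affine parameters; `sigmas` holds one
    (assumed) standard deviation per parameter.
    """
    return [
        [rng.gauss(value, sigma) for value, sigma in zip(state, sigmas)]
        for state in particles
    ]


def best_particle(particles, weights):
    """Return the particle with the largest weight (the tracking result)."""
    index = max(range(len(weights)), key=weights.__getitem__)
    return particles[index]


# Usage with made-up values: three particles sharing one previous state.
previous = [[120.0, 80.0, 1.0, 1.0, 0.0, 0.0] for _ in range(3)]
candidates = propagate(previous, [4.0, 4.0, 0.01, 0.01, 0.002, 0.002])
result = best_particle(candidates, [0.2, 0.5, 0.3])
```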
The particle filter framework is the dominant approach in visual tracking for several reasons. First, it generalizes beyond the Gaussian distribution and is therefore more general than the Kalman filter approach. Moreover, it approximates the posterior state distribution by a set of particles instead of just a single point such as the mode. For visual tracking, this property makes it easier for the tracker to recover from incorrect tracking results. Tutorials on using particle filters for visual tracking are available, and some recent work has further improved the particle filter framework for visual tracking.
3 The DLT (deep learning tracker)
We now present our DLT tracker. During the offline training stage, unsupervised feature learning is carried out by training an SDAE with auxiliary image data to learn generic natural image features. Layer-by-layer pretraining is applied first, followed by fine-tuning of the whole SDAE. During the online tracking process, an additional classification layer is added to the encoder part of the trained SDAE to form a classification neural network. The rest of this section provides more details.
3.1 Offline training with auxiliary data
3.1.1 Dataset and preprocessing
We use the Tiny Images dataset as auxiliary data for offline training. The dataset was collected from the web by providing non-abstract English nouns to seven search engines, and it covers many of the objects and scenes found in the real world. From the almost 80 million tiny images, each of size 32×32, we randomly sample 1 million images for offline training. Since most state-of-the-art trackers included in our empirical comparison use only grayscale images, we convert all the sampled images to grayscale (although our method can use color images directly if necessary). Consequently, each 32×32 image is represented by a 1024×1 vector, and the feature value of each dimension is linearly scaled to the range [0, 1]. No further preprocessing is applied.
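The preprocessing described above (grayscale conversion, flattening to a vector, and linear scaling of each dimension to [0, 1]) might look like the following sketch. Two points are assumptions rather than facts from the text: the luminance weights 0.299, 0.587, and 0.114 (the common ITU-R BT.601 weighting) and the reading of "each dimension" as min-max scaling computed per dimension across the dataset. The function names are hypothetical.

```python
def to_gray_vector(rgb_image):
    """Flatten an H x W image of (r, g, b) tuples into a grayscale vector.

    The luminance weights are the common BT.601 values, used here as an
    assumption; a 32 x 32 image yields a vector of length 1024.
    """
    return [
        0.299 * r + 0.587 * g + 0.114 * b
        for row in rgb_image
        for (r, g, b) in row
    ]


def scale_features(vectors):
    """Linearly scale every dimension to [0, 1] across the dataset."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [
            # A constant dimension carries no information; map it to 0.
            (v[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        ]
        for v in vectors
    ]
```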
3.1.2 Learning generic image features with a stacked denoising autoencoder (SDAE)
The basic building block of an SDAE is a one-layer neural network called a denoising autoencoder (DAE), which is a more recent variant of the conventional autoencoder. It learns to recover a data sample from a corrupted version of it. In doing so, robust features are learned, since the neural network contains a "bottleneck": a hidden layer with fewer units than the input layer. The architecture of the DAE is shown in Figure 1(a).
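Let there be a total of $k$ training samples. For the $i$th sample, let $x_i$ denote the original data sample and $\tilde{x}_i$ the corrupted version of $x_i$, where the corruption could be masking corruption, additive Gaussian noise, or salt-and-pepper noise. For the network weights, let $W$ and $W'$ denote the weights of the encoder and the decoder, respectively, which can be tied although this is not necessary. Similarly, $b$ and $b'$ refer to the bias terms. A DAE learns by solving the following (regularized) optimization problem:

$$\min_{W, W', b, b'} \sum_{i=1}^{k} \| x_i - \hat{x}_i \|_2^2 + \lambda \left( \| W \|_F^2 + \| W' \|_F^2 \right) \qquad (4)$$

where

$$h_i = f(W \tilde{x}_i + b), \qquad \hat{x}_i = f(W' h_i + b') \qquad (5)$$

Here $\lambda$ is a parameter that balances the reconstruction loss and the weight penalty terms, $\| \cdot \|_F$ denotes the Frobenius norm, and $f(\cdot)$ is a nonlinear activation function, typically the logistic sigmoid function or the hyperbolic tangent function. By reconstructing the input from a corrupted version of it, a DAE is more effective than a conventional autoencoder at discovering robust features, because the corruption prevents the autoencoder from simply learning the identity mapping.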
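To make the regularized DAE objective concrete, here is a small plain-Python sketch that corrupts each input with additive Gaussian noise, runs the encoder and decoder with a logistic sigmoid activation, and sums the squared reconstruction error with the Frobenius-norm weight penalty. It only evaluates the objective; gradient computation and training are omitted. All names (`dae_objective`, `W_dec`, and so on) are illustrative, and untied weights are assumed.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid activation f(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def corrupt(x, variance, rng=random):
    """Additive Gaussian noise corruption of one sample."""
    sd = math.sqrt(variance)
    return [xi + rng.gauss(0.0, sd) for xi in x]


def dae_objective(samples, W, b, W_dec, b_dec, lam, variance, rng=random):
    """Reconstruction loss plus lam * (||W||_F^2 + ||W_dec||_F^2)."""
    loss = 0.0
    for x in samples:
        x_tilde = corrupt(x, variance, rng)                    # corrupted input
        h = [sigmoid(z) for z in affine(W, x_tilde, b)]        # encoder
        x_hat = [sigmoid(z) for z in affine(W_dec, h, b_dec)]  # decoder
        # The target is the clean sample x, not the corrupted one.
        loss += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    frobenius = sum(w * w for row in W for w in row)
    frobenius += sum(w * w for row in W_dec for w in row)
    return loss + lam * frobenius
```

Measuring the error against the clean sample rather than the corrupted one is what distinguishes this objective from that of a conventional autoencoder.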
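To further encourage the learning of meaningful features, sparsity constraints are imposed on the mean activation values of the hidden units. If the logistic sigmoid activation function is used, the output of each unit may be regarded as the probability of it being active. Let $\rho_j$ denote the target sparsity level of the $j$th unit and $\hat{\rho}_j$ its average empirical activation rate. The cross-entropy of $\rho$ and $\hat{\rho}$ can then be introduced as an additional penalty term in Eqn. 4:

$$H(\rho \,\|\, \hat{\rho}) = -\sum_{j=1}^{m} \left[ \rho_j \log \hat{\rho}_j + (1 - \rho_j) \log (1 - \hat{\rho}_j) \right] \qquad (6)$$

where $m$ is the number of hidden units. After the pretraining phase, the SDAE can be unrolled to form a feedforward neural network. The whole network is then fine-tuned using the classical backpropagation algorithm. To increase the convergence rate, either the simple momentum method or more advanced optimization techniques, such as L-BFGS or the conjugate gradient method, can be applied.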
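The sparsity penalty on the hidden units can be illustrated with the short sketch below, which averages the hidden activations over a batch and evaluates the cross-entropy between the target sparsity and those averages. The function names and the clamping constant `eps` are assumptions added to keep the logarithms finite, and a single target value shared by all units is assumed for simplicity.

```python
import math


def mean_activations(hidden_batch):
    """Average activation of each hidden unit over a batch of samples."""
    count = len(hidden_batch)
    units = len(hidden_batch[0])
    return [sum(h[j] for h in hidden_batch) / count for j in range(units)]


def sparsity_penalty(rho, rho_hat, eps=1e-12):
    """Cross-entropy between target sparsity rho and empirical rates rho_hat.

    `eps` is an assumed clamp that avoids log(0) for saturated units.
    """
    total = 0.0
    for rate in rho_hat:
        rate = min(max(rate, eps), 1.0 - eps)
        total -= rho * math.log(rate) + (1.0 - rho) * math.log(1.0 - rate)
    return total
```

For a fixed target, the penalty for a unit is smallest when its average activation equals the target, so units that fire much more or much less often than intended are pushed back toward it.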
For the network architecture, we use overcomplete filters in the first layer. This is a deliberate choice, since an overcomplete basis has been found to capture image structure better in general. This is in line with the neurophysiological mechanism of the V1 visual cortex. The number of units is then reduced by half whenever a new layer is added, until there are only 256 hidden units, which serve as the bottleneck of the autoencoder. The whole structure of the SDAE is shown in Figure 1(b). To further speed up pretraining of the first layer to learn localized features, we divide each 32×32 tiny image into five 16×16 patches (upper left, upper right, lower left, lower right, and center) and then train five DAEs, each of which has 512 hidden units. After that, we initialize a large DAE with the weights of the five small DAEs and then train the large DAE normally. Some randomly selected filters in the first layer are shown in Figure 2. As expected, most of the filters act as highly localized edge detectors.
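The split of each 32×32 image into five 16×16 patches can be sketched as follows. The function name and the dictionary keys are hypothetical, and placing the center patch at offset (8, 8) is an assumption, since the text only says that the fifth patch is taken from the middle.

```python
def five_patches(image, size=16):
    """Cut a square image (list of rows) into four corner patches and a center patch."""
    n = len(image)
    far = n - size            # offset of the right and bottom patches
    mid = (n - size) // 2     # assumed offset of the center patch
    offsets = {
        "upper_left": (0, 0),
        "upper_right": (0, far),
        "lower_left": (far, 0),
        "lower_right": (far, far),
        "center": (mid, mid),
    }
    return {
        name: [row[c:c + size] for row in image[r:r + size]]
        for name, (r, c) in offsets.items()
    }


# Usage: a 32 x 32 image whose pixel value encodes its own position.
image = [[r * 32 + c for c in range(32)] for r in range(32)]
patches = five_patches(image)
```

Each patch flattens to a 256-dimensional input, so each of the five small DAEs sees only a local region of the image during the first-layer pretraining.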
3.2 Online tracking process
The object to track is specified by the location of its bounding box in the first frame. Some negative examples are collected from the background at a short distance from the object. A sigmoid classification layer is then added to the encoder part of the SDAE obtained from offline training. The overall network architecture is shown in Figure 1(c). When a new video frame arrives, we first draw particles according to the particle filter approach, where each particle is a candidate 32×32 image patch that may contain the target. The confidence $p_i$ of each particle is then determined by a simple forward pass through the network. This step has a low computational cost while still achieving high accuracy.
If the maximum confidence of all particles in a frame is below a predefined threshold $\tau$, this may indicate a significant appearance change of the object being tracked. To address this issue, the whole network is tuned again whenever this happens. We note that the threshold $\tau$ should be set by maintaining a tradeoff. If $\tau$ is too small, the tracker cannot adapt well to appearance changes. On the other hand, if $\tau$ is too large, even an occluding object or the background may be mistakenly treated as the tracked object, which leads to drifting away from the target.
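The per-frame decision described above, scoring each particle with a sigmoid classification layer and triggering tuning when the best confidence falls below $\tau$, can be sketched as follows. The encoder is left out: `features` stands for the encoder output of one particle, and the function names and the single-output layer parameters `w` and `b` are illustrative assumptions.

```python
import math


def confidence(features, w, b):
    """Sigmoid classification layer applied to the encoder features of one particle."""
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))


def select_particle(confidences, tau=0.9):
    """Return the index of the most confident particle and a tuning flag.

    The flag is True when even the best confidence is below tau, which is
    the condition under which the whole network is tuned again.
    """
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best, confidences[best] < tau
```

With `tau=0.9`, confidences of 0.2, 0.95, and 0.6 select the second particle without tuning, whereas a best confidence of 0.5 would request another round of tuning.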
4 Experiments
In this section, we empirically compare DLT with some state-of-the-art trackers on 10 challenging benchmark video sequences. These trackers are MTT, CT, VTD, MIL, a recent variant of L1T, TLD, and IVT. We use the original implementations of these trackers provided by their authors. If a tracker can only deal with grayscale video, the rgb2gray function provided by the MATLAB Image Processing Toolbox is used to convert the color video to grayscale. To accelerate the computation, we also use the GPU support provided by the MATLAB Parallel Computing Toolbox for both offline training and online tracking. The code and supplementary material are available on the project page: http://winsty.net/dlt.html.
4.1 DLT implementation details
We use the gradient method with momentum for optimization. The momentum parameter is set to 0.9. For offline training of the SDAE, we inject Gaussian noise with a variance of 0.0004 to generate the corrupted input. We set λ = 0.0001 and ρ_i = 0.05, and the mini-batch size to 100. For online tuning, we use a larger λ value of 0.002 to avoid overfitting and a smaller mini-batch size of 10. The threshold τ is set to 0.9. The particle filter uses 1000 particles. For other parameters, such as the affine parameters in the particle filter and the search window size in other methods, we perform a grid search to determine the best values. Where applicable, the same settings are applied to all other methods.
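For readers unfamiliar with the gradient method with momentum, a minimal sketch of one update step is shown below, using the momentum value of 0.9 mentioned above. The learning rate is not given in the text, so the value of 0.1 in the usage example is purely an assumption, as are the function and variable names. The toy objective $f(x) = x^2$ only demonstrates the update rule and is unrelated to the tracker's actual loss.

```python
def momentum_step(params, grads, velocity, lr, momentum=0.9):
    """One step of gradient descent with classical momentum.

    velocity <- momentum * velocity - lr * grad
    params   <- params + velocity
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Usage on a toy problem: minimize f(x) = x^2, whose gradient is 2x.
params, velocity = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * p for p in params]
    params, velocity = momentum_step(params, grads, velocity, lr=0.1)
```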