ByteTrack: Multi-Object Tracking by Associating Every Detection Box (paper reading notes)
2022-07-06 10:25:00 【How about a song without trace】
Note: I have recently been working on a migration to Huawei's MindSpore framework. This project brings together a lot of cutting-edge MOT knowledge, and its association method in particular is excellent.
Purpose of multi-object tracking (MOT): to estimate the bounding boxes and identities of the objects in a video.
1. Previous approaches (the problem):
Previous methods obtain identities by associating only the detection boxes whose scores exceed a set threshold. Objects with low detection scores, for example occluded or motion-blurred ones, are simply discarded. This causes two problems: real objects go missing, and trajectories become fragmented.
2. How this paper solves the problem:
It proposes a simple, effective, and generic association method that associates almost every detection box instead of tracking only the high-score ones.
The approach:
For low-score detection boxes, the similarity with existing tracklets is used to recover the true objects and filter out the background.
The paper applies the method to 9 different state-of-the-art trackers to verify its effectiveness.
The paper also introduces its own new tracker, ByteTrack.
3. The problem illustrated:
The figure below shows that previous methods associate only the high-score boxes. Boxes of the same color represent the same identity.
The following figure shows the method of this paper. The dashed boxes are the track boxes predicted by the Kalman filter, which are matched to the detections by IoU.
The method makes full use of the detection boxes, from high score to low score, in the matching process. We first match the high-score detection boxes to the tracklets according to motion similarity or appearance similarity.
A Kalman filter predicts the location of each tracklet in the new frame. The similarity can be computed as the IoU between the predicted box and the detection box, or as the Re-ID feature distance. Panel (b) above shows the result of the first matching.
We then perform a second matching between the unmatched tracklets (the tracklet in the red box) and the low-score detection boxes, using the same motion similarity.
Panel (c) shows the result after the second matching. The occluded person with a low detection score is correctly matched to the previous tracklet, and the background (on the right side of the image) is removed.
4. The association method: BYTE's innovation:
First association:
Suppose the figure above shows the association strategy for the current frame. The inputs to this step are (a) the Kalman filter predictions for all track boxes from the previous frame and (b) the detection boxes produced by the detector on the current frame whose confidence exceeds the threshold, i.e. the pink boxes in the figure. (To simplify the drawing, I included the KF step inside Association 1.)
The subsequent steps are classic IoU matching solved with the Hungarian algorithm, which gives the tracking result for the current frame. After Association 1, the track boxes and detection boxes that did not get a match are kept (T_remain and D_remain in the figure) for the later steps.
Second association:
Low-confidence detection boxes often correspond to targets under heavy occlusion or heavy motion blur, so appearance features (such as ReID) are very unreliable and IoU matching is the better choice. For this reason, Association 2 uses IoU only and does not introduce appearance similarity.
The innovation of BYTE lies in the junction between detection and association, where low-score detection boxes act as a bridge that boosts both.
We use the recent high-performance detector YOLOX to obtain the detection boxes and associate them with BYTE.
On the MOT Challenge, ByteTrack ranks first on both the MOT17 [44] and MOT20 [17] leaderboards, reaching 80.3 MOTA, 77.3 IDF1, and 63.1 HOTA on MOT17.
Compared with DeepSORT, ByteTrack improves markedly under occlusion. Note, however, that because ByteTrack does not use appearance features for matching, tracking quality depends heavily on detection quality: if the detector works well, tracking will also work well, but poor detections seriously degrade tracking.
- How BYTE works:
Consider an object whose detection score goes from high to low because of occlusion. Before being occluded the object was visible, so its detection score was relatively high and a track was established. Once the object is occluded its score drops, but the overlap (IoU) between the detection box and the track's position lets us dig the occluded object out of the low-score boxes and keep the trajectory continuous.
(Kalman filtering) is used in dynamic systems that change over time: the next state is estimated by fusing the current state with the observed value. It is an iterative process and also a data-fusion process.
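To make the predict/update cycle concrete, here is a minimal sketch of a constant-velocity Kalman filter for a single coordinate (for example, the x position of a box center). It is only an illustration, not the filter used in ByteTrack: the class name `ConstantVelocityKF`, the noise values `q` and `r`, and the initial covariance are all assumptions I chose for the example.

```python
class ConstantVelocityKF:
    """Toy 1-D Kalman filter with state [position, velocity] (illustrative only)."""

    def __init__(self, pos, q=1e-2, r=1.0):
        self.p, self.v = float(pos), 0.0      # state estimate
        self.P = [[1.0, 0.0], [0.0, 10.0]]    # state covariance (assumed initial values)
        self.q, self.r = q, r                 # process / measurement noise (assumed)

    def predict(self, dt=1.0):
        # x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]]
        P = self.P
        self.p += dt * self.v
        a = P[0][0] + dt * P[1][0]
        b = P[0][1] + dt * P[1][1]
        self.P = [[a + dt * b + self.q, b],
                  [P[1][0] + dt * P[1][1], P[1][1] + self.q]]
        return self.p

    def update(self, z):
        # Fuse the prediction with the observed position z (H = [1, 0]).
        P = self.P
        y = z - self.p                        # innovation
        s = P[0][0] + self.r                  # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


kf = ConstantVelocityKF(0.0)
for z in [2, 4, 6, 8, 10, 12, 14, 16]:   # object moving 2 units per frame
    kf.predict()
    kf.update(z)
next_pos = kf.predict()                   # predicted position for the next frame
```

In a tracker, the value returned by `predict()` plays the role of the "predicted box" that is compared with the detections of the new frame, and `update()` is called only for tracks that found a match.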
- The input of BYTE is a video sequence and an object detector Det. We also set a detection score threshold τ. The output of BYTE is the set of tracks T of the video, where each track contains the bounding box and identity of the object in each frame. (Lines 3 to 13 of the BYTE algorithm.)
- After separating the high-score and low-score detection boxes, we use the Kalman filter to predict the new location in the current frame of each track in T. (Lines 14 to 16.)
- The first association is performed between the high-score detection boxes D_high and all tracks T (including the lost tracks T_lost).
- Similarity #1 can be computed from either the IoU or the Re-ID feature distance between the detection boxes D_high and the predicted boxes of the tracks T. We then use the Hungarian algorithm to complete the matching based on this similarity. The unmatched detection boxes are kept in D_remain and the unmatched tracks in T_remain (lines 17 to 19 of Algorithm 1).
- BYTE is highly flexible and compatible with other association methods. We applied BYTE to 9 different state-of-the-art trackers and obtained significant improvements on almost all metrics.
- The second association is performed between the low-score detection boxes D_low and the tracks T_remain left over after the first association.
- We keep the tracks that are still unmatched in T_re-remain and simply delete all unmatched low-score detection boxes, because we regard them as background. (Lines 20 to 21 of the BYTE algorithm.)
- Because the low-score detection boxes usually involve severe occlusion or motion blur, their appearance features are unreliable. It is therefore important to use IoU alone as the similarity in the second association; appearance similarity is not used.
- After the association is completed, the unmatched tracks are removed from the tracks T. To allow long-range association, they are placed in T_lost rather than deleted immediately. A track is deleted only once it has stayed in T_lost for more than 30 frames; otherwise it remains in T_lost. (Line 22 of the algorithm.)
- Finally, new tracks are initialized from the high-score detection boxes D_remain that remained unmatched after the first association (lines 23 to 27 of the algorithm). The output for each frame is the bounding boxes and identities of the tracks T in the current frame; the boxes and identities of T_lost are not output.
Use Det to obtain the detection boxes and scores, and split all detection boxes into two parts, D_high and D_low, according to the thresholds τ_high and τ_low: boxes whose score exceeds τ_high go into D_high, and boxes whose score falls between τ_low and τ_high go into D_low.
Set up a Kalman filter for every track to predict its box in the current frame.
Associate the high-score detection boxes with all tracks. (Similarity is computed with IoU and the matching is solved with the Hungarian algorithm; matches with IoU below 0.2 are rejected.) This yields the detection boxes that matched no track, D_remain, and the tracks that matched no detection, T_remain. Note that Re-ID features can also be added here.
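The IoU similarity and the rejection threshold can be illustrated with a short sketch. To stay dependency-free it finds the best assignment by exhaustive search, which is only practical for a handful of boxes; this is a stand-in I chose for the example, not the Hungarian algorithm itself (in practice a solver such as `scipy.optimize.linear_sum_assignment` would do this job). The function names `iou` and `assign` and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign(tracks, dets, min_iou=0.2):
    """Exhaustive assignment maximizing total IoU; pairs below min_iou are rejected.

    Returns (matches, unmatched_track_indices, unmatched_det_indices).
    """
    n_t, n_d = len(tracks), len(dets)
    sim = [[iou(t, d) for d in dets] for t in tracks]
    # Enumerate every way of placing the smaller set into the larger one.
    if n_t <= n_d:
        candidates = ([(i, p[i]) for i in range(n_t)]
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = ([(p[j], j) for j in range(n_d)]
                      for p in permutations(range(n_t), n_d))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        kept = [(i, j) for i, j in pairs if sim[i][j] >= min_iou]
        score = sum(sim[i][j] for i, j in kept)
        if score > best_score:
            best_pairs, best_score = kept, score
    matched_t = {i for i, _ in best_pairs}
    matched_d = {j for _, j in best_pairs}
    return (best_pairs,
            [i for i in range(n_t) if i not in matched_t],
            [j for j in range(n_d) if j not in matched_d])


predicted_tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
high_dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches, t_remain, d_remain = assign(predicted_tracks, high_dets)
```

Here the third detection overlaps no track, so it ends up in `d_remain`, which corresponds to the D_remain set described above.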
The second association is between D_low and T_remain. Unmatched detection boxes are treated as background and deleted directly, while unmatched tracks are recorded as T_re-remain. IoU is used as the similarity measure and appearance is not used (appearance is unreliable at low confidence).
After the association is completed, the unmatched tracks are removed from the tracks. To allow long-range association, they are put into T_lost, and a track is deleted only after it has been there for more than 30 frames.
After the first association, new tracks are initialized from the unmatched high-confidence detection boxes. For each detection box in D_remain, if its detection score exceeds a set threshold and it exists in two consecutive frames, we initialize a new track.
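Putting the steps above together, one frame of BYTE can be sketched as follows. This is my own simplified reading of the procedure, with several assumptions stated up front: the matcher is a greedy stand-in for the Hungarian algorithm, each track's `box` is assumed to already be its Kalman-predicted box, the low threshold of 0.1 is a made-up value, and the two-consecutive-frame check for new tracks is omitted. The names `byte_step` and `greedy_match` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou=0.2):
    """Greedy stand-in for Hungarian matching: accept pairs in descending IoU order."""
    pairs = sorted(((iou(t["box"], d["box"]), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < min_iou:
            break                      # all remaining pairs are below the threshold
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    rest_t = [t for i, t in enumerate(tracks) if i not in used_t]
    rest_d = [d for i, d in enumerate(dets) if i not in used_d]
    return matches, rest_t, rest_d


def byte_step(tracks, dets, next_id, high=0.6, low=0.1, max_lost=30):
    """Process one frame; returns the updated track list and the next free id."""
    d_high = [d for d in dets if d["score"] >= high]
    d_low = [d for d in dets if low <= d["score"] < high]

    # First association: high-score boxes against all tracks (lost ones included).
    m1, t_remain, d_remain = greedy_match(tracks, d_high)
    for ti, di in m1:
        tracks[ti]["box"], tracks[ti]["lost"] = d_high[di]["box"], 0

    # Second association: low-score boxes against the leftover tracks, IoU only.
    m2, t_unmatched, _ = greedy_match(t_remain, d_low)   # unmatched low boxes are dropped
    for ti, di in m2:
        t_remain[ti]["box"], t_remain[ti]["lost"] = d_low[di]["box"], 0

    # Tracks unmatched in both rounds count as lost; drop them after max_lost frames.
    for t in t_unmatched:
        t["lost"] += 1
    tracks = [t for t in tracks if t["lost"] <= max_lost]

    # Unmatched high-score boxes start new tracks.
    for d in d_remain:
        tracks.append({"id": next_id, "box": d["box"], "lost": 0})
        next_id += 1
    return tracks, next_id


tracks, next_id = byte_step([], [{"box": (0, 0, 10, 10), "score": 0.9}], 0)
# Next frame: the same person is occluded (score 0.3); a background box scores 0.2.
tracks, next_id = byte_step(
    tracks,
    [{"box": (1, 1, 11, 11), "score": 0.3}, {"box": (50, 50, 60, 60), "score": 0.2}],
    next_id)
visible = [t for t in tracks if t["lost"] == 0]   # only these are output for the frame
```

In the second frame the occluded person keeps its identity through the low-score box, while the low-score background box matches nothing and is discarded, which is exactly the behavior the notes describe.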
ByteTrack framework:
The model is built on the high-performance detector YOLOX and the association method BYTE. YOLOX switches the YOLO series of detectors to an anchor-free design and uses Mosaic, Mixup, SimOTA, and other techniques to reach SOTA performance.
- The backbone is the same as in YOLOv5: CSPNet with an additional PAN head.
- After the backbone there are two decoupled heads, one for classification and one for regression. An IoU-aware branch is added to predict the IoU between the predicted boxes and the ground-truth boxes.
- The regression branch is trained with GIoU loss; the classification and IoU heads are trained with binary cross-entropy loss.
6. Experiment details:
For BYTE, the default detection score threshold is 0.6. For the benchmark evaluations on MOT17, MOT20, and HiEve, we use only IoU as the similarity metric. In the linear assignment step, a match is rejected if the IoU is below 0.2. Lost tracks are kept for 30 frames in case they reappear.
For BDD100K, we use UniTrack [68] as the Re-ID model. In the ablation studies, we use FastReID [27] to extract Re-ID features for MOT17.
For ByteTrack, the detector is YOLOX with YOLOX-X as the backbone and COCO-pretrained weights as the initialization. The model is trained for 80 epochs on the combination of MOT17, CrowdHuman, Cityperson, and ETHZ.
The input image size is 1440×800. In multi-scale training, the shortest side ranges from 576 to 1024. Data augmentation includes Mosaic and Mixup. Training runs on 8 Tesla V100 GPUs with a batch size of 48, using SGD with a learning rate of 0.001, weight decay of 0.0005, and momentum of 0.9. Warm-up is used in the first epoch, followed by a cosine annealing schedule. FPS is measured on a single GPU with FP16 precision and a batch size of 1.
The warm-up strategy uses a relatively small learning rate at the start of training and increases it linearly to the initial learning rate. The training time is 12 hours.
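The schedule described above (linear warm-up in the first epoch, then cosine annealing) can be written as a small function. This is a sketch based only on the description in these notes, not the scheduler from the YOLOX code base; the function name `lr_at`, the `steps_per_epoch` argument, and the final learning rate `min_lr=0.0` are assumptions.

```python
import math


def lr_at(step, steps_per_epoch, total_epochs=80, warmup_epochs=1,
          base_lr=1e-3, min_lr=0.0):
    """Learning rate at a given training step (0-based)."""
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        # Linear warm-up from a small value up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine annealing from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Sample the schedule at the start of a few epochs (100 steps per epoch assumed).
schedule = [lr_at(epoch * 100, steps_per_epoch=100) for epoch in (0, 1, 40, 79)]
```

The learning rate rises to 0.001 by the end of the first epoch and then decays smoothly toward `min_lr` at epoch 80.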
SORT can be regarded as our baseline, because both methods use only a Kalman filter to predict object motion. BYTE improves SORT's MOTA from 74.6 to 76.6 and its IDF1 from 76.9 to 79.3, and reduces ID switches from 291 to 159. This highlights the importance of the low-score detection boxes.
The improvement comes mainly in cases of occlusion and motion blur.
- We noticed that MOT17 contains some fully occluded pedestrians whose visible ratio in the ground-truth annotations is 0.
- Because it is almost impossible to detect them from visual cues, we obtain these objects through tracklet interpolation.
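Tracklet interpolation can be sketched as linear interpolation of the box coordinates across the frames where a track has no box. This is a minimal illustration under my own assumptions: the function name `interpolate_track`, the dict-of-frames layout, and the `max_gap=20` limit are hypothetical choices, not values taken from these notes.

```python
def interpolate_track(frames, max_gap=20):
    """Fill missing frames of one track by linear interpolation.

    frames: dict mapping frame index -> (x1, y1, x2, y2).
    Gaps longer than max_gap frames are left unfilled.
    """
    filled = dict(frames)
    keys = sorted(frames)
    for a, b in zip(keys, keys[1:]):
        gap = b - a
        if 1 < gap <= max_gap:
            for f in range(a + 1, b):
                w = (f - a) / gap   # 0 at frame a, 1 at frame b
                filled[f] = tuple((1 - w) * p + w * q
                                  for p, q in zip(frames[a], frames[b]))
    return filled


# The pedestrian is fully occluded in frames 2 to 4.
track = {1: (0, 0, 10, 10), 5: (8, 0, 18, 10)}
filled = interpolate_track(track)
```

Interpolation runs offline after tracking, since it needs the box on both sides of the gap.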
The detector in ByteTrack is YOLOX.
1. For data association, as in SORT, only the Kalman filter is used to predict where each current track will be in the next frame. The IoU between the predicted box and the actual detection box serves as the similarity in both matching rounds, and the matching is solved with the Hungarian algorithm. Notably, ByteTrack does not use ReID features to compute appearance similarity; in other words, only the motion model is used.
2. Why not use ReID features?
The authors explain: first, to keep the tracker as simple and fast as possible; second, they found that when the detections are good enough, the Kalman filter's predictions are accurate enough to replace ReID for long-term association between objects. The experiments also found that adding ReID does not improve the tracking results.
Open questions: many tracking models depend strongly on extracting the appearance features of objects, so how do existing models perform when the tracked objects look basically the same? Also, object motion in the mainstream multi-object tracking datasets is very simple, close to uniform linear motion. How do existing models perform when the motion is very complex and many objects weave back and forth?