
CVPR 2022 | PTTR: Transformer-Based 3D Point Cloud Object Tracking

2022-06-09 · 3D Vision Workshop


Author: Luo Zhipeng

Source: SenseTime Research


Reading guide

At CVPR 2022, the SenseTime Research team proposed PTTR, a Transformer-based 3D point cloud tracking model. In the feature extraction stage, PTTR introduces relation-aware sampling to retain more points belonging to the tracked object; it then designs a Point Relation Transformer module to match point cloud features; finally, it adds a lightweight prediction refinement module to further improve prediction accuracy. Experiments show that PTTR brings significant accuracy gains on multiple datasets.

Paper title: PTTR: Relational 3D Point Cloud Object Tracking with Transformer


Problems and challenges

Object tracking is a fundamental computer vision task that has been studied extensively on image data. In recent years, with the development of LiDAR technology, point cloud based object tracking has also attracted more attention. Point cloud data brings unique challenges, such as sparsity, occlusion, and noise, which prevent image-based algorithms from being applied directly, and point cloud based tracking is still under-explored. One major challenge is that when the object is far from the sensor, the sparse point cloud makes tracking much harder. In addition, existing point cloud tracking algorithms mostly match features with a linear cosine-similarity measure, which leaves substantial room for improvement.

Method

To address these problems, we propose a novel point cloud tracking framework, shown in the figure below. The model has three stages. In the feature extraction stage, we propose Relation-Aware Sampling, which uses feature relations between the template and the search area to guide sampling and thus retain more foreground points. In the feature matching stage, we propose a Point Relation Transformer that effectively matches the features of the template and the search area. Finally, we propose a Prediction Refinement Module that further improves prediction accuracy through local feature sampling.

[Figure: Overall PTTR framework with its three stages]

1. Relation-Aware Sampling

Point cloud sparsity is a major challenge for tracking, and point cloud feature extraction is usually accompanied by down-sampling. Most existing tracking algorithms use random sampling, which discards many foreground points in the search area and hurts subsequent feature matching. We therefore propose relation-aware sampling, which guides sampling with the feature distances between the template and the search area. Since the template consists mostly of points on the target object, sampling the search-area points whose features are closest to the template features preserves as many foreground points as possible. As shown in the figure below, we compare different sampling methods by the proportion of sampled points that fall inside the 3D target box; our relation-aware sampling clearly retains the most foreground points.

[Figure: Comparison of sampling strategies by the proportion of sampled points inside the 3D target box]
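
To make the idea concrete, below is a minimal PyTorch sketch of this distance-based selection, assuming per-point features have already been extracted for both point clouds. The function name and tensor shapes are illustrative; the released PTTR implementation may organize the sampling differently (for example, combining it with other sampling strategies).

```python
import torch

def relation_aware_sampling(search_feats, template_feats, num_samples):
    """Keep the search-area points whose features are closest to the
    template features, so more foreground points survive down-sampling.

    search_feats:   (N, C) per-point features of the search area
    template_feats: (M, C) per-point features of the template
    num_samples:    number of search-area points to keep
    """
    # Pairwise feature-space distances between search and template points: (N, M).
    dists = torch.cdist(search_feats, template_feats, p=2)

    # Each search point's distance to its closest template feature;
    # a small value suggests the point lies on the tracked object.
    min_dist = dists.min(dim=1).values

    # Keep the num_samples points with the smallest distances.
    idx = torch.topk(min_dist, k=num_samples, largest=False).indices
    return idx
```

The returned indices would then be used to gather both the sampled coordinates and their features before feature matching.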

2. Relation-Enhanced Feature Matching

In tracking, the search area must be matched against the template. Most existing 3D single object tracking algorithms rely on the cosine distance between features, treating points with a small cosine distance as well matched. In contrast, motivated by the success of attention mechanisms in computer vision, we design a relation-based attention mechanism to match the template and search-area point clouds. As shown in the figure below, our attention module builds on offset-attention, fusing the query, key, and value features and introducing non-linearity through an activation layer. Concretely, we first process the template and the search-area point clouds separately with a self-attention module, and then feed the search-area features as the query and the template features as the key and value into a cross-attention module to obtain the matched search-area features.

[Figure: Point Relation Transformer, with self-attention on each point cloud followed by cross-attention built on offset-attention]
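
A simplified PyTorch sketch of this self-attention then cross-attention arrangement is shown below. It uses nn.MultiheadAttention as a stand-in for the attention computation, and the offset-attention block follows the general PCT-style recipe described above rather than the exact PTTR layer; class names, shared hyperparameters, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OffsetAttention(nn.Module):
    """Offset-attention sketch: the attended features are subtracted from the
    query input, and the offset is passed through a non-linear layer before a
    residual connection (in the spirit of the Point Cloud Transformer)."""
    def __init__(self, dim, num_heads=1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, query, key, value):
        attended, _ = self.attn(query, key, value)   # (B, N, C)
        offset = query - attended                    # offset between input and attended features
        return query + self.ffn(offset)              # non-linearity + residual


class PointRelationTransformerSketch(nn.Module):
    """Matching-stage sketch: self-attention on template and search features
    separately, then cross-attention with search points as queries and
    template points as keys/values."""
    def __init__(self, dim):
        super().__init__()
        self.search_self = OffsetAttention(dim)
        self.template_self = OffsetAttention(dim)
        self.cross = OffsetAttention(dim)

    def forward(self, search_feats, template_feats):
        # search_feats: (B, N, C), template_feats: (B, M, C)
        s = self.search_self(search_feats, search_feats, search_feats)
        t = self.template_self(template_feats, template_feats, template_feats)
        # Search features query the template features.
        return self.cross(s, t, t)                   # matched search features (B, N, C)
```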

3. Coarse-to-Fine Tracking Prediction

Most existing 3D single object tracking algorithms simply reuse the prediction head of a 3D detector, such as VoteNet or an RPN. We argue that such detection-style prediction modules inevitably introduce redundant computation and reduce efficiency. We therefore propose a new prediction refinement module, which gathers the corresponding point features from the template point cloud, the search point cloud, and the matched (fused) search point cloud, combines them, and predicts directly. In essence, every point in the search area predicts a proposal from features of different stages; at inference time, the highest-scoring proposal is taken as the final prediction.
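
The sketch below illustrates this per-point proposal idea in PyTorch. How PTTR actually pools and gathers the template and multi-stage search features, and the exact box parameterization, are not reproduced here; the layer sizes and the 4-D offset output are hypothetical.

```python
import torch
import torch.nn as nn

class ProposalHead(nn.Module):
    """Per-point proposal sketch: features from the template, the raw search
    area, and the matched search area are combined for each search point, and
    a small MLP predicts a proposal plus a confidence score."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 4 + 1),                   # e.g. (dx, dy, dz, dtheta) + score
        )

    def forward(self, template_feat, search_feats, fused_feats):
        # template_feat: (B, C) pooled template feature
        # search_feats, fused_feats: (B, N, C) per-point features
        n = search_feats.size(1)
        tpl = template_feat.unsqueeze(1).expand(-1, n, -1)        # broadcast to every point
        x = torch.cat([tpl, search_feats, fused_feats], dim=-1)   # (B, N, 3C)
        out = self.mlp(x)                                         # (B, N, 5)
        proposals, scores = out[..., :4], out[..., 4]
        return proposals, scores

# At inference, the highest-scoring proposal is taken as the prediction:
# best = scores.argmax(dim=1)
```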

4. Dataset

Beyond the methodological contributions, we also build a new large-scale point cloud tracking dataset based on the Waymo Open Dataset. Since each object in Waymo is annotated with a persistent ID, we can extract the locations of the same ID across time; based on this, we construct a Waymo single object tracking dataset. As shown in the table below, our Waymo tracking dataset is far larger than KITTI and provides a baseline for further research on large-scale data.

[Table: Statistics of the proposed Waymo tracking dataset compared with KITTI]
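
The core of this construction is grouping per-frame boxes by object ID into time-ordered tracklets. A minimal sketch of that step is below; the data layout is hypothetical and does not reflect the actual Waymo Open Dataset API or the authors' conversion scripts.

```python
from collections import defaultdict

def build_tracklets(frames):
    """Group frame-level annotations into per-object tracking sequences by a
    persistent object ID. Assumed 'frames' structure: a list of dicts with a
    'timestamp' and a 'labels' list, each label carrying an 'id' and a 3D 'box'."""
    tracklets = defaultdict(list)
    for frame in frames:
        for label in frame["labels"]:
            tracklets[label["id"]].append((frame["timestamp"], label["box"]))
    # Sort each sequence by time so consecutive entries can serve as
    # template/search pairs for single object tracking.
    return {obj_id: sorted(seq, key=lambda x: x[0]) for obj_id, seq in tracklets.items()}
```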

5. Experiments

We compare PTTR with other models on the KITTI and Waymo datasets. As shown in the tables below, PTTR outperforms existing methods.

[Tables: Comparison with existing methods on KITTI and Waymo]

To verify the contribution of each module, we conduct ablation studies; the results confirm the effectiveness of every component we propose.

[Tables: Ablation studies on the proposed modules]

Conclusion

In this paper, we propose a new 3D point cloud tracking model. It uses relation-aware sampling to alleviate point cloud sparsity, a Transformer attention mechanism for effective feature matching, and local feature sampling to further refine the prediction. Experiments show that the proposed method effectively improves point cloud tracking performance.

Links

The PTTR code has been open-sourced; you are welcome to use it and share feedback.

Paper

https://arxiv.org/pdf/2112.02857.pdf

Code

https://github.com/Jasonkks/PTTR
