SLIM: Self-Supervised Point Cloud Scene Flow and Motion Estimation (ICCV 2021)
2022-07-27 20:53:00 【3D vision workshop】

Author: Bubble Robot
Source: Bubble Robot SLAM
Title: SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation
Authors: Stefan Andreas Baur, David Josef Emmerichs, Frank Moosmann, Peter Pinggera, Björn Ommer and Andreas Geiger
Venue: ICCV 2021
Compiled by: cristin
Reviewed by: zh

Abstract

Hello everyone. Today's article introduces SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation.
In recent years, several self-supervised learning frameworks for point-cloud-based 3D scene flow have emerged. Scene flow inherently separates every scene into multiple moving agents plus a set of points that follow a single rigid-body motion. However, existing methods do not exploit this property of the data in their self-supervised training schemes, even though it could improve and stabilize flow predictions. Based on the discrepancy between a robust rigid ego-motion estimate and the raw flow prediction, we generate a self-supervised motion segmentation signal. The predicted motion segmentation is, in turn, used by our algorithm to attend to static points, aggregating motion information from the static parts of the scene. We learn our model end-to-end by back-propagating gradients through the Kabsch algorithm, and show that this improves ego-motion estimation and thereby scene flow estimation. In an ablation study, we further analyze the performance gain of combining motion segmentation and scene flow. We also propose a novel network architecture for 3D LiDAR scene flow that can handle an order of magnitude more points during training than previous methods.
Project page: https://baurst.github.io/slim/

Main Work and Contributions

1. Our method is the first point-cloud-based scene flow estimation method that segments points into the two classes "moving" and "static".
2. Our method significantly outperforms previous point-cloud-based scene flow methods, especially in generalization to previously unseen data, which we demonstrate on several datasets in both self-supervised and fully supervised settings.
3. Our novel network architecture can handle many more points than current weakly or self-supervised methods.

Method

Three-dimensional scene flow estimation takes two consecutive input point clouds Pt ∈ R^(N×3) and Pt+1 ∈ R^(M×3) and predicts a three-dimensional displacement vector for each point in the first cloud, representing the motion of that point between the two frames.
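To make the input and output shapes concrete, here is a toy NumPy sketch (not the paper's code). The data are synthetic, and the AccR metric is computed with the thresholds commonly used in scene-flow benchmarks (EPE < 0.1 m or relative error < 10%), which is our assumption rather than a definition taken from this paper:

```python
import numpy as np

# Toy sketch: scene flow assigns each point of the first cloud a 3D
# displacement vector; warping P_t by the flow predicts the geometry at t+1.
rng = np.random.default_rng(0)
P_t = rng.uniform(-35.0, 35.0, size=(100, 3))               # N x 3 points at time t
gt_flow = np.tile([1.0, 0.0, 0.0], (100, 1))                # ground-truth displacement
pred_flow = gt_flow + rng.normal(0.0, 0.02, size=(100, 3))  # a noisy prediction

P_t_warped = P_t + pred_flow                                # predicted positions at t+1

# End-point error (EPE) and relaxed accuracy AccR (assumed thresholds:
# a point counts as accurate if EPE < 0.1 m or relative error < 10%).
epe = np.linalg.norm(pred_flow - gt_flow, axis=1)
rel = epe / np.linalg.norm(gt_flow, axis=1)
acc_r = float(np.mean((epe < 0.1) | (rel < 0.1)))
```

Note that N and M generally differ between the two sweeps, so the flow is defined per point of the first cloud only.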

Figure 1: Overview of our network architecture. A convolutional gated recurrent unit (ConvGRU) looks up correlation values according to the predicted flow and iteratively predicts flow and logit updates.
Point Cloud Encoder: The pillar feature network (PFN) introduced in [20] (with shared weights) encodes the input point clouds Pt and Pt+1 into BEV pseudo-images It, It+1 ∈ R^(H×W×C), which are then processed by the backbone. We use the same BEV range for all datasets, covering the square −35 m ≤ x, y ≤ 35 m around the ego-vehicle, where x, y are the horizontal axes. We use a resolution of H = W = 640, which corresponds to a pillar size of approximately 11 cm.
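The grid geometry above can be checked with a minimal NumPy sketch (our illustration, not the authors' implementation): 640 cells over a 70 m span give 70/640 ≈ 0.109 m per cell, matching the ~11 cm pillar size quoted.

```python
import numpy as np

# BEV grid covering -35 m <= x, y <= 35 m at a resolution of 640 x 640,
# so each pillar is 70 / 640 = 0.109375 m wide (~11 cm, as stated above).
H = W = 640
BEV_RANGE = 35.0
CELL = 2 * BEV_RANGE / H

def pillar_indices(points):
    """Map (N, 3) points to integer (row, col) pillar coordinates."""
    ij = np.floor((points[:, :2] + BEV_RANGE) / CELL).astype(int)
    # Points exactly on the upper boundary fall into the last cell.
    return np.clip(ij, 0, H - 1)

pts = np.array([[-35.0, -35.0, 0.2],   # lower corner -> cell (0, 0)
                [0.0, 0.0, 1.0],       # center       -> cell (320, 320)
                [34.99, 34.99, -0.5]]) # upper corner -> cell (639, 639)
print(pillar_indices(pts))
```

Points sharing a cell would then be pooled into one pillar feature by the PFN; the indexing above only illustrates the geometry.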
Flow Backbone: Our backbone is largely inspired by RAFT [39], which predicts dense optical flow on images. Its core component is a recurrent update block that maintains a hidden state and predicts the flow iteratively, producing a finer and more accurate flow with each iteration. To this end, the independently encoded input images are used to construct a correlation volume, and the previous flow prediction is used to look up correlation values, steering the flow toward pixel regions with more accurate matches. Although RAFT was designed for dense optical flow, we show that it is also very useful in the sparse BEV domain and generalizes well. Compared with conventional images, the BEV domain consists of sparser, smaller regions and largely independent motion patterns (moving traffic participants).
We use RAFT [39] to process the flow predictions and additionally update two logit maps iteratively, as shown in Figure 2. The first logit map Lcls is used as an output signal to classify points as static or moving in the world frame. The accuracy of a flow prediction can vary greatly within a scene, because featureless surfaces are unsuitable for flow estimation. The second logit map Lwgt is used to overcome this problem, allowing the network to express its confidence in the flow estimate. The output decoder uses these two logit maps to aggregate and improve the flow accuracy of static and dynamic scene elements.
Lcls is processed similarly to the flow, but the confidence-weighting task is more closely tied to the flow prediction, so its data flow is coupled with the flow during processing. Apart from this small change, we keep the general RAFT framework, including zeroing the gradient not only on the input flow but also on the input logits of each update block.
Output Decoder: First, the output decoder annotates each point of the input point cloud Pt using these BEV maps, looking up a flow vector and the two logits Lcls,i, Lwgt,i from the point's corresponding pillar. In doing so, we assume that all points within a pillar behave very similarly. We believe this holds for almost all LiDAR point clouds measured outdoors, because every moving traffic participant needs to occupy some space.
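The per-point lookup described above can be sketched as follows (a minimal illustration with an assumed interface, not the paper's code; the function name and map shapes are our choices):

```python
import numpy as np

# Each point reads the flow vector and the two logits stored in its pillar.
H = W = 640
BEV_RANGE = 35.0

def decode_points(points, flow_map, cls_logits, wgt_logits):
    """points: (N, 3); flow_map: (H, W, 2) BEV flow; cls/wgt logits: (H, W)."""
    cell = 2 * BEV_RANGE / H
    ij = np.floor((points[:, :2] + BEV_RANGE) / cell).astype(int)
    r, c = np.clip(ij, 0, H - 1).T
    return flow_map[r, c], cls_logits[r, c], wgt_logits[r, c]
```

All points falling into the same pillar receive identical annotations, which is exactly the "points in a pillar behave similarly" assumption stated above.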
Moreover, the beams of most LiDAR systems do not point upward. Note that although our network architecture is specialized to this setting, our loss framework is applicable to any 3D scene flow prediction and does not need to assume 2D flow. To regularize and improve flow predictions on the static scene, the output decoder aggregates the points classified as static into a single coherent rigid transformation Tr ∈ R^(4×4).
We use the Kabsch algorithm [17], computing Tr differentiably via singular value decomposition. A per-point weight wi determines how strongly each flow vector prediction influences the final Tr. We first apply a sigmoid activation to the confidence logits, then mask them based on the classification logits. Finally, we renormalize the weights so that they sum to 1, to ensure numerical stability.
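The weighted Kabsch step can be sketched as follows (a minimal NumPy version of the scheme described above; the exact masking and weighting details in the paper may differ, and a deep-learning framework with differentiable SVD would be used for end-to-end training):

```python
import numpy as np

def weighted_kabsch(src, dst, logits_cls, logits_wgt):
    """Estimate the rigid transform T (4x4) with dst ~= R @ src + t.

    Weights: sigmoid of the confidence logits, masked to points classified
    as static (here: negative classification logit), renormalized to sum to 1.
    """
    w = 1.0 / (1.0 + np.exp(-logits_wgt))   # sigmoid confidence
    w = w * (logits_cls < 0)                # keep only points classified static
    w = w / (w.sum() + 1e-12)               # renormalize for numerical stability

    # Weighted centroids and cross-covariance matrix.
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_d = (w[:, None] * dst).sum(axis=0)
    S = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)

    # SVD-based solution; the sign term guards against reflections.
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s

    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```

Because every step (sigmoid, weighted sums, SVD) is differentiable, gradients can flow from Tr back into the logits, which is what enables the end-to-end training described next.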
The confidence logits receive gradient updates only through the computation of Tr, so they are trained end-to-end without requiring further supervision.

Figure 2: Qualitative comparison of methods on a KITTI-SF scene. Flow estimated accurately according to AccR is shown in blue, inaccurate predictions in red. From left to right: PointPWCNet (PPWC), PoseFlowNet (PF), Ours.

Experimental Results




Figure 3: Left: ground-truth motion segmentation. Right: predicted dynamicness; a higher probability of motion is brighter.