SLIM: Self-Supervised Point Cloud Scene Flow and Motion Estimation (ICCV 2021)
2022-07-27 20:53:00 【3D vision workshop】

Author: Bubble Robot
Source: Bubble Robot SLAM
Title: SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation
Authors: Stefan Andreas Baur, David Josef Emmerichs, Frank Moosmann, Peter Pinggera, Björn Ommer and Andreas Geiger
Venue: ICCV 2021
Compiled by: cristin
Reviewed by: zh

Abstract

Hello everyone. Today's article introduces SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation.
In recent years, several self-supervised learning frameworks for point-cloud-based 3D scene flow have emerged. Scene flow inherently separates every scene into multiple moving agents plus a set of points that follow a single rigid-body motion. However, existing methods do not exploit this property of the data in their self-supervised training schemes, even though it could improve and stabilize flow predictions. Based on the discrepancy between a robust rigid ego-motion estimate and the raw flow prediction, we generate a self-supervised motion segmentation signal. The predicted motion segmentation is, in turn, used by our algorithm to attend to static points, aggregating motion information from the static parts of the scene. We learn our model end-to-end by back-propagating gradients through the Kabsch algorithm, and show that this improves ego-motion estimation and thereby scene flow estimation. In an ablation study, we further analyze the performance gain of combining motion segmentation and scene flow. We also propose a novel network architecture for 3D LiDAR scene flow that can handle an order of magnitude more points during training than previous methods.
Project page: https://baurst.github.io/slim/

Main Work and Contributions

1. Our method is the first point-cloud-based scene flow estimation method that segments points into the two classes "moving" and "static".
2. Our method significantly outperforms previous point-cloud-based scene flow methods, especially in generalization to previously unseen data, which we demonstrate on several datasets in both self-supervised and fully supervised settings.
3. Our novel network architecture can handle many more points than current weakly or self-supervised methods.

Method

Three-dimensional scene flow estimation takes two consecutive input point clouds Pt ∈ R^(N×3) and Pt+1 ∈ R^(M×3) and predicts a three-dimensional displacement vector for each point in the first cloud, representing the motion of that point between the two frames.
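To make the input and output shapes concrete, here is a toy NumPy sketch (not the paper's code). The data are synthetic, and the AccR metric is computed with the thresholds commonly used in scene-flow benchmarks (EPE < 0.1 m or relative error < 10%), which is our assumption rather than a definition taken from this paper:

```python
import numpy as np

# Toy sketch: scene flow assigns each point of the first cloud a 3D
# displacement vector; warping P_t by the flow predicts the geometry at t+1.
rng = np.random.default_rng(0)
P_t = rng.uniform(-35.0, 35.0, size=(100, 3))               # N x 3 points at time t
gt_flow = np.tile([1.0, 0.0, 0.0], (100, 1))                # ground-truth displacement
pred_flow = gt_flow + rng.normal(0.0, 0.02, size=(100, 3))  # a noisy prediction

P_t_warped = P_t + pred_flow                                # predicted positions at t+1

# End-point error (EPE) and relaxed accuracy AccR (assumed thresholds:
# a point counts as accurate if EPE < 0.1 m or relative error < 10%).
epe = np.linalg.norm(pred_flow - gt_flow, axis=1)
rel = epe / np.linalg.norm(gt_flow, axis=1)
acc_r = float(np.mean((epe < 0.1) | (rel < 0.1)))
```

Note that N and M generally differ between the two sweeps, so the flow is defined per point of the first cloud only.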

Figure 1: Overview of our network architecture. A convolutional gated recurrent unit (ConvGRU) looks up correlation values according to the predicted flow and iteratively predicts flow and logit updates.
Point Cloud Encoder: The pillar feature network (PFN) introduced in [20] (with shared weights) encodes the input point clouds Pt and Pt+1 into BEV pseudo-images It, It+1 ∈ R^(H×W×C), which are then processed by the backbone. We use the same BEV range for all datasets, covering the square −35 m ≤ x, y ≤ 35 m around the ego-vehicle, where x, y are the horizontal axes. We use a resolution of H = W = 640, which corresponds to a pillar size of approximately 11 cm.
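The grid geometry above can be checked with a minimal NumPy sketch (our illustration, not the authors' implementation): 640 cells over a 70 m span give 70/640 ≈ 0.109 m per cell, matching the ~11 cm pillar size quoted.

```python
import numpy as np

# BEV grid covering -35 m <= x, y <= 35 m at a resolution of 640 x 640,
# so each pillar is 70 / 640 = 0.109375 m wide (~11 cm, as stated above).
H = W = 640
BEV_RANGE = 35.0
CELL = 2 * BEV_RANGE / H

def pillar_indices(points):
    """Map (N, 3) points to integer (row, col) pillar coordinates."""
    ij = np.floor((points[:, :2] + BEV_RANGE) / CELL).astype(int)
    # Points exactly on the upper boundary fall into the last cell.
    return np.clip(ij, 0, H - 1)

pts = np.array([[-35.0, -35.0, 0.2],   # lower corner -> cell (0, 0)
                [0.0, 0.0, 1.0],       # center       -> cell (320, 320)
                [34.99, 34.99, -0.5]]) # upper corner -> cell (639, 639)
print(pillar_indices(pts))
```

Points sharing a cell would then be pooled into one pillar feature by the PFN; the indexing above only illustrates the geometry.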
Flow Backbone: Our backbone is largely inspired by RAFT [39], which predicts dense optical flow on images. Its core component is a recurrent update block that maintains a hidden state and predicts the flow iteratively, producing a finer and more accurate flow with each iteration. To this end, the independently encoded input images are used to construct a correlation volume, and the previous flow prediction is used to look up correlation values, steering the flow toward pixel regions with more accurate matches. Although RAFT was designed for dense optical flow, we show that it is also very useful in the sparse BEV domain and generalizes well. Compared with conventional images, the BEV domain consists of sparser, smaller regions and largely independent motion patterns (moving traffic participants).
We use RAFT [39] to process the flow predictions and additionally update two logit maps iteratively, as shown in Figure 2. The first logit map Lcls is used as an output signal to classify points as static or moving in the world frame. The accuracy of a flow prediction can vary greatly within a scene, because featureless surfaces are unsuitable for flow estimation. The second logit map Lwgt is used to overcome this problem, allowing the network to express its confidence in the flow estimate. The output decoder uses these two logit maps to aggregate and improve the flow accuracy of static and dynamic scene elements.
Lcls is processed similarly to the flow, but the confidence-weighting task is more closely tied to the flow prediction, so its data flow is coupled with the flow during processing. Apart from this small change, we keep the general RAFT framework, including zeroing the gradient not only on the input flow but also on the input logits of each update block.
Output Decoder: First, the output decoder annotates each point of the input point cloud Pt using these BEV maps, looking up a flow vector and the two logits Lcls,i, Lwgt,i from the point's corresponding pillar. In doing so, we assume that all points within a pillar behave very similarly. We believe this holds for almost all LiDAR point clouds measured outdoors, because every moving traffic participant needs to occupy some space.
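The per-point lookup described above can be sketched as follows (a minimal illustration with an assumed interface, not the paper's code; the function name and map shapes are our choices):

```python
import numpy as np

# Each point reads the flow vector and the two logits stored in its pillar.
H = W = 640
BEV_RANGE = 35.0

def decode_points(points, flow_map, cls_logits, wgt_logits):
    """points: (N, 3); flow_map: (H, W, 2) BEV flow; cls/wgt logits: (H, W)."""
    cell = 2 * BEV_RANGE / H
    ij = np.floor((points[:, :2] + BEV_RANGE) / cell).astype(int)
    r, c = np.clip(ij, 0, H - 1).T
    return flow_map[r, c], cls_logits[r, c], wgt_logits[r, c]
```

All points falling into the same pillar receive identical annotations, which is exactly the "points in a pillar behave similarly" assumption stated above.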
Moreover, the beams of most LiDAR systems do not point upward. Note that although our network architecture is specialized to this setting, our loss framework is applicable to any 3D scene flow prediction and does not need to assume 2D flow. To regularize and improve flow predictions on the static scene, the output decoder aggregates the points classified as static into a single coherent rigid transformation Tr ∈ R^(4×4).
We use the Kabsch algorithm [17], computing Tr differentiably via singular value decomposition. A per-point weight wi determines how strongly each flow vector prediction influences the final Tr. We first apply a sigmoid activation to the confidence logits, then mask them based on the classification logits. Finally, we renormalize the weights so that they sum to 1, to ensure numerical stability.
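The weighted Kabsch step can be sketched as follows (a minimal NumPy version of the scheme described above; the exact masking and weighting details in the paper may differ, and a deep-learning framework with differentiable SVD would be used for end-to-end training):

```python
import numpy as np

def weighted_kabsch(src, dst, logits_cls, logits_wgt):
    """Estimate the rigid transform T (4x4) with dst ~= R @ src + t.

    Weights: sigmoid of the confidence logits, masked to points classified
    as static (here: negative classification logit), renormalized to sum to 1.
    """
    w = 1.0 / (1.0 + np.exp(-logits_wgt))   # sigmoid confidence
    w = w * (logits_cls < 0)                # keep only points classified static
    w = w / (w.sum() + 1e-12)               # renormalize for numerical stability

    # Weighted centroids and cross-covariance matrix.
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_d = (w[:, None] * dst).sum(axis=0)
    S = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)

    # SVD-based solution; the sign term guards against reflections.
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s

    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```

Because every step (sigmoid, weighted sums, SVD) is differentiable, gradients can flow from Tr back into the logits, which is what enables the end-to-end training described next.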
The confidence logits receive gradient updates only through the computation of Tr, so they are trained end-to-end without requiring further supervision.

Figure 2: Qualitative comparison of methods on a KITTI-SF scene. Flow estimated accurately according to AccR is shown in blue, inaccurate predictions in red. From left to right: PointPWCNet (PPWC), PoseFlowNet (PF), Ours.

Experimental Results




Figure 3: Left: ground-truth motion segmentation. Right: predicted dynamicness; a higher probability of motion is brighter.