
Reading classic literature -- SuMa++

2022-07-02 06:05:00 【Hermit_ Rabbit】

0. Brief introduction

As a technology blogger, the most important thing is to keep learning, and the best way to learn is to keep reading new papers and to study and summarize the ideas and methods of those who came before. I therefore plan to start a new series introducing such papers. This post covers "SuMa++: Efficient LiDAR-based Semantic SLAM". The original paper is available at: https://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/chen2019iros.pdf. The main structure below follows the blog of Dr. Ren Qian, with my own understanding added.

SUMA++

1. Contributions of the paper

Reliable, high-precision localization and mapping are key components of an autonomous driving system. Besides accurate geometric information, the map should also contain semantic information that supports intelligent behavior of the vehicle. In the real world, however, moving objects complicate the mapping process, because they corrupt the map and degrade localization. In this paper, the authors build on traditional surfel-based mapping and add the fusion of semantic information to address these problems. Semantic information is extracted by a neural network that assigns a class label to every point in the point cloud, so that when the surfel map is built, every surfel carries a label. This makes it possible not only to filter out dynamic objects, but also to use semantic information to constrain the odometry estimate and thereby improve the accuracy of the map.

The authors state two main contributions:

Semantic segmentation of the point cloud, recognition of dynamic objects from the semantic information, and removal of those objects from the map.
Data association between objects with semantic labels, combined with geometric information into constraints that improve mapping accuracy.

2. Overall approach

The following figure shows the overall structure of SuMa++. The main process consists of four parts:

Semantic segmentation of the point cloud with a neural network
Flood fill to correct erroneous class labels
A filter that detects and removes dynamic objects
A semantically constrained ICP that improves odometry accuracy

In addition, SuMa++ is built on the surfel map, which relies on several predefined concepts:

Vertex map ($V_D$): the spherical projection of the point cloud $P$ at time $t$, which yields a range (depth) image in which each pixel stores a raw 3D point, organized by its polar coordinates.
Normal map ($N_D$): the normal vectors computed from the spherical projection $V_D$, laid out in the same image grid as the LiDAR's polar projection.
Rendered map views ($V_M$ and $N_M$): the vertex and normal maps rendered from the surfel map in the same projection. (ICP between the current scan and these views estimates the pose increment, and accumulating the increments gives the current pose.)
Surfels: the map is represented by surfels, each consisting of a position, a normal vector, a radius, and two timestamps (creation time and last update time).
Stability log-odds $l_s$: each surfel maintains a binary Bayes filter that decides whether it is considered stable or unstable.
Semantic map ($S_D$): RangeNet++ semantically segments the spherical projection of each frame, producing a range image that is labeled point by point from the sensor's viewpoint (semantic segmentation).
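The vertex map and normal map defined above can be illustrated with a short NumPy sketch. It uses a spherical projection of the kind described for RangeNet++ (yaw maps to columns, pitch maps to rows) and estimates normals from cross products of neighboring vertices, in the spirit of SuMa. The image size of 64 x 1024, the vertical field of view of +3°/-25°, and the function names are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) scan with non-zero ranges into an H x W vertex map V_D.

    Returns the vertex map, a range image (-1 where empty) and the pixel
    (v, u) of every input point. If several points fall into one pixel,
    the nearest one is kept. FOV and image size are assumed values.
    """
    fov_down = np.radians(fov_down_deg)
    fov = abs(np.radians(fov_up_deg)) + abs(fov_down)

    r = np.linalg.norm(points, axis=1)
    yaw = -np.arctan2(points[:, 1], points[:, 0])   # horizontal angle -> column
    pitch = np.arcsin(points[:, 2] / r)             # vertical angle -> row

    u = np.clip(np.floor(0.5 * (yaw / np.pi + 1.0) * W), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor((1.0 - (pitch + abs(fov_down)) / fov) * H), 0, H - 1).astype(np.int64)

    # Sort by pixel, then by range, and keep the first (nearest) point per pixel.
    pix = v * W + u
    order = np.lexsort((r, pix))
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]

    vertex_map = np.zeros((H, W, 3))
    range_map = np.full((H, W), -1.0)
    vertex_map[v[keep], u[keep]] = points[keep]
    range_map[v[keep], u[keep]] = r[keep]
    return vertex_map, range_map, v, u


def normal_map(vertex_map):
    """Estimate N_D from cross products of the right and lower neighbors.

    The horizontal 360-degree wrap-around is ignored in this sketch.
    """
    dx = np.zeros_like(vertex_map)
    dy = np.zeros_like(vertex_map)
    dx[:, :-1] = vertex_map[:, 1:] - vertex_map[:, :-1]
    dy[:-1, :] = vertex_map[1:, :] - vertex_map[:-1, :]

    # A normal is only defined where the pixel and both neighbors hold a point.
    has_pt = np.any(vertex_map != 0, axis=2)
    ok = np.zeros_like(has_pt)
    ok[:-1, :-1] = has_pt[:-1, :-1] & has_pt[:-1, 1:] & has_pt[1:, :-1]

    n = np.cross(dx, dy)
    norm = np.linalg.norm(n, axis=2)
    ok &= norm > 1e-9
    normals = np.zeros_like(vertex_map)
    normals[ok] = n[ok] / norm[ok][:, None]
    return normals
```

Keeping only the nearest point per pixel is what lets each pixel of $V_D$ be described as holding "the nearest 3D point"; the returned `(v, u)` indices are what a later step would use to map per-pixel results back onto the raw points.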

2.1 Point cloud semantic segmentation

Semantic segmentation uses the RangeNet++ method, which is described in a separate paper; this paper does not substantially modify it. RangeNet++ adapts the Darknet53 backbone into RangeNet53 to segment the range image, and then maps the resulting labels back onto the point cloud. For details, please refer to this article.
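Mapping the network output back onto the point cloud amounts to an argmax over the per-pixel class scores, which gives the semantic map $S_D$, followed by a lookup with each point's pixel coordinates. The sketch below assumes the scores arrive as a (C, H, W) array and that the (v, u) pixel of every point was recorded during projection; the function name is hypothetical, and the kNN-based clean-up that RangeNet++ applies after back-projection is omitted.

```python
import numpy as np

def backproject_labels(class_scores, proj_v, proj_u):
    """Turn per-pixel class scores into a label image and per-point labels.

    class_scores: (C, H, W) network output for one range image
    proj_v, proj_u: (N,) row/column of each 3D point in the range image
    """
    label_image = np.argmax(class_scores, axis=0)   # (H, W) semantic map S_D
    point_labels = label_image[proj_v, proj_u]      # (N,) label per 3D point
    return label_image, point_labels
```

Because several points can share one pixel, they all receive the same label here, which is exactly the kind of boundary artifact that motivates a post-processing step.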

2.2 Flood fill

Flood fill is a common technique in image processing. Here it is used to correct misclassifications, so that segmentation errors do not harm the subsequent stages. The left part of the figure below is a zoomed-in view of the dashed box in the right part. First, the erroneous labels in (a) are removed, giving (b); then the removed pixels are refilled with labels from the surrounding points, giving (c). Finally, (d) shows the corresponding depth map.
The following is the pseudocode of the flood fill. As preprocessing, it takes the raw semantic mask $S_{raw}$ and the corresponding vertex map $V_D$ (each pixel of $S_{raw}$ holds a semantic label, and the corresponding pixel of the vertex map holds the 3D coordinates of the nearest point in the LiDAR frame). First, every pixel whose neighborhood of size $d$ contains at least one pixel with a different label is removed (erosion). The eroded mask is then combined with the depth information from the vertex map: each emptied boundary pixel is set to the label of a labeled neighbor if the distance between the corresponding 3D points is consistent, i.e., below a threshold.
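The erode-then-fill procedure described above can be sketched in plain Python/NumPy. This is a minimal illustration, not the authors' exact algorithm: the window radius `d`, the distance threshold `tau`, the `empty` marker, and the rule of taking the nearest consistent neighbor are assumptions, and a real implementation would also skip pixels with no LiDAR return.

```python
import numpy as np

def erode_and_fill(labels, vertex_map, d=1, tau=0.5, empty=-1):
    """Erode label boundaries, then refill them using 3D distances.

    labels:     (H, W) integer semantic mask S_raw
    vertex_map: (H, W, 3) vertex map V_D (3D point per pixel)
    d:          neighborhood radius in pixels (hypothetical default)
    tau:        max 3D distance for a neighbor to donate its label (hypothetical)
    """
    H, W = labels.shape

    # Step 1 (erosion): drop every pixel whose (2d+1) x (2d+1) window
    # contains at least one different label.
    eroded = labels.copy()
    for v in range(H):
        for u in range(W):
            window = labels[max(0, v - d):v + d + 1, max(0, u - d):u + d + 1]
            if np.any(window != labels[v, u]):
                eroded[v, u] = empty

    # Step 2 (filling): give each emptied pixel the label of the closest
    # still-labeled neighbor, but only if their 3D points are within tau.
    filled = eroded.copy()
    for v in range(H):
        for u in range(W):
            if eroded[v, u] != empty:
                continue
            best_label, best_dist = empty, tau
            for nv in range(max(0, v - d), min(H, v + d + 1)):
                for nu in range(max(0, u - d), min(W, u + d + 1)):
                    if eroded[nv, nu] == empty:
                        continue
                    dist = np.linalg.norm(vertex_map[v, u] - vertex_map[nv, nu])
                    if dist < best_dist:
                        best_label, best_dist = eroded[nv, nu], dist
            filled[v, u] = best_label
    return eroded, filled
```

For a single image row where a foreground object (label 2, about 5 m away) bleeds one pixel into the background (label 1, about 20 m away), the bled pixel is emptied by the erosion, and because no labeled neighbor lies at a similar depth, it stays empty instead of keeping the wrong class; the neighboring background pixel is refilled with label 1.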

2.3 Filter out moving objects

Dynamic objects are recognized based on how consistently an object appears at the same position. Specifically, suppose an object is observed at some position in the current frame's semantic map $S_D$, and that position is mapped into the map's semantic view $S_M$. If in the next frame the object ($S_{D_1}$) is still observed at the same place in $S_M$, and keeps appearing there over many consecutive frames, it is static; conversely, if the position at which it is detected changes from frame to frame, it is moving.

An important concept here is stability in the surfel map. The surfel map uses surfels built from map points as its basic elements and optimizes with map consistency as the criterion. This paper proposes to use the labels provided by semantic segmentation to handle moving objects: when the map is updated, the semantic consistency between the new observation $S_D$ and the existing map $S_M$ is checked, and if the labels are inconsistent, the corresponding surfels are assumed to belong to moving objects. In that case a penalty term is added to the stability log-odds $l_s$, and after several observations the unstable surfels can be removed. The penalized stability update is:

$$l_s^{(t)} = l_s^{(t-1)} + \operatorname{odds}\left(p_{stable} \exp\left(-\frac{\alpha^2}{\sigma_\alpha^2}\right) \exp\left(-\frac{d^2}{\sigma_d^2}\right)\right) - \operatorname{odds}(p_{prior}) - \operatorname{odds}(p_{penalty})$$

where $\operatorname{odds}(p) = \log\left(p(1-p)^{-1}\right)$, and $p_{stable}$ and $p_{prior}$ are, respectively, the probability that a stable surfel receives a compatible measurement and the prior probability. The exponential terms account for measurement noise: $\alpha$ is the angle between the surfel normal $n_s$ and the normal of the measurement, and $d$ is the distance of the measurement from the associated surfel. The measurement normal is taken from the normal map $N_D$.
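The stability update can be sketched in Python as follows. The `Surfel` fields mirror the surfel definition given earlier (position, normal vector, radius, two timestamps, stability log-odds) plus a semantic label. All probability and sigma values, the pruning threshold, the label ids, and the helper names are hypothetical choices for illustration, not the values used by SuMa++; the penalty term is applied only when the labels disagree.

```python
import math
from dataclasses import dataclass

@dataclass
class Surfel:
    position: tuple
    normal: tuple
    radius: float
    t_created: int
    t_updated: int
    label: int
    l_s: float = 0.0  # stability log-odds

def odds(p):
    """odds(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical parameter values, chosen only for illustration.
P_STABLE, P_PRIOR, P_PENALTY = 0.6, 0.5, 0.9
SIGMA_ALPHA, SIGMA_D = 1.0, 1.0

def update_stability(surfel, alpha, d, measured_label, t):
    """Apply one stability update for a measurement associated with `surfel`.

    alpha: angle (rad) between the surfel normal and the measurement normal
    d:     distance (m) of the measurement from the surfel
    """
    p = P_STABLE * math.exp(-alpha**2 / SIGMA_ALPHA**2) * math.exp(-d**2 / SIGMA_D**2)
    p = max(p, 1e-12)  # avoid log(0) for very poor matches
    surfel.l_s += odds(p) - odds(P_PRIOR)
    if measured_label != surfel.label:
        surfel.l_s -= odds(P_PENALTY)  # semantic inconsistency penalty
    surfel.t_updated = t
    return surfel.l_s

def prune_unstable(surfels, threshold=-2.0):
    """Drop surfels whose stability has fallen below a (hypothetical) threshold."""
    return [s for s in surfels if s.l_s >= threshold]
```

With these values, a consistent, perfectly aligned measurement raises $l_s$ by $\log 1.5 \approx 0.41$, while a label mismatch lowers it by about $1.79$, so a surfel left behind by a car that has driven away falls below the threshold after two contradicting observations.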