
Automatic mapping of tailored landmark representations for automated driving and map learning

2022-06-10 12:33:00 Earth moving big white dog

ICRA 2021 Paper link
Source code: none

1 Quick read

1.1 What problem does the paper try to solve? Is it a new problem?

The automatic construction of the high-dimensional semantic information (masks) of high-definition maps; it is a relatively new problem that has not been widely solved, and the difficulty lies in the contradiction between high-dimensional representations and high mapping cost;
The method targets automatic construction of traffic semantic information, moving from fully manual work to automatic construction plus manual correction of false detections;

1.2 What are the related studies? How are they categorized? Which researchers in this field deserve attention?

Construction methods for various semantic targets; object SLAM approaches;

1.3 What are the contributions of the paper?

Using the depth and orientation given by lidar plus semantic information, it covers the whole pipeline of detecting, tracking, and mapping semantic map elements with a monocular camera;
It models traffic signs, traffic lights, and poles, with position accuracy < 10 cm, size accuracy < 5 cm, and orientation MAE < 6°;
① A simplified way of modeling semantic elements;
② Recovery of the depth and orientation of semantic targets from a single frame, using semantic information plus lidar data;
③ A data association method;
④ A multi-view landmark estimation method;

1.4 What is the key to the paper's solution?

1.5 How are the experiments designed? Are the results sufficient to demonstrate the effect?

There is no suitable benchmark, and the method cannot be evaluated on the KITTI dataset, so the authors recorded three challenging sequences with their own vehicle;
Acquisition vehicle configuration: a 4096 × 1536 pixel camera triggered at 10 Hz in sync with a Velodyne VLS-128 Alpha Prime lidar;
Single Measurement Precision: single-frame measurement accuracy, recovered from lidar information plus a single image frame only, to show that single-frame accuracy is already high enough for practical use;
Map Optimization Results:

1.6 What dataset is used?

1.7 What open problems remain?

Author's conclusion:
1. Jointly optimize the semantic information and the poses
My thoughts:
2. What about other road information?
3. There is no suitable benchmark

2 Main content


2.1 System framework

The overall idea is somewhat like SfM: recover the traffic signs, street lamps, poles, and similar elements in the scene from the point-cloud stream and the video stream;
① Preprocess to obtain the required inputs;
② Filter out bad detections and use lidar measurements for depth estimation;
③ Recover a parameterized representation of each semantic element;
④ Perform inter-frame association and optimize the map information;

2.2 Parameterized representation

2.2.1 Preprocessing

A semantic segmentation network provides masks and bounding boxes; a visual SLAM system provides accurate poses;
This yields: the semantic measurements $\mathcal{D}_k$, category labels $c_m$, the top-left / bottom-right image coordinates of each bounding box $d_{TL/BR}^m$, the lidar point cloud $\mathcal{L}_k$, and the pose of each frame $T_k$;

2.2.2 Pre-filtering

① Filter out duplicate detected landmarks: bounding boxes with IoU > 10% are treated as duplicates (is this threshold too low?);
② The bounding box must be good enough: the mask must occupy > 30% of the bounding box;
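A minimal sketch of these two filters, assuming each detection is a dict with a pixel bounding box `(x1, y1, x2, y2)` and a boolean `mask` cropped to that box (the paper's actual data structures are not given):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def prefilter(detections, iou_thresh=0.10, fill_thresh=0.30):
    """Drop duplicate boxes (IoU > 10%) and boxes whose mask
    occupies <= 30% of the box area."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det['box']
        box_area = (x2 - x1) * (y2 - y1)
        if box_area <= 0 or det['mask'].sum() / box_area <= fill_thresh:
            continue  # ② mask fills too little of the bounding box
        if any(iou(det['box'], k['box']) > iou_thresh for k in kept):
            continue  # ① duplicate of an already kept detection
        kept.append(det)
    return kept
```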

2.2.3 Depth estimation

The lidar point cloud is projected onto the image plane and combined with the mask to obtain the depth;
Because the lidar and the camera are not mounted at exactly the same position, the two sensors do not observe exactly the same thing. As shown in the figure below, the traffic sign picks up point-cloud returns from the car behind it: from the lidar's position the vehicle behind is visible around the sign, but after projection onto the image plane those points are mistaken for the sign's depth;

[Figure: lidar points from a vehicle behind the sign project into the sign's mask]
The solution is to cluster the lidar point cloud with the DBSCAN algorithm and, after projection, keep only the points of the nearest cluster;
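A minimal sketch of this step using scikit-learn's DBSCAN, assuming the lidar points falling inside a landmark's mask are already expressed in the camera frame (z = forward); the `eps` and `min_samples` values are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def landmark_depth(points_cam, eps=0.5, min_samples=5):
    """Depth of the landmark from the masked lidar points (N x 3).

    Background returns (e.g. the car behind a sign) end up in separate
    clusters; only the nearest cluster is kept."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_cam)
    best_depth = None
    for lbl in set(labels) - {-1}:          # -1 marks DBSCAN noise
        depth = np.median(points_cam[labels == lbl, 2])
        if best_depth is None or depth < best_depth:
            best_depth = depth
    return best_depth                       # None if no cluster survived
```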

2.2.4 Parameterization

Poles and street lamps: the upright part is represented as a cylinder: position x, y, z, width w, height h;
Signs: represented as a box: position x, y, z, width w, height h, plus one more angle $\varphi$ for the orientation about z;
[Figure: cylinder and box landmark parameterizations]
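Written out as hypothetical Python types, the two tailored representations look like this:

```python
from dataclasses import dataclass

@dataclass
class CylinderLandmark:
    """Pole / street lamp: the upright part as a vertical cylinder."""
    x: float  # position
    y: float
    z: float
    w: float  # width (cylinder diameter)
    h: float  # height

@dataclass
class SignLandmark:
    """Traffic sign: an upright box with a yaw angle about the z axis."""
    x: float
    y: float
    z: float
    w: float    # width
    h: float    # height
    phi: float  # orientation angle about z, in radians
```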
① Solve for the point-cloud center $x_{\mathcal{L}}$:
$$x_{\mathcal{L},\theta}=\underset{x_{\mathcal{L},\theta}}{\operatorname{argmin}}\sum_{l_i\in \mathcal{L}_k^m}\rho\left(\left\|l_{i,\theta}-x_{\mathcal{L},\theta}\right\|^2\right)$$
② Project the point-cloud center $x_{\mathcal{L}}$ onto the viewing direction $d_C$ to obtain the true center $x$ (this idea is very helpful for building object maps!):
This is because:
First, the observed point cloud is necessarily one face of the real object, and its outline alone cannot correctly describe the object's spatial position;
Second, the real center of the object is very likely to lie on the projection ray through the center of the 2D bounding box;
In this way the projection direction plus the point-cloud center together roughly determine the object center, as shown in the figure below:
[Figure: the object center lies where the viewing ray meets the vertical plane through the point-cloud center]
The plane in the figure is perpendicular to the ground and passes through $x_{\mathcal{L}}$; its normal vector is determined by the ground-plane projection of the viewing direction:
$$n = d_G = \frac{1}{\sqrt{d_{C,1}^2+d_{C,2}^2}} \begin{bmatrix} d_{C,1} \\ d_{C,2} \\ 0 \end{bmatrix}$$
The center $x$ is then determined from the normal vector as a ray-plane intersection:
$$x = \frac{x_{\mathcal{L}}^{T} n}{d_C^{T} n}\, d_C = \frac{x_{\mathcal{L}}^{T} d_G}{d_C^{T} d_G}\, d_C$$
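A small numpy sketch of ②, assuming a gravity-aligned world frame with z up and a viewing ray that starts at the camera origin:

```python
import numpy as np

def landmark_center(x_lidar, d_cam):
    """Intersect the viewing ray through the 2D box center with the
    vertical plane through the lidar-point center x_L.

    x_lidar : lidar-point center x_L, shape (3,)
    d_cam   : viewing-ray direction d_C through the box center, shape (3,)
    """
    # n = d_G: viewing direction projected onto the ground plane
    n = np.array([d_cam[0], d_cam[1], 0.0])
    n /= np.linalg.norm(n)
    # ray-plane intersection: x = (x_L . n) / (d_C . n) * d_C
    return (np.dot(x_lidar, n) / np.dot(d_cam, n)) * d_cam
```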
③ Determine the width and height w, h from the intersections of the bounding-box corner rays with the plane;
For traffic lights and poles:
$$x_{TL/BR} = \frac{x^{T} d_G}{d_{TL/BR}^{T} d_G}\, d_{TL/BR}$$
For signs, where $n_{\mathcal{L}}$ is the normal of a plane fitted to the point cloud:
$$x_{TL/BR} = \frac{x^{T} n_{\mathcal{L}}}{d_{TL/BR}^{T} n_{\mathcal{L}}}\, d_{TL/BR}$$
A sign also has the additional orientation angle, computed from the fitted plane normal:
$$\varphi = \arccos\left(-d_D^{T} n_{\mathcal{L}}\right)$$
With the bounding-box corners lifted into 3D space, w and h can be computed;
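A sketch of ③ under the same assumptions (camera at the origin, z up). How w and h are read off the lifted corners is not spelled out above, so the horizontal/vertical split below is my reading:

```python
import numpy as np

def lift_corner(x_center, normal, d_corner):
    """Point where the ray through a bounding-box corner meets the
    plane through the landmark center with the given normal."""
    return (np.dot(x_center, normal) / np.dot(d_corner, normal)) * d_corner

def extent_and_yaw(x_center, normal, d_tl, d_br, d_drive=None):
    """w, h from the lifted top-left / bottom-right corners; for signs,
    normal = n_L (fitted plane) and phi = arccos(-d_D . n_L)."""
    p_tl = lift_corner(x_center, normal, d_tl)
    p_br = lift_corner(x_center, normal, d_br)
    w = np.linalg.norm(p_br[:2] - p_tl[:2])   # horizontal extent
    h = abs(p_tl[2] - p_br[2])                # vertical extent
    phi = np.arccos(-np.dot(d_drive, normal)) if d_drive is not None else None
    return w, h, phi
```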

2.2.5 Bounding-box measurements

For distant objects the lidar points are sparse; if a bounding box contains fewer than 5 projected lidar points, that bounding box is only used to optimize the center $x$;
2.3 Data Association
Because lidar performs better at close range, frames are processed in reverse temporal order (as if driving backwards); the matching strategy uses the Hungarian algorithm;
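A minimal sketch using SciPy's Hungarian solver, assuming the association cost is simply the Euclidean distance between predicted landmark centers and new observation centers (the paper's actual cost may be richer):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(landmark_centers, obs_centers, max_dist=1.0):
    """Hungarian matching of observations to map landmarks; pairs with
    cost above max_dist are rejected and spawn new landmarks."""
    if len(landmark_centers) == 0 or len(obs_centers) == 0:
        return [], list(range(len(obs_centers)))
    cost = np.linalg.norm(
        np.asarray(landmark_centers)[:, None, :]
        - np.asarray(obs_centers)[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched_obs = {c for _, c in matches}
    new_obs = [c for c in range(len(obs_centers)) if c not in matched_obs]
    return matches, new_obs
```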
2.4 Map optimization
Compute the consensus result from all recovered landmark observations; in practice, the map is optimized every time a new keyframe arrives (for better matching);
The so-called map optimization computes, for each landmark, the parameters $\theta\in\{x,y,z,w,h,\varphi\}$ with the least cost over all associated observations:
$$\hat{\ell}_{i,\theta} = \underset{\ell_{i,\theta}}{\operatorname{argmin}}\sum_{p_j\in\mathcal{A}_{\hat{\ell}_{i}}}\rho\left(\left\|\ell_{i,\theta}-p_{j,\theta}\right\|^2\right)$$
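This per-parameter robust argmin can be sketched as iteratively reweighted least squares; the Huber kernel and its threshold below stand in for the unspecified robust loss $\rho$:

```python
import numpy as np

def robust_estimate(values, delta=0.1, iters=20):
    """Huber-robust consensus for one parameter theta of one landmark,
    given all associated single-frame measurements p_{j,theta}."""
    values = np.asarray(values, dtype=float)
    est = np.median(values)                  # robust initialization
    for _ in range(iters):
        r = np.abs(values - est)
        # Huber weights: 1 inside the threshold, delta / |r| outside
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        est = np.sum(w * values) / np.sum(w)
    return est

# applied independently to each theta in {x, y, z, w, h, phi}
```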
