Point Density-Aware Voxels for LiDAR 3D Object Detection Paper Notes

2022-08-02 06:32:00 【byzy】

1 Introduction

A key problem with LiDAR is that the point cloud becomes sparser as distance increases.

Voxel-based methods ignore point density and place voxel features at the voxel centers (left figure). For large input ranges, memory limits the voxel resolution, and coarse voxels whose centers are not aligned with the points lose fine object detail, which degrades performance.

Point-based methods use farthest point sampling (middle figure), but the computational cost grows with the number of sampled points, which limits how many points can be sampled in the refinement stage.

In addition, objects such as pedestrians and cyclists have a small surface area and are hard to localize with LiDAR, so existing methods mainly focus on detecting the vehicle class.

This paper proposes the Point Density-Aware Voxel network (PDV), which localizes voxel features at voxel point centroids and takes point density into account during feature encoding to address the problems above.

(Figure: the black dots are the raw LiDAR points.)

Main contributions:

(1) Voxel point centroid localization: the point centroid is computed for each non-empty voxel (right figure). By localizing voxel features at the point centroids in the refinement stage, PDV uses the point density distribution in feature encoding and retains fine-grained position information.

(2) Density-aware RoI grid pooling: point density is encoded as an additional feature during RoI grid pooling. First, kernel density estimation (KDE) encodes the local voxel feature density within the ball neighborhood of each grid point; then self-attention with a point density positional encoding is applied. This captures the local point density information inside a region proposal for the second-stage refinement.

(3) Density confidence prediction: the center of the final bounding box and the number of points inside it are used as additional features to refine the bounding box confidence prediction. This exploits the inherent relationship between LiDAR point density and distance to make better-informed confidence predictions.

3 Method

As shown in the figure below, PDV is a two-stage detection network. The first stage uses a 3D sparse convolution backbone to generate proposals; the second stage refines them using the voxel features from each voxel layer together with the raw point cloud.

3.1 3D Voxel Backbone

Similar to SECOND: after voxelization, 3D sparse convolutions are applied and an RPN generates the proposals. Successive voxel layers use increasing downsampling factors, and all of them can be used in the second-stage refinement.

3.2 Voxel Point Centroid Localization

This module localizes the non-empty voxel features in space so that they can be aggregated by the density-aware RoI grid pooling.

For each voxel, the coordinates of all points inside it are averaged to obtain the voxel point centroid. A hash map then maps the point centroid of each voxel to its corresponding feature vector: the voxel point centroid and the sparse voxel feature are associated through the same voxel index.

Given the convolution kernel size, stride and padding, the voxel point centroids of the next layer can be computed from the results of the previous layer (as a weighted sum). This avoids recomputing from the raw points, so the method scales efficiently to larger point clouds.
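The centroid computation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the voxel grid origin is at zero, and the coarser layer is assumed to use a kernel size equal to the stride with no padding, so each fine voxel feeds exactly one coarse voxel. The paper's implementation may differ.

```python
from collections import defaultdict
from math import floor

def voxel_point_centroids(points, voxel_size):
    """Map each non-empty voxel index to (point centroid, point count)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for p in points:
        # Assumption: the grid origin is at (0, 0, 0).
        key = tuple(floor(c / voxel_size) for c in p)
        for d in range(3):
            sums[key][d] += p[d]
        counts[key] += 1
    return {k: (tuple(s / counts[k] for s in sums[k]), counts[k]) for k in sums}

def coarser_centroids(level, stride=2):
    """Centroids of the next layer as count-weighted means of the finer layer.

    Assumption: kernel size == stride and no padding, so every fine voxel
    contributes to exactly one coarse voxel.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for key, (centroid, n) in level.items():
        coarse = tuple(i // stride for i in key)
        for d in range(3):
            sums[coarse][d] += centroid[d] * n  # weight by point count
        counts[coarse] += n
    return {k: (tuple(s / counts[k] for s in sums[k]), counts[k]) for k in sums}

points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.5, 0.5, 0.5)]
fine = voxel_point_centroids(points, voxel_size=1.0)
coarse = coarser_centroids(fine, stride=2)
```

Because each fine centroid is weighted by its point count, the coarse centroid equals the mean of the raw points in the coarse voxel, without touching the raw points again.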

3.3 Density-Aware RoI Grid Pooling

KDE and self-attention are combined to encode point density features for each proposal. First, $U\times U\times U$ grid points are sampled inside each proposal.
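One way to sample the $U\times U\times U$ grid points is to take the cell centers of a regular subdivision of the proposal box. The sketch below assumes a box given by a center, dimensions (length, width, height) and a yaw angle about the z axis; the function name and this parametrization are illustrative assumptions, not taken from the notes.

```python
from math import cos, sin

def roi_grid_points(center, dims, yaw, U):
    """Cell centers of a U x U x U subdivision of a yaw-rotated box."""
    cx, cy, cz = center
    length, width, height = dims
    pts = []
    for i in range(U):
        for j in range(U):
            for k in range(U):
                # Offset of the cell center in the box's local frame.
                x = ((i + 0.5) / U - 0.5) * length
                y = ((j + 0.5) / U - 0.5) * width
                z = ((k + 0.5) / U - 0.5) * height
                # Rotate about the z axis, then translate to the box center.
                pts.append((cx + x * cos(yaw) - y * sin(yaw),
                            cy + x * sin(yaw) + y * cos(yaw),
                            cz + z))
    return pts

grid = roi_grid_points(center=(10.0, 2.0, 0.0), dims=(4.0, 2.0, 1.5), yaw=0.0, U=6)
```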

Local feature density

KDE is used to estimate the local feature density within the ball neighborhood of each grid point. Density-aware RoI grid pooling then encodes the estimated probability density as an additional feature.

First, for each grid point $g_j$, the voxel point centroids and features inside its ball neighborhood $N(g_j)$ are gathered as follows:

$\Psi_{g_j}^l=\left \{ \begin{bmatrix} f_{V_k^l}\\ c_{V_k^l}-g_j\\ p(c_{V_k^l}|g_j) \end{bmatrix}^T,\forall c_{V^l_k}\in N(g_j) \right \}$

where $f_{V_k^l}$ is the feature of the $k$-th non-empty voxel in layer $l$, $c_{V_k^l}$ is its point centroid, and $p(c_{V_k^l}|g_j)$ is the probability density (likelihood) estimated by KDE:

$p(c_{V_k^l}|g_j)\approx \frac{1}{|N(g_j)|\sigma^3}\sum_{c_{V^l_i}\in N(g_j)}W(c_{V_k^l},c_{V_i^l}),\; W(c_{V_k^l},c_{V_i^l})=\prod_{d\in\{x,y,z\}}w\left ( \frac{c_{V_k^l,d}-c_{V_i^l,d}}{\sigma} \right )$

where $\sigma$ is the bandwidth and $w$ is a kernel applied independently along each of the x, y and z coordinates (the paper uses a Gaussian kernel).
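The KDE formula above can be evaluated directly for one neighborhood. The following is a minimal sketch with a standard Gaussian kernel; the function names and the example bandwidth are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def gaussian(u):
    """Standard 1D Gaussian kernel w(u)."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def kde_likelihoods(centroids, sigma):
    """KDE likelihood p(c_k | g_j) of every centroid in one neighborhood N(g_j)."""
    n = len(centroids)
    out = []
    for ck in centroids:
        total = 0.0
        for ci in centroids:
            # Product kernel W: independent 1D kernels along x, y, z.
            w = 1.0
            for d in range(3):
                w *= gaussian((ck[d] - ci[d]) / sigma)
            total += w
        out.append(total / (n * sigma ** 3))
    return out

neighbourhood = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (2.0, 2.0, 2.0)]
density = kde_likelihoods(neighbourhood, sigma=0.5)
```

In this toy neighborhood the two centroids that sit close together receive a higher likelihood than the isolated one, which is the signal the pooling module appends to each voxel feature.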

After obtaining the features $\Psi_{g_j}^l$, a PointNet multi-scale grouping (MSG) module produces a feature vector for each grid point $g_j$:

$f_{g_j}^l=\textup{maxpool}(\textup{FFN}(\Psi_{g_j}^l))$

MSG uses multiple radii (the radii of the ball neighborhoods) to capture multi-scale feature density for each grid point, and the outputs are concatenated.

The final feature is the concatenation of the features from all layers:

$f_{g_j}=[f_{g_j}^1,\cdots,f_{g_j}^L]$

Grid point self-attention

The features of different grid points are computed independently of one another, so self-attention is used to capture long-range dependencies between grid points. As shown in the figure below, self-attention operates on the features of the non-empty grid points $f_{G^b}=\{f_{g_i}\mid |N(g_i)|>0,\forall g_i\in G^b\}$, using a standard transformer encoder $T_{g_i}$ with a residual connection: $\tilde{f}_{g_i}=T_{g_i}(f_{G^b})+f_{g_i}$.

Grid points with $|N(g_i)|=0$ are not passed to the attention module, and their features are left unchanged.
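The routing logic (attend only over non-empty grid points, add the residual, leave empty grid points untouched) can be sketched as below. The `toy_self_attention` function is a deliberately simplified stand-in for the transformer encoder $T$: it is a single-head dot-product attention with identity projections, which is an assumption made only to keep the example self-contained.

```python
from math import exp, sqrt

def toy_self_attention(feats):
    """Single-head dot-product attention with identity projections (stand-in for T)."""
    dim = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / sqrt(dim) for k in feats]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[d] for w, v in zip(weights, feats))
                    for d in range(dim)])
    return out

def attend_non_empty(grid_feats, neighbour_counts, encoder=toy_self_attention):
    """Apply the encoder to grid points with |N(g_i)| > 0, plus a residual."""
    keep = [i for i, n in enumerate(neighbour_counts) if n > 0]
    result = [list(f) for f in grid_feats]  # empty grid points stay unchanged
    if not keep:
        return result
    encoded = encoder([grid_feats[i] for i in keep])
    for i, e in zip(keep, encoded):
        result[i] = [a + b for a, b in zip(e, grid_feats[i])]  # residual connection
    return result

feats = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
counts = [3, 0, 5]
refined = attend_non_empty(feats, counts)
```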

(Figure: green points are grid points; blue points are raw points.)

Point density positional encoding

Adding the attention module alone does not bring in the geometric information of the LiDAR point cloud, so a point density positional encoding is introduced. The encoding uses the local grid point positions and the raw points inside the proposal: the proposal is divided into $U\times U\times U$ voxels (one per grid point), and the positional encoding of each grid point feature is:

$\textup{PE}(f_{g_j})=\textup{FFN}([\delta_{g_j},\log(|N(V_{g_j})|+\epsilon)])$

where $\delta_{g_j}=x_{g_j}-c_b$ is the position of $g_j$ relative to the bounding box center $c_b$, $|N(V_{g_j})|$ is the number of points in the voxel centered at $g_j$, and $\epsilon$ is a constant offset. In this way, RoI grid pooling can capture the point density inside a region proposal.
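As an illustration of the input that the FFN receives, the sketch below builds $[\delta_{g_j},\log(|N(V_{g_j})|+\epsilon)]$ for every cell of a proposal. It assumes an axis-aligned proposal for simplicity (real proposals are rotated), and the function name and the value of `eps` are assumptions, since the notes do not state the constant.

```python
from math import floor, log

def density_pe_inputs(points, box_min, box_max, U, eps=1.0):
    """Per-cell FFN inputs [delta_x, delta_y, delta_z, log(count + eps)]."""
    size = [(hi - lo) / U for lo, hi in zip(box_min, box_max)]
    center = [(lo + hi) / 2 for lo, hi in zip(box_min, box_max)]
    counts = {}
    for p in points:
        cell = tuple(floor((p[d] - box_min[d]) / size[d]) for d in range(3))
        if all(0 <= c < U for c in cell):  # ignore points outside the proposal
            counts[cell] = counts.get(cell, 0) + 1
    feats = {}
    for i in range(U):
        for j in range(U):
            for k in range(U):
                cell = (i, j, k)
                # Grid point = cell center; delta = offset from the box center.
                grid = [box_min[d] + (cell[d] + 0.5) * size[d] for d in range(3)]
                delta = [grid[d] - center[d] for d in range(3)]
                feats[cell] = delta + [log(counts.get(cell, 0) + eps)]
    return feats

pts = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.9, 1.9, 1.9), (5.0, 5.0, 5.0)]
feats = density_pe_inputs(pts, box_min=(0.0, 0.0, 0.0), box_max=(2.0, 2.0, 2.0), U=2)
```

The offset $\epsilon$ keeps the logarithm finite for empty cells, which is why it appears inside the log rather than outside.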

3.4 Density Confidence Prediction

The relationship between distance and the number of LiDAR points on an object is used to predict the bounding box confidence.

First, the output features of the density-aware RoI grid pooling module are flattened and encoded by a shared FFN to obtain $f_{\tilde{b}}^s$; two FFN branches then encode the features for bounding box refinement and for confidence estimation, respectively.

For confidence estimation, the final bounding box center $c_{\tilde{b}}$ and the number of points inside the final bounding box $|N(\tilde{b})|$ are appended to $f_{\tilde{b}}^s$:

$p_{\tilde{b}}=\textup{FFN}([f_{\tilde{b}}^s,c_{\tilde{b}},\log(|N(\tilde{b})|)])$

3.5 Training Losses

The network is trained jointly with the region proposal loss $L_{\textup{RPN}}$ and the proposal refinement loss $L_{\textup{RCNN}}$.

$L_{\textup{RPN}}=L_{\textup{cls}}(y_b,y_b^\star )+\beta L_{\textup{reg}}(r_b,r_b^\star)$

where $L_{\textup{cls}}$ is the focal loss, $y_b$ is the predicted class probability vector and $y_b^\star$ is the ground-truth class; $L_{\textup{reg}}$ is the smooth L1 loss, $r_b$ is the predicted RoI anchor box residual and $r_b^\star$ is the ground-truth anchor box residual.

$L_{\textup{RCNN}}=L_{\textup{IoU}}+L_{\textup{reg}}(r_{\tilde{b}},r_{\tilde{b}}^\star)$

$L_{\textup{IoU}}=-p_{\tilde{b}}^\star\log(p_{\tilde{b}})-(1-p_{\tilde{b}}^\star)\log(1-p_{\tilde{b}})$

where $p_{\tilde{b}}^\star$ is the training target, scaled from the IoU between the 3D RoI and its associated ground-truth bounding box (see PV-RCNN); $L_{\textup{reg}}$ is the smooth L1 loss, and $r_{\tilde{b}}$ and $r_{\tilde{b}}^\star$ are the predicted and ground-truth bounding box residuals.
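The refinement loss can be written out for a single proposal as a rough sketch. The target scaling $\min(1,\max(0,2\,\textup{IoU}-0.5))$ is the form used in PV-RCNN as far as I recall, and is stated here as an assumption; the function names, the clamping constant `tiny` and the smooth L1 threshold `beta` are also illustrative choices.

```python
from math import log

def iou_target(iou):
    """Confidence target scaled from IoU (assumed PV-RCNN form)."""
    return min(1.0, max(0.0, 2.0 * iou - 0.5))

def iou_loss(p, target, tiny=1e-7):
    """Binary cross-entropy between predicted confidence p and the soft target."""
    p = min(max(p, tiny), 1.0 - tiny)  # clamp to keep log() finite
    return -target * log(p) - (1.0 - target) * log(1.0 - p)

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the residual components."""
    total = 0.0
    for a, b in zip(pred, target):
        diff = abs(a - b)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total

def rcnn_loss(p, iou, residual_pred, residual_target):
    """L_RCNN = L_IoU + L_reg for one proposal."""
    return iou_loss(p, iou_target(iou)) + smooth_l1(residual_pred, residual_target)

example = rcnn_loss(p=0.5, iou=0.5, residual_pred=[0.5, 2.0], residual_target=[0.0, 0.0])
```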

4 Experiments

Data augmentation uses flipping along the X/Y axes, global scaling, global rotation, and copy-and-paste augmentation.

In post-processing, non-maximum suppression (NMS) removes redundant bounding boxes.
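For readers unfamiliar with NMS, a minimal greedy version is sketched below. It uses axis-aligned bird's-eye-view boxes `(x1, y1, x2, y2)` purely for brevity; 3D detectors normally compute IoU on rotated boxes, so this is an assumption-laden illustration rather than the post-processing used in the paper.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(bev_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 6.0, 6.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
```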

Results: PDV can capture the detail that is lost in voxelization, which enables precise refinement in the second stage.

If the voxel grid already has a high voxel resolution, the improvement from PDV may be limited.

4.3 Ablation Studies

Components

Localizing features at voxel point centroids performs better than localizing them at voxel centers, especially for small objects, because voxel centers may not be aligned with the point cloud. Centroid localization places the features closer to the object surface and gives the proposals more meaningful, detailed geometric information.

Using KDE to capture feature density relationships also helps, especially for deformable objects such as pedestrians.

Using attention to capture long-range dependencies between RoI grid points improves performance on pedestrians and cyclists.

Density confidence prediction further improves detection accuracy, including for cyclists.

Point density positional encoding

Sinusoidal encoding improves detection accuracy for pedestrians and cyclists (but lowers it for cars). Using only the grid point coordinates as the FFN input yields only a small improvement; using only the density feature yields a similar improvement; combining both gives the best performance.

4.4 Runtime Analysis

The runtime is slightly faster than PV-RCNN (with better accuracy as well), but the method still does not meet real-time requirements.

4.5 Number of False Positives at Different Distances

The number of false positives (FPs) increases with distance. Overall, however, PDV produces fewer FPs than PV-RCNN, and the gap widens as distance increases. This suggests that using point density to refine the bounding boxes and their confidence helps with detecting distant objects.

5 Conclusion

For large input ranges, existing methods are limited by the cost of point cloud sampling and by low voxel resolution; the proposed method handles both problems effectively.

Appendix

B. Limitations

Voxel resolution: the higher the resolution, the smaller the performance gain. The distance between the voxel center and the point centroid shrinks, and the number of points in each non-empty voxel approaches 1, so the point density is already approximated by the density of non-empty voxels.

Generalization: the second stage depends on point density. If the point distribution at test time differs greatly from the training distribution (for example, in extreme weather far fewer points are returned from an object), performance may degrade severely.

E. Point Density: Detailed Plots

The plots show that, by exploiting the relationship between distance and point density, PDV effectively reduces the number of FPs that fall outside the training sample distribution at different distances.

Copyright notice
This article was written by [byzy]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/214/202208020509023043.html