Improved Pillar with Fine-grained Feature for 3D Object Detection: paper notes

2022-07-29 07:03:00 【byzy】

Paper link: https://arxiv.org/pdf/2110.06049.pdf

Introduction

Current 3D detection methods fall into three groups according to how they represent the point cloud: point-based, voxel-based, and 2D grid-based. Point-based methods extract the most fine-grained features but are slow. Voxel-based methods rely on sparse convolution, which is time-consuming and unstable. 2D grid-based methods (such as BEV or RV) are the fastest, but the projection may lose information, so their accuracy may fall short of the first two.

Building on PointPillars, this paper introduces the height-aware sub-pillar (HS-Pillar), which uses a height-aware position encoding to capture fine-grained features in the vertical direction, and the sparsity-based tiny-pillar (ST-Pillar), which uses a sparsity-based CNN backbone (a stack of dense feature and sparse attention, or DFSA, modules) to capture fine-grained features in the horizontal direction.

Method

The method consists of three parts. First, the point cloud is projected into tiny-pillars and sub-pillars to obtain a fine-grained 2D pseudo-image. Then a CNN backbone built from DFSA modules extracts features: the large-scale feature maps carry the position information of objects, and the small-scale feature maps carry their shape information. Finally, the features are fed to the detection head, which predicts the size and position of the bounding boxes.

Height-aware sub-pillar

Sub-pillar: each pillar is divided into N_h sub-pillars along the height axis. Each point is augmented with the center of the points in its sub-pillar, (x_c,y_c,z_c), and with its offset (x_p,y_p,z_p) from the sub-pillar center (x,y,z). Two VFE layers then serve as the pillar feature encoder (PFE) and extract a feature from each sub-pillar. The features of all sub-pillars in a pillar are concatenated to form the feature at the corresponding position of the 2D pseudo-image.

Because the height distribution of the points is concentrated, dividing pillars into sub-pillars adds only a small amount of computation time.
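The grouping and point augmentation described above can be sketched in plain Python. This is only an illustration under assumptions that are not in the paper notes: the function name `augment_points`, the grid origin at (0, 0), the height range, and the reading of (x_c,y_c,z_c) as the mean of the points in a sub-pillar and (x_p,y_p,z_p) as the offset from the geometric center of the sub-pillar.

```python
import math
from collections import defaultdict

def augment_points(points, pillar_size, z_min, z_max, n_h):
    """Group (x, y, z) points into sub-pillars and augment each point.

    Returns a dict mapping (ix, iy, ih) to a list of 9-dim features
    (x, y, z, x_c, y_c, z_c, x_p, y_p, z_p). All names are hypothetical.
    """
    sub_h = (z_max - z_min) / n_h  # height of one sub-pillar
    groups = defaultdict(list)
    for x, y, z in points:
        if not (z_min <= z < z_max):
            continue  # drop points outside the assumed height range
        ix = math.floor(x / pillar_size)
        iy = math.floor(y / pillar_size)
        ih = min(int((z - z_min) / sub_h), n_h - 1)
        groups[(ix, iy, ih)].append((x, y, z))

    features = {}
    for (ix, iy, ih), pts in groups.items():
        n = len(pts)
        # (x_c, y_c, z_c): mean of the points in this sub-pillar (assumed reading)
        cx = sum(p[0] for p in pts) / n
        cy = sum(p[1] for p in pts) / n
        cz = sum(p[2] for p in pts) / n
        # geometric center of the sub-pillar cell
        gx = (ix + 0.5) * pillar_size
        gy = (iy + 0.5) * pillar_size
        gz = z_min + (ih + 0.5) * sub_h
        features[(ix, iy, ih)] = [
            (x, y, z, cx, cy, cz, x - gx, y - gy, z - gz) for x, y, z in pts
        ]
    return features
```

In a real pipeline the 9-dim point features of each sub-pillar would go through the VFE layers; that learned part is left out here.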

Height-aware position encoding: directly concatenating the sub-pillar features loses the height information of each sub-pillar. To recover it, a height position encoding is introduced:

$P(z)=\{\sin(2^i\pi z),\cos(2^i\pi z)|_{i=0}^{L-1}\},z=(z_c,z_p)$

It is concatenated with the feature of each sub-pillar, and the result serves as the feature at the corresponding position of the 2D pseudo-image.
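The encoding P(z) can be computed directly from the formula. The sketch below is a minimal version; the function names, the default number of frequencies L, and the choice to encode z_c and z_p separately and concatenate the two results are assumptions made for illustration. How the heights are normalized before encoding is not stated in these notes.

```python
import math

def height_position_encoding(z, num_freqs):
    """Return [sin(2^i*pi*z), cos(2^i*pi*z)] for i = 0 .. num_freqs - 1."""
    enc = []
    for i in range(num_freqs):
        angle = (2 ** i) * math.pi * z
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc

def encode_sub_pillar_height(z_c, z_p, num_freqs=4):
    """Encode both height values and concatenate (assumed layout)."""
    return (height_position_encoding(z_c, num_freqs)
            + height_position_encoding(z_p, num_freqs))
```

With L frequencies, each scalar yields 2L values, so the pair (z_c, z_p) adds 4L channels to the sub-pillar feature under this layout.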

Sparsity-based tiny-pillar

Tiny-pillar: halve the 2D grid size to obtain finer-grained features.

Sparsity-based CNN backbone: directly reducing the 2D grid size sharply increases the running time and shrinks the receptive field.
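A quick calculation shows why halving the grid size is costly for a dense CNN: the number of grid cells quadruples. The detection range and pillar sizes below are hypothetical values chosen for illustration and are not taken from the paper.

```python
def grid_shape(x_range, y_range, pillar_size):
    """Number of pillars along x and y for a given range and pillar size."""
    nx = round((x_range[1] - x_range[0]) / pillar_size)
    ny = round((y_range[1] - y_range[0]) / pillar_size)
    return nx, ny

# Hypothetical detection range in metres (an assumption, not from the paper)
X_RANGE = (0.0, 70.4)
Y_RANGE = (-40.0, 40.0)

base = grid_shape(X_RANGE, Y_RANGE, 0.16)  # original pillar size
tiny = grid_shape(X_RANGE, Y_RANGE, 0.08)  # halved pillar size
ratio = (tiny[0] * tiny[1]) / (base[0] * base[1])
print(base, tiny, ratio)
```

Since most of the extra cells are empty, a dense convolution over the finer grid spends most of its work on empty locations.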

This paper proposes a sparsity-based CNN backbone (SCB), built by stacking dense feature and sparse attention (DFSA) modules. Because most tiny-pillars are empty, applying a CNN to them directly is unnecessary and inefficient. Instead, sparse large-scale features express the distribution of objects, so that object centers are predicted more accurately, while dense small-scale features extract fine-grained object features, so that object boundaries are predicted more accurately.

The DFSA module works as follows:

The sparse large-scale input feature first passes through a strided convolution block. Average pooling and max pooling are then applied along the channel dimension, and the two results are concatenated and fed to a convolution layer followed by a sigmoid function, which produces a spatial attention map. Meanwhile, the dense feature branches downsample the features to different scales and pass them through several convolution blocks to obtain dense small-scale feature maps. The smaller the scale of the feature map, the more convolution blocks are used. The dense small-scale feature maps are guided by the spatial attention map and upsampled to the output size. Finally, the feature maps of all branches are concatenated and passed through a $1\times 1$ convolution block.
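The attention path can be sketched in plain Python on nested lists of shape [C][H][W]. This is a simplified illustration, not the paper's implementation: the learned convolution is replaced by a fixed weighted sum of the two pooled maps (equivalent to a 1x1 convolution with hypothetical weights), "guided by" is read as element-wise multiplication, the attention map and the dense feature map are assumed to share the same spatial size, and upsampling is nearest-neighbor.

```python
import math

def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """feat: nested list [C][H][W]. Returns an [H][W] map with values in (0, 1)."""
    num_ch, height, width = len(feat), len(feat[0]), len(feat[0][0])
    att = []
    for r in range(height):
        row = []
        for c in range(width):
            vals = [feat[k][r][c] for k in range(num_ch)]
            avg_v = sum(vals) / num_ch   # average pooling over channels
            max_v = max(vals)            # max pooling over channels
            # stand-in for the learned convolution on the two pooled maps
            logit = w_avg * avg_v + w_max * max_v + bias
            row.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        att.append(row)
    return att

def apply_attention(dense, att):
    """Scale every channel of dense [C][H][W] by the attention map (assumed reading)."""
    return [[[v * att[r][c] for c, v in enumerate(row)]
             for r, row in enumerate(ch)] for ch in dense]

def upsample_nearest(feat, factor):
    """Nearest-neighbor upsampling of [C][H][W] by an integer factor."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out
```

Locations where all channels of the sparse feature are zero receive an attention value of 0.5 with the default weights; in a trained module the convolution weights and bias would decide how strongly empty locations are suppressed.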

The output of the SCB is the concatenation of all DFSA module outputs after each is upsampled to the input size.

Experiments

Implementation details

Detection head: similar to CenterPoint, it uses a center heatmap head and regression heads (center location refinement, height above ground, 3D size, yaw angle, and IoU with the ground-truth bounding box). During training, focal loss is used, supervised by the centers of the ground-truth objects. At inference, the dense regression outputs at the locations of the heatmap peaks are taken, and the confidence is rectified with the predicted IoU (IoU-aware confidence rectification).
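The inference step can be sketched as peak finding on the heatmap followed by a score correction. These notes do not give the rectification formula, so the form used here, score = heat^(1-alpha) * iou^alpha, is an assumption borrowed from other IoU-aware detectors; the function names, the 3x3 neighborhood, and the values of `alpha` and `threshold` are likewise hypothetical.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (row, col) of cells that are the maximum of their 3x3 neighborhood."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbors = [heatmap[rr][cc]
                         for rr in range(max(0, r - 1), min(height, r + 2))
                         for cc in range(max(0, c - 1), min(width, c + 2))]
            if v >= max(neighbors):
                peaks.append((r, c))
    return peaks

def rectify_score(heat_score, iou_pred, alpha=0.5):
    """Assumed rectification: heat^(1 - alpha) * iou^alpha, IoU clamped to [0, 1]."""
    iou = min(max(iou_pred, 0.0), 1.0)
    return heat_score ** (1 - alpha) * iou ** alpha

def decode(heatmap, iou_map, alpha=0.5, threshold=0.1):
    """Pair each heatmap peak with its rectified confidence."""
    return [((r, c), rectify_score(heatmap[r][c], iou_map[r][c], alpha))
            for r, c in find_peaks(heatmap, threshold)]
```

In a full detector, the regression outputs (center refinement, height, size, yaw) would be read at the same peak locations to build the boxes.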

Ablation study

Effect of the main contributions: sub-pillars, position encoding, tiny-pillars, the DF branch, and the SA branch each improve the results. Detection accuracy on small objects improves substantially.

Effect of the number of sub-pillars: detection accuracy rises as the number of sub-pillars N_h increases, but only up to a point. Beyond it, each sub-pillar contains fewer points, feature extraction becomes harder, and accuracy on the car category drops.

Effect of DFSA module settings: the results are fairly robust to the hyperparameters of the DFSA module. More convolution blocks give a larger receptive field and better performance; a higher downsampling rate gives faster inference but lower performance.
