
Polar Parametrization for Vision-based Surround-View 3D Detection Paper Notes

2022-08-02 06:32:00 【byzy】

1 Introduction

There are currently two main ways to parametrize object location: image-based parametrization and Cartesian parametrization.

Image-based parametrization (left figure): the model estimates the object's pixel index and depth $(u,v,d)$ in the image, then uses the camera intrinsics and extrinsics to transform these coordinates into 3D space. This is typically used for monocular images. For surround-view images, the method regresses bounding-box locations independently in each perspective image and then projects them into a common 3D space. Finally, cross-view post-processing such as NMS filters out duplicate detections.
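As a minimal sketch of how an image-based estimate $(u,v,d)$ is lifted into 3D, assume a pinhole camera with intrinsics `fx, fy, cx, cy`, a camera-to-ego rotation `R` and translation `t`, and that $d$ is the depth along the optical axis. These names and conventions are illustrative assumptions, not taken from the paper.

```python
def backproject(u, v, d, fx, fy, cx, cy, R, t):
    """Lift pixel (u, v) with depth d to a 3D point in the ego frame.

    Assumes a pinhole camera; R (3x3) and t (3,) map camera to ego coordinates.
    """
    # Pinhole model: point in camera coordinates (z along the optical axis).
    p_cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
    # Rigid transform camera -> ego: p_ego = R @ p_cam + t.
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


# Hypothetical calibration values, chosen only to make the arithmetic easy.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = backproject(u=700.0, v=400.0, d=10.0, fx=1000.0, fy=1000.0,
                    cx=600.0, cy=400.0, R=identity, t=(0.0, 0.0, 1.5))
print(point)
```

Because the depth $d$ multiplies every coordinate, any depth error scales directly into the 3D position, which is the weakness noted for this parametrization.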

The drawbacks are that depth estimation is error-prone, that the extra information provided by the overlap between adjacent views goes unused, and that cross-view post-processing is difficult and unstable.

Cartesian parametrization (middle figure): the detection range is usually rectangular. The model exploits the correlation between multiple views to jointly predict the object's 3D coordinates.

This approach has a problem, illustrated in the figure below. Suppose objects $A_{t_1}$ and $A_{t_2}$ appear at the same position in different images and have the same image pattern.

(1) Because the detection range is rectangular (only objects inside the range are annotated), training only considers $A_{t_1}$, while $A_{t_2}$ is discarded (the two views are not treated equally). This hurts the convergence of the network.

(2) This approach ignores view symmetry. For the two images above, a model using image-based parametrization only needs to learn to predict the same $(u,v,d)$, whereas a model using Cartesian parametrization must predict different 3D coordinates. This increases model complexity and makes optimization harder.

This paper proposes a surround-view 3D detection transformer (PolarDETR) that parametrizes object location with cylindrical coordinates $(r,\alpha,z)$ (radial distance, horizontal angle and height); this is called polar parametrization (right figure). Object velocity is decomposed into radial velocity and tangential velocity. In addition, the detection range and the loss function are defined in polar coordinates.
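The relation between Cartesian and cylindrical coordinates can be sketched as follows. The convention that $\alpha$ is measured from the ego x-axis is an assumption made for illustration; the notes do not state which axis the paper uses.

```python
import math


def cartesian_to_polar(x, y, z):
    """Convert an ego-frame point to cylindrical coordinates (r, alpha, z)."""
    r = math.hypot(x, y)       # radial distance in the ground plane
    alpha = math.atan2(y, x)   # horizontal angle, assumed measured from the x-axis
    return r, alpha, z


def polar_to_cartesian(r, alpha, z):
    """Inverse of cartesian_to_polar."""
    return r * math.cos(alpha), r * math.sin(alpha), z


# Two objects at the same range but seen in different viewing directions:
# their Cartesian targets differ, their radial distance and height do not.
front = cartesian_to_polar(20.0, 0.0, 1.0)
left = cartesian_to_polar(0.0, 20.0, 1.0)
print(front, left)
```

This is the view-symmetry argument in code form: only $\alpha$ changes between the two views, so the network can reuse what it learned about $r$ and $z$.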

PolarDETR performs center-context feature aggregation to strengthen the interaction between object queries and images, and uses pixel rays as positional encoding to provide a 3D spatial prior that helps predict the azimuth $\alpha$. PolarDETR achieves a good performance-speed trade-off.

3 PolarDETR

3.1 Overview

As shown in the figure below, the images from the different views are first fed into a shared CNN to extract features, and object queries are then used to detect objects. Each object query encodes the semantic features and location information of its corresponding object. A series of decoding layers aggregates features from the surround-view feature maps and iteratively updates the object queries. A feed-forward network (FFN) then predicts, from these queries, the category and the polar encodings of the bounding box and velocity.

3.2 Polar parametrization

Each bounding box is encoded in polar form as a 9-tuple $B_{\textup{enc}}=(b_r,b_{\sin\alpha},b_{\cos\alpha},b_z,b_l,b_w,b_h,b_{\sin\theta},b_{\cos\theta})$, from which the polar bounding-box parameters $B_{\textup{pred}}=(r,\sin\alpha,\cos\alpha,z,l,w,h,\sin\theta,\cos\theta)$ can be estimated, where

$r=\sigma(b_r)\cdot R_{\max},z=\sigma(b_z)\cdot(Z_{\max}-Z_{\min})+Z_{\min}$

$\sin\alpha=\frac{b_{\sin\alpha}}{\sqrt{b_{\sin\alpha}^2+b_{\cos\alpha}^2}}, \cos\alpha=\frac{b_{\cos\alpha}}{\sqrt{b_{\sin\alpha}^2+b_{\cos\alpha}^2}}$

$l=\exp(b_l),w=\exp(b_w),h=\exp(b_h)$

$\sin\theta=\frac{b_{\sin\theta}}{\sqrt{b_{\sin\theta}^2+b_{\cos\theta}^2}}, \cos\theta=\frac{b_{\cos\theta}}{\sqrt{b_{\sin\theta}^2+b_{\cos\theta}^2}},$

$Z_{\max}$ and $Z_{\min}$ bound the detection range in height, and $R_{\max}$ is the maximum detection radius; $\sigma$ is the sigmoid function. The horizontal angle and the heading angle are regressed as their sine and cosine $(\sin(\cdot),\cos(\cdot))$ to keep the regression space continuous.
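The decoding equations above can be sketched in a few lines. The function name `decode_box` and the range values used in the example are hypothetical; only the formulas come from the notes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def normalize_pair(s, c):
    """Project a raw (sin, cos) pair onto the unit circle.

    Undefined when both inputs are zero; a real implementation would add an epsilon.
    """
    norm = math.hypot(s, c)
    return s / norm, c / norm


def decode_box(b_enc, r_max, z_min, z_max):
    """Map the 9-tuple B_enc to B_pred following the equations above."""
    b_r, b_sa, b_ca, b_z, b_l, b_w, b_h, b_st, b_ct = b_enc
    r = sigmoid(b_r) * r_max                    # radial distance in (0, r_max)
    z = sigmoid(b_z) * (z_max - z_min) + z_min  # height in (z_min, z_max)
    sin_a, cos_a = normalize_pair(b_sa, b_ca)   # horizontal angle
    sin_t, cos_t = normalize_pair(b_st, b_ct)   # heading angle
    l, w, h = math.exp(b_l), math.exp(b_w), math.exp(b_h)  # positive sizes
    return r, sin_a, cos_a, z, l, w, h, sin_t, cos_t


# Hypothetical ranges: 60 m radius, height between -5 m and 3 m.
box = decode_box((0.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0),
                 r_max=60.0, z_min=-5.0, z_max=3.0)
print(box)
```

The sigmoid keeps $r$ and $z$ inside the detection range by construction, and the normalization guarantees $\sin^2+\cos^2=1$ for both angles.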

Polar decomposition of location estimation: polar parametrization decouples object location into radial distance and horizontal angle. The distance $r$ is related to the object's size in the image and can be learned from the image pattern; the horizontal angle $\alpha$ is related to the pixel index and can be learned from the positional encoding.

Polar decomposition of velocity estimation: radial velocity is related to the rate of change of the object's size, and tangential velocity is related to the object's motion in the image plane.
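A minimal sketch of the velocity decomposition, assuming the object's horizontal angle $\alpha$ is known and that positive tangential velocity points counter-clockwise (a sign convention chosen here for illustration, not stated in the notes):

```python
import math


def decompose_velocity(vx, vy, alpha):
    """Split a Cartesian velocity into radial and tangential components."""
    v_r = vx * math.cos(alpha) + vy * math.sin(alpha)    # along the viewing ray
    v_t = -vx * math.sin(alpha) + vy * math.cos(alpha)   # perpendicular to it
    return v_r, v_t


def compose_velocity(v_r, v_t, alpha):
    """Inverse of decompose_velocity."""
    vx = v_r * math.cos(alpha) - v_t * math.sin(alpha)
    vy = v_r * math.sin(alpha) + v_t * math.cos(alpha)
    return vx, vy


# The same Cartesian velocity is purely tangential for an object straight
# ahead (alpha = 0) and purely radial for an object at alpha = 90 degrees.
print(decompose_velocity(0.0, 2.0, 0.0))
print(decompose_velocity(0.0, 2.0, math.pi / 2))
```

The decomposition is just a rotation by $\alpha$, so no information is lost; it only changes which quantity the network has to regress.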

Polar parametrization explicitly establishes the association between image patterns and prediction targets. With this explicit association, the detector converges better and performs better.

3.3 Decoding layer

The decoding layers iteratively aggregate features and update the queries. A multi-head self-attention module (MHSA) first lets the queries exchange information; a linear layer then extracts the object location from each query:

$(b_r,b_{\sin\alpha},b_{\cos\alpha},b_z)=\textup{Linear}(\textup{MHSA}(q_i))$

This is then converted into the 3D coordinates $c_i^{\textup{3D}}=(r,\alpha,z)$.

Center-context feature aggregation: this step aggregates features from the surround-view feature maps. The 3D center is first projected onto the image plane to obtain the 2D center point:

$c_i^{k}=\textbf{K}^k\cdot \textbf{Rt}^k\cdot c_i^{\textup{3D}}$

where $\textbf{K}^k$ and $\textbf{Rt}^k$ are the projection matrices derived from the intrinsics and extrinsics of the $k$-th camera, respectively. The center feature is obtained from the image features by bilinear interpolation (if the 2D center falls outside the image, the feature is set to 0).
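The projection step can be sketched as below. The sketch assumes the polar center is first converted to homogeneous Cartesian coordinates, that `Rt` is a 3x4 ego-to-camera matrix and `K` a 3x3 intrinsic matrix, and that the result is divided by depth. The perspective divide and the behind-camera check are standard pinhole details added here as assumptions; the notes only give the matrix product.

```python
import math


def project_center(r, alpha, z, K, Rt):
    """Project a polar 3D center into one camera; return (u, v) or None."""
    # Polar -> homogeneous Cartesian point in the ego frame.
    p = (r * math.cos(alpha), r * math.sin(alpha), z, 1.0)
    # Ego -> camera coordinates using the 3x4 extrinsic matrix.
    cam = [sum(Rt[i][j] * p[j] for j in range(4)) for i in range(3)]
    if cam[2] <= 0.0:
        return None  # point is behind this camera, so it has no valid pixel
    # Camera -> image plane using the 3x3 intrinsic matrix, then divide by depth.
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


# Hypothetical front camera looking along the ego x-axis.
K = [[1000.0, 0.0, 600.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]]
Rt = [[0.0, -1.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(project_center(10.0, 0.0, 0.0, K, Rt))      # object straight ahead
print(project_center(10.0, math.pi, 0.0, K, Rt))  # object directly behind
```

Running the same function once per camera gives one candidate 2D center per view; views where the point is invisible contribute a zero feature.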

Context features are introduced to strengthen the interaction between the queries and the surround views and thereby improve localization. Based on the center feature $f_{c_i^k}$ and the query embedding $q_i$, the model predicts offsets from the center, generating a set of context points $\{p_i^k\}_{k=1}^K$:

$\Delta u_i^k,\Delta v_i^k=\textup{Linear}(\textup{Concat}(f_{c_i^k},q_i)),p_i^k=c_i^k+(\Delta u_i^k,\Delta v_i^k)$

Finally, the context-point features are obtained by bilinear interpolation.
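A minimal sketch of the sampling step, using a single-channel feature map stored as a nested list for brevity (in the model every location holds a feature vector). The offset is passed in directly here; in PolarDETR it comes from the linear layer above. All function names are illustrative.

```python
import math


def bilinear_sample(feature_map, u, v):
    """Sample feature_map[row][col] at the continuous location (u, v) = (col, row).

    Returns 0.0 when the location falls outside the map.
    """
    height, width = len(feature_map), len(feature_map[0])
    if not (0.0 <= u <= width - 1 and 0.0 <= v <= height - 1):
        return 0.0
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    # Interpolate along u on the two neighbouring rows, then along v.
    top = feature_map[v0][u0] * (1.0 - du) + feature_map[v0][u1] * du
    bottom = feature_map[v1][u0] * (1.0 - du) + feature_map[v1][u1] * du
    return top * (1.0 - dv) + bottom * dv


def sample_context(feature_map, center, offset):
    """Sample at the context point p = c + (delta_u, delta_v)."""
    return bilinear_sample(feature_map, center[0] + offset[0], center[1] + offset[1])


fmap = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))
print(sample_context(fmap, (0.0, 0.0), (1.0, 1.0)))
print(bilinear_sample(fmap, 2.0, 0.0))
```

In a PyTorch implementation the same operation is typically done in batch with `torch.nn.functional.grid_sample`, which also supports zero padding for out-of-range locations.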

Pixel rays: as shown in the figure below, a pixel ray starts at the optical center and passes through the pixel to reach the 3D point. It directly links pixels to 3D points and carries explicit information about the horizontal angle.

The paper uses pixel rays as an additional positional encoding: for each center or context point, the pixel-ray direction vector $d_{\textup{ray}}$ is concatenated with the original features as extra feature dimensions.
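The ray direction for a pixel can be sketched as follows, assuming a pinhole camera with intrinsics `fx, fy, cx, cy` and a camera-to-ego rotation `R_cam_to_ego`. Normalizing the ray to unit length is an assumption for this sketch; the notes do not say whether the paper normalizes it.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, R_cam_to_ego):
    """Unit direction, in the ego frame, of the ray through pixel (u, v)."""
    # Direction in camera coordinates (z along the optical axis).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the ego frame; translation does not affect a direction.
    d_ego = [sum(R_cam_to_ego[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_ego))
    return tuple(c / norm for c in d_ego)


def ray_azimuth(d_ray):
    """Horizontal angle of a ray, measured from the ego x-axis (assumed convention)."""
    return math.atan2(d_ray[1], d_ray[0])


# Hypothetical front camera: optical axis along ego x, image x pointing to ego -y.
R = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
center_ray = pixel_ray(600.0, 400.0, 1000.0, 1000.0, 600.0, 400.0, R)
right_ray = pixel_ray(1600.0, 400.0, 1000.0, 1000.0, 600.0, 400.0, R)
print(ray_azimuth(center_ray), ray_azimuth(right_ray))
```

The azimuth of the ray depends only on the pixel column and the camera calibration, which is why the ray is a useful prior for predicting $\alpha$.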

Query update:

$\hat{q}_i=\textup{MLP}(\textup{Concat}(\{f_{c_i^1},\cdots,f_{c_i^K},f_{p_i^1},\cdots,f_{p_i^K}\},d_{\textup{ray}}))+q_i$

The updated query embedding encodes more accurate location information, which leads to better feature aggregation in the next decoding layer.

3.4 Perception range, label assignment and loss function

Perception range: a circular region centered on the ego vehicle.

Label assignment: the ground-truth labels are first converted to polar form, $B_{\textup{gt}}=(\bar{r},\sin\bar{\alpha},\cos\bar{\alpha},\bar{z},\bar{l},\bar{w},\bar{h},\sin\bar{\theta},\cos\bar{\theta})$, and bipartite matching is then used to find a unique prediction for each ground-truth bounding box. The pairwise matching cost is:

$C(i,j)=C_{\textup{cls}}(i,j)+C_{\textup{box}}(i,j)$

$C_{\textup{box}}(i,j)=|r-\bar{r}|+k_{\textup{scaling}}\cdot(|\sin\alpha-\sin\bar{\alpha}|+|\cos\alpha-\cos\bar{\alpha}|)$

where $C_{\textup{cls}}(i,j)$ is the classification cost defined in DETR.

After the matching cost is computed for every pair of prediction and ground-truth bounding box, giving the cost matrix $\textbf{H}$, the Hungarian algorithm is used to find the optimal assignment.
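The matching can be sketched as below. The box cost follows the formula above; the classification cost is taken as a precomputed input, and `k_scaling=1.0` is a placeholder value. To stay dependency-free, the sketch finds the optimal assignment by brute force over permutations rather than with the Hungarian algorithm; this gives the same result on tiny inputs, but in practice a solver such as `scipy.optimize.linear_sum_assignment` would be used.

```python
import math
from itertools import permutations


def box_cost(pred, gt, k_scaling):
    """C_box between a predicted (r, alpha) and a ground-truth (r, alpha)."""
    r, alpha = pred
    r_gt, alpha_gt = gt
    angle_term = (abs(math.sin(alpha) - math.sin(alpha_gt))
                  + abs(math.cos(alpha) - math.cos(alpha_gt)))
    return abs(r - r_gt) + k_scaling * angle_term


def match(preds, gts, cls_cost, k_scaling=1.0):
    """Return (pred_index, gt_index) pairs minimizing the total matching cost.

    Brute-force stand-in for the Hungarian algorithm; requires
    len(preds) >= len(gts) and is only practical for a handful of boxes.
    """
    cost = [[cls_cost[i][j] + box_cost(preds[i], gts[j], k_scaling)
             for j in range(len(gts))] for i in range(len(preds))]
    best = min(permutations(range(len(preds)), len(gts)),
               key=lambda perm: sum(cost[i][j] for j, i in enumerate(perm)))
    return [(i, j) for j, i in enumerate(best)]


preds = [(10.0, 0.0), (20.0, math.pi / 2), (5.0, math.pi)]
gts = [(19.0, math.pi / 2), (11.0, 0.1)]
zero_cls = [[0.0, 0.0] for _ in preds]  # classification cost ignored in this toy example
print(match(preds, gts, zero_cls))
```

Predictions left unmatched (here the third one) are treated as background during training, as in DETR.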

Loss function: the bipartite matching loss consists of a classification loss (focal loss) and a polar bounding-box/velocity loss (L1 loss).
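For one matched pair, the two loss terms can be sketched as follows. The focal-loss defaults `alpha=0.25` and `gamma=2.0` are the commonly used values from the focal loss literature, and the weights `w_cls` and `w_box` are placeholders; none of these values are confirmed by the notes.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for a predicted probability p and a 0/1 target."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def l1_loss(pred, gt):
    """Sum of absolute differences over the polar box/velocity parameters."""
    return sum(abs(a - b) for a, b in zip(pred, gt))


def matched_loss(p, target, box_pred, box_gt, w_cls=1.0, w_box=1.0):
    """Weighted sum of the classification and regression terms (weights hypothetical)."""
    return w_cls * focal_loss(p, target) + w_box * l1_loss(box_pred, box_gt)


print(focal_loss(0.5, 1), focal_loss(0.9, 1))
print(l1_loss((1.0, 2.0), (0.0, 4.0)))
```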

3.5 Temporal information

PolarDETR is extended to PolarDETR-T to accept sequential images as input. The object center $c_i^{\textup{3D}}$ of the current frame is projected into the previous images; taking frame $t-n$ as an example:

$c_i^{k(t-n)}=\textbf{K}^k\cdot \textbf{Rt}^k\cdot \textbf{Pose}^{(t-n)}\cdot c_i^{\textup{3D}}$

where $\textbf{Pose}^{(t-n)}$ is the pose transformation matrix, which reflects the change in ego-vehicle pose between frame $t-n$ and the current frame. Center and context features are sampled from the previous frames in the same way. All sampled features are finally aggregated and used to update the query embedding.

For efficient inference, the feature maps of past images can be cached, so only the images of the current frame need to be processed, and the inference speed of PolarDETR-T is close to that of PolarDETR.

4 Experiments

4.2 Experimental setup

A tracking-by-detection algorithm extends PolarDETR to 3D object tracking: using the velocity predicted in the current frame, each object is moved back to the previous frame, and closest-distance matching is then used to associate targets.
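The association step can be sketched as a greedy nearest-neighbour match. The sketch works in Cartesian ground-plane coordinates, ignores ego-motion compensation, and uses hypothetical values for the frame interval `dt` and the gating distance `max_dist`; the notes do not give these details.

```python
import math


def associate(detections, prev_tracks, dt, max_dist):
    """Greedily match current detections to previous-frame tracks.

    detections:  list of (x, y, vx, vy) in the current frame.
    prev_tracks: dict mapping track id -> (x, y) in the previous frame.
    Returns a dict mapping detection index -> matched track id.
    """
    unmatched = dict(prev_tracks)
    matches = {}
    for det_idx, (x, y, vx, vy) in enumerate(detections):
        if not unmatched:
            break
        # Move the detection back by one frame interval using its velocity.
        back_x, back_y = x - vx * dt, y - vy * dt
        dist = {tid: math.hypot(px - back_x, py - back_y)
                for tid, (px, py) in unmatched.items()}
        nearest = min(dist, key=dist.get)
        if dist[nearest] <= max_dist:
            matches[det_idx] = nearest
            del unmatched[nearest]
    return matches


prev_tracks = {7: (10.0, 0.0), 8: (0.0, 5.0)}
detections = [(0.0, 6.0, 0.0, 2.0), (11.0, 0.0, 2.0, 0.0)]
print(associate(detections, prev_tracks, dt=0.5, max_dist=2.0))
```

Detections that find no track within `max_dist` would start new tracks; that bookkeeping is omitted here.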

4.4 Main results

PolarDETR-T outperforms PolarDETR, especially in velocity estimation.

4.5 Ablation study

Key components: polar parametrization, context points and pixel rays all improve performance, at negligible computational cost.

Polar decomposition of velocity: compared with Cartesian decomposition, polar decomposition improves the accuracy of velocity estimation.

Context points: performance improves as the number of context points increases, but beyond a certain number further increases hurt. Both inputs used to generate the context points, the query embedding and the center feature, contribute to the performance gain.

Decoding layers: more decoding layers give better performance, but the gain tends to saturate.
