Polar Parametrization for Vision-based Surround-View 3D Detection Paper Notes
2022-08-02 06:32:00 【byzy】
Original paper: https://arxiv.org/abs/2206.10965
1 Introduction
There are currently two main ways to parameterize object location: image-based parameterization and Cartesian parameterization.
Image-based parameterization (left figure): the model estimates the object's pixel index in the image and its depth, then transforms them into 3D space using the camera intrinsics and extrinsics. This is typically used for monocular images. For surround-view images, the method regresses bounding-box positions independently in each perspective image and then projects them into a shared 3D space. Finally, cross-view post-processing such as NMS filters out duplicate detections.
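The lifting step described above can be sketched as follows. This is a minimal illustration that assumes an undistorted pinhole camera; the function name `backproject`, the parameter names and the example numbers are hypothetical and do not come from the paper.

```python
def backproject(u, v, depth, fx, fy, cx, cy, R, t):
    """Lift pixel (u, v) with a depth along the optical axis to a 3D point.

    Assumes a pinhole camera without distortion. R (3x3 nested lists) and
    t (length 3) map camera coordinates to ego coordinates:
    p_ego = R @ p_cam + t.
    """
    # Pixel index + depth -> point in the camera frame (intrinsics).
    x_cam = (u - cx) / fx * depth
    y_cam = (v - cy) / fy * depth
    p_cam = (x_cam, y_cam, depth)
    # Camera frame -> shared ego frame (extrinsics).
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = backproject(
    u=700.0, v=400.0, depth=10.0,
    fx=1000.0, fy=1000.0, cx=600.0, cy=400.0,
    R=IDENTITY, t=(0.0, 0.0, 1.5),
)
```

Any error in the estimated depth moves the 3D point along the viewing ray, which is one reason the depth estimate matters so much for this parameterization.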
The drawbacks are that depth estimation is error-prone, that the additional information provided by the overlap between adjacent views goes unused, and that cross-view post-processing is difficult and unstable.
Cartesian parameterization (middle figure): the detection range is usually rectangular. The method exploits the correlation between multiple views to jointly predict the object's 3D coordinates.
This approach has problems, however. Suppose two objects appear at the same position in different images and have the same image pattern.
(1) Because the detection range is rectangular (only objects inside the range are labeled), training considers only one of the two objects while the other is discarded, so the two views are treated differently. This harms the convergence of the network.
(2) The method ignores view symmetry. For the two images above, a model using image-based parameterization only needs to predict the same location for both. A model using Cartesian parameterization must predict different 3D coordinates, which increases the complexity of the model and makes it harder to optimize.
The paper proposes a surround-view 3D detection transformer (PolarDETR) that parameterizes object location in cylindrical coordinates (radial distance, azimuth and height). This is called polar parametrization (right figure). Velocity is decomposed into radial and tangential components. In addition, the detection range and the loss function are defined in polar coordinates.
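A small sketch of the coordinate change may help here. It assumes the ego vehicle sits at the origin and that the azimuth is measured from the x axis; the function names and sample points are illustrative only.

```python
import math


def cartesian_to_polar(x, y, z):
    """Convert an ego-frame point to (radial distance, azimuth, height)."""
    rho = math.hypot(x, y)   # distance from the ego vehicle in the ground plane
    phi = math.atan2(y, x)   # azimuth in radians, in (-pi, pi]
    return rho, phi, z


def polar_to_cartesian(rho, phi, z):
    """Inverse conversion back to ego-frame Cartesian coordinates."""
    return rho * math.cos(phi), rho * math.sin(phi), z


# Two objects at the same distance, seen in different directions.
rho_a, phi_a, _ = cartesian_to_polar(10.0, 0.0, 0.5)
rho_b, phi_b, _ = cartesian_to_polar(0.0, 10.0, 0.5)
```

The two sample objects have different Cartesian coordinates but the same radial distance; only the azimuth differs. This is the view symmetry that the Cartesian parameterization does not expose.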
PolarDETR uses center-context feature aggregation to strengthen the interaction between object queries and image features, and uses pixel rays as positional encoding to provide a 3D spatial prior that helps predict the azimuth. PolarDETR achieves a good performance-speed trade-off.
3 PolarDETR
3.1 Overview
The images from the different views are first fed to a shared CNN to extract features, and object queries are then used to detect objects. Each object query encodes the semantic features and the position of its corresponding object. A series of decoder layers aggregates features from the surround-view feature maps and iteratively updates the queries. A feed-forward network (FFN) predicts the class and the polar encodings of the bounding box and velocity from these queries.
3.2 Polar parametrization
The polar encoding of each bounding box is a 9-tuple, from which the polar box parameters are estimated. In this decoding, a minimum and a maximum height bound the height detection range, a maximum radial distance bounds the detection range, and a sigmoid function maps the raw outputs into these ranges. The azimuth and the heading angle are regressed through their sine and cosine to keep the regression space continuous.
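The exact decoding formula is not reproduced in these notes, so the following is only a sketch of the two ideas mentioned: squashing a raw output into a bounded range with a sigmoid, and recovering an angle from regressed sine and cosine values. The range constants and function names are assumptions chosen for illustration.

```python
import math

# Hypothetical detection-range constants, not taken from the paper.
RHO_MAX = 50.0
Z_MIN, Z_MAX = -5.0, 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_bounded(raw, low, high):
    """Map an unbounded network output into the interval (low, high)."""
    return low + sigmoid(raw) * (high - low)


def decode_angle(raw_sin, raw_cos):
    """Recover an angle from regressed sine and cosine values."""
    return math.atan2(raw_sin, raw_cos)


rho = decode_bounded(0.0, 0.0, RHO_MAX)
z = decode_bounded(0.0, Z_MIN, Z_MAX)

# Why sine/cosine targets are continuous: +179 and -179 degrees are almost
# the same direction, yet their raw angle values are far apart.
a, b = math.radians(179.0), math.radians(-179.0)
angle_gap = abs(a - b)
target_gap = math.hypot(math.sin(a) - math.sin(b), math.cos(a) - math.cos(b))
```

Here `angle_gap` is close to 2π while `target_gap` is small, which is the discontinuity that regressing sine and cosine avoids.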
Polar decomposition of position estimation: polar parametrization decouples the object position into radial distance and azimuth. The distance is related to the object's size in the image and can be learned from the image pattern. The azimuth is related to the pixel index and can be learned from the positional encoding.
Polar decomposition of velocity estimation: the radial velocity is related to the rate of change of the object's size in the image, while the tangential velocity is related to the object's motion in the image plane.
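The velocity decomposition can be illustrated with a short sketch. It assumes a ground-plane velocity (vx, vy) in the ego frame and projects it onto the radial and tangential unit vectors at the object's azimuth; the function name and sample values are hypothetical.

```python
import math


def decompose_velocity(x, y, vx, vy):
    """Split a ground-plane velocity into radial and tangential components.

    The radial axis points from the ego vehicle to the object at (x, y);
    the tangential axis is perpendicular to it (counter-clockwise).
    """
    phi = math.atan2(y, x)
    v_radial = vx * math.cos(phi) + vy * math.sin(phi)
    v_tangential = -vx * math.sin(phi) + vy * math.cos(phi)
    return v_radial, v_tangential


# An object straight to the left of the ego vehicle, moving further left:
# its motion is purely radial.
v_r, v_t = decompose_velocity(x=0.0, y=10.0, vx=0.0, vy=2.0)

# The same object moving forward instead: its motion is purely tangential.
v_r2, v_t2 = decompose_velocity(x=0.0, y=10.0, vx=2.0, vy=0.0)
```

Purely radial motion changes the object's apparent size, while purely tangential motion shifts it across the image, which matches the association described above.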
Polar parametrization explicitly establishes the association between image patterns and prediction targets. With this explicit association, the detector converges better and performs better.
3.3 Decoder layer
The decoder layers iteratively aggregate features and update the queries. Each layer first applies multi-head self-attention (MHSA) so that the queries exchange information, then uses a linear layer to decode the object position from each query. The decoded position is converted to 3D coordinates.
Center-context feature aggregation: features are aggregated from the surround-view feature maps. The 3D center is first projected onto the image plane to obtain the 2D center point, using the projection matrix derived from each camera's intrinsics and extrinsics. The center feature is then obtained from the image features by bilinear interpolation (if the 2D center falls outside the image, the feature is set to 0).
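The projection and sampling steps can be sketched as below. For brevity this uses a single-channel feature map stored as nested lists and assumes the feature map has the same resolution as the image; the projection matrix and all names are illustrative assumptions.

```python
import math


def project(P, point):
    """Project a 3D point with a 3x4 projection matrix P.

    Returns pixel coordinates (u, v), or None if the point lies behind
    the camera.
    """
    x, y, z = point
    h = [P[i][0] * x + P[i][1] * y + P[i][2] * z + P[i][3] for i in range(3)]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def bilinear_sample(feat, u, v):
    """Sample feat[row][col] at continuous (u, v); return 0.0 outside the map."""
    height, width = len(feat), len(feat[0])
    if not (0 <= u <= width - 1 and 0 <= v <= height - 1):
        return 0.0
    u0, v0 = math.floor(u), math.floor(v)
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


# Hypothetical projection matrix: intrinsics only, camera at the origin.
P = [[1000.0, 0.0, 600.0, 0.0],
     [0.0, 1000.0, 400.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
center_2d = project(P, (1.0, 0.0, 10.0))

feat = [[0.0, 1.0],
        [2.0, 3.0]]
inside = bilinear_sample(feat, 0.5, 0.5)
outside = bilinear_sample(feat, 5.0, 0.0)
```

In a surround-view setup the same 3D center would be projected into every camera, and views where the point is behind the camera or outside the image contribute a zero feature.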
Context features are introduced to strengthen the interaction between the queries and the surround-view images and to improve localization. Offsets from the center are predicted from the center feature and the query embedding, producing a set of context points. The features of the context points are then obtained by bilinear interpolation.
Pixel ray: a pixel ray starts at the optical center and passes through the pixel to reach the 3D point. It directly relates pixels to 3D points and carries explicit azimuth information.
The paper uses the pixel ray as an additional positional encoding: for each center or context point, the direction vector of the pixel ray is concatenated with the original features as extra feature dimensions.
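A pixel-ray direction can be computed roughly as follows. The sketch assumes a pinhole camera and rotates the ray into the ego frame with a camera-to-ego rotation R; whether the paper expresses the ray in the camera or the ego frame is not stated in these notes, and all names and values are hypothetical.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, R):
    """Unit direction of the ray from the optical center through pixel (u, v).

    R is a 3x3 camera-to-ego rotation given as nested lists.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ray = pixel_ray(600.0, 400.0, 1000.0, 1000.0, 600.0, 400.0, IDENTITY)

# Concatenate the ray direction with a (toy) sampled feature vector.
sampled_feature = [0.2, 0.7]
augmented_feature = sampled_feature + list(ray)
```

The ray through the principal point coincides with the optical axis, so the sample above yields the direction (0, 0, 1).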
Query update: the query embedding is updated with the aggregated features.
The updated query embedding encodes more accurate position information, which improves feature aggregation in the next decoder layer.
3.4 Perception range, label assignment and loss function
Perception range: a circular region centered on the ego vehicle.
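The difference between a circular and a rectangular range can be shown with a toy check. The range sizes and object positions below are made up for illustration.

```python
import math


def in_circular_range(x, y, rho_max):
    """True if the object lies within rho_max of the ego vehicle."""
    return math.hypot(x, y) <= rho_max


def in_rectangular_range(x, y, half_side):
    """True if the object lies inside a square of side 2 * half_side."""
    return abs(x) <= half_side and abs(y) <= half_side


# Two objects at the same distance (60 m) but different azimuths.
straight = (60.0, 0.0)
diagonal = (60.0 * math.cos(math.pi / 4), 60.0 * math.sin(math.pi / 4))

rect_labels = [in_rectangular_range(x, y, half_side=50.0)
               for x, y in (straight, diagonal)]
circ_labels = [in_circular_range(x, y, rho_max=65.0)
               for x, y in (straight, diagonal)]
```

The square range keeps the diagonal object and drops the one straight ahead, although both are equally far away. The circular range treats them the same way.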
Label assignment: the ground-truth labels are first converted to polar form, then bipartite matching assigns a unique prediction to each ground-truth bounding box. The matching cost contains a class term defined as in DETR.
After the matching cost of every pair of prediction and ground-truth box has been computed to form a cost matrix, the Hungarian algorithm finds the optimal assignment.
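The matching step can be sketched as below. The exact cost is not reproduced in these notes, so this assumes a DETR-style class term (the negative predicted probability of the ground-truth class) plus a weighted L1 box term. For clarity, an exhaustive search over permutations stands in for the Hungarian algorithm; both return a minimum-cost assignment, but the exhaustive search is only practical for tiny inputs. In practice a solver such as `scipy.optimize.linear_sum_assignment` would be used.

```python
from itertools import permutations


def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def match(pred_boxes, pred_probs, gt_boxes, gt_labels, box_weight=1.0):
    """Return (prediction index, ground-truth index) pairs of minimum total cost.

    Assumes there are at least as many predictions as ground-truth boxes.
    """
    cost = [
        [-pred_probs[i][gt_labels[j]] + box_weight * l1(pred_boxes[i], gt_boxes[j])
         for j in range(len(gt_boxes))]
        for i in range(len(pred_boxes))
    ]
    best, best_cost = None, float("inf")
    # perm[j] is the prediction assigned to ground-truth box j.
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        total = sum(cost[i][j] for j, i in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return [(i, j) for j, i in enumerate(best)]


# Toy polar boxes reduced to (radial distance, azimuth) for brevity.
pred_boxes = [(10.0, 0.1), (30.0, 1.5), (20.0, -2.0)]
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
gt_boxes = [(29.0, 1.4), (11.0, 0.0)]
gt_labels = [1, 0]
pairs = match(pred_boxes, pred_probs, gt_boxes, gt_labels)
```

Predictions left unmatched (index 2 in this toy example) are treated as background during training.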
Loss function: given the bipartite matching, the loss consists of a classification loss (focal loss) and a polar bounding-box/velocity loss (L1 loss).
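A per-object version of this loss might look like the sketch below. The alpha and gamma defaults are the commonly used focal-loss values, and the loss weight is a placeholder; none of these are confirmed by the notes as the paper's settings.

```python
import math


def focal_loss(p, is_positive, alpha=0.25, gamma=2.0):
    """Binary focal loss for one class probability p."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
    if is_positive:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def l1_loss(pred, target):
    return sum(abs(a - b) for a, b in zip(pred, target))


def detection_loss(class_probs, gt_label, pred_box, gt_box, box_weight=1.0):
    """Focal classification loss plus weighted L1 loss for one matched pair."""
    cls = sum(focal_loss(p, c == gt_label) for c, p in enumerate(class_probs))
    return cls + box_weight * l1_loss(pred_box, gt_box)


loss = detection_loss([0.1, 0.8], gt_label=1,
                      pred_box=(30.0, 1.5), gt_box=(29.0, 1.4))
```

The focal term down-weights predictions that are already confident and correct, so training concentrates on the hard examples.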
3.5 Temporal information
PolarDETR is extended to PolarDETR-T to accept sequential image input. The object center in the current frame is projected into previous frames using the pose transformation matrix that describes the ego vehicle's change of pose between the two frames. Center and context features are then sampled from the previous frames in the same way as for the current frame. All sampled features are finally aggregated and used to update the query embedding.
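Moving a center into an earlier ego frame amounts to one homogeneous transform, sketched below. The 4x4 matrix in the example encodes a hypothetical ego motion of 2 m forward with no rotation; the names are illustrative.

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (nested lists) to a 3D point."""
    x, y, z = p
    return tuple(
        T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3)
    )


# Hypothetical pose change: the ego vehicle moved 2 m along x between the
# previous and the current frame, so current-frame coordinates map to
# previous-frame coordinates by adding 2 m in x.
T_cur_to_prev = [[1.0, 0.0, 0.0, 2.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]]

center_cur = (10.0, 0.0, 0.5)
center_prev = transform_point(T_cur_to_prev, center_cur)
```

The transformed center can then be projected into the previous frame's images with the usual camera projection to sample features there. Note that this compensates only for ego motion, not for the object's own motion.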
For efficient inference, the image feature maps of past frames can be cached so that only the current frame's images need to be processed. The inference speed of PolarDETR-T is therefore close to that of PolarDETR.
4 Experiments
4.2 Experimental setup
A tracking-by-detection algorithm extends PolarDETR to 3D object tracking: using the velocity estimated in the current frame, each object is projected back to the previous frame, and closest-distance matching then associates the targets.
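The association step can be sketched as a greedy nearest-neighbour match. This sketch assumes velocities already converted to ground-plane Cartesian components and uses a made-up distance threshold; it is not the authors' tracking code.

```python
import math


def associate(detections, tracks, dt, max_dist):
    """Greedily match current detections to previous-frame tracks.

    detections: list of (x, y, vx, vy) in the current frame.
    tracks: dict mapping track id -> (x, y) in the previous frame.
    Returns a dict mapping detection index -> track id.
    """
    candidates = []
    for d_idx, (x, y, vx, vy) in enumerate(detections):
        # Move the detection back in time using its estimated velocity.
        px, py = x - vx * dt, y - vy * dt
        for t_id, (tx, ty) in tracks.items():
            dist = math.hypot(px - tx, py - ty)
            if dist <= max_dist:
                candidates.append((dist, d_idx, t_id))
    candidates.sort()  # closest pairs first
    matches, used_tracks = {}, set()
    for dist, d_idx, t_id in candidates:
        if d_idx in matches or t_id in used_tracks:
            continue
        matches[d_idx] = t_id
        used_tracks.add(t_id)
    return matches


tracks = {7: (10.0, 0.0), 8: (0.0, 5.0)}
detections = [(0.2, 5.5, 0.0, 1.0), (11.0, 0.0, 2.0, 0.0)]
matches = associate(detections, tracks, dt=0.5, max_dist=2.0)
```

Detections that find no track within `max_dist` would start new tracks, and tracks left unmatched can be kept for a few frames before being dropped.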
4.4 Main results
PolarDETR-T outperforms PolarDETR, especially on velocity estimation.
4.5 Ablation study
Key components: polar parametrization, context points and pixel rays all improve performance, at negligible computational cost.
Polar decomposition of velocity: compared with Cartesian decomposition, polar decomposition improves the accuracy of velocity estimation.
Context points: performance improves as the number of context points increases, but beyond a certain number a further increase has a negative effect. Both the query embedding and the center feature used to generate the context points contribute to the performance gain.
Decoder layers: more decoder layers give better performance, but the gain tends to saturate.