当前位置:网站首页>Learning notes 25 - multi sensor front fusion technology
Learning notes 25 - multi sensor front fusion technology
2022-07-02 01:17:00 【FUXI_ Willard】
This blog series includes 6 A column , Respectively :《 Overview of autopilot Technology 》、《 Technical foundation of autopilot vehicle platform 》、《 Autopilot positioning technology 》、《 Self driving vehicle environment perception 》、《 Decision and control of autonomous driving vehicle 》、《 Design and application of automatic driving system 》, The author is not an expert in the field of automatic driving , Just a little white on the road of exploring automatic driving , This series has not been read , It is also thinking while reading and summarizing , Welcome to friends , Please give your suggestions in the comments area , Help the author pick out the mistakes , thank you !
This column is about 《 Self driving vehicle environment perception 》 Book notes .
3. Multisensor front fusion technology
Pre fusion technology : At the level of raw data , Directly fuse the data information of all sensors , Then realize the perception function according to the fused data information , Output a detection target of the result layer .
Common fusion methods based on Neural Network , Such as :MV3D(Multi-View 3D Object Detection)、AVOD(Aggregate View Object Detection)、F-PointNet(Frustum PointNets for 3D Object Detection) etc. .
3.1 MV3D
MV3D Point cloud data detected by lidar and captured by visible light camera RGB Image fusion , The input data is an aerial view of the laser radar projection (LIDAR bird view)、 Front view (LIDAR front view) And two dimensions RGB Images , Its network structure mainly includes three-dimensional area generation network (3D proposal network) And region based converged networks (region-based fusion network), Use deep fusion (deep fusion) The way to integrate , Here's the picture :
LIDAR point cloud data is a collection of disordered data points , Before processing point cloud data with the designed neural network model , In order to retain the information of 3D point cloud data more effectively , Easy to handle ,MV3D Project the point cloud data to a specific two-dimensional plane , Get an aerial view and front view .
3D proposal network, Be similar to Faster-RCNN Detect the area generation network applied in the model (Region Proposal Network,RPN), And promote it in three dimensions , One of the functions realized is to generate the three-dimensional candidate box of the target ; This part of the function is completed in the aerial view , There is less occlusion of each target in the aerial view , The efficiency of candidate box extraction is the best .
After extracting the candidate box , Map to three kinds of graphs respectively , Get their respective regions of interest (Region of Interest,ROI), Get into region-based fusion network To merge ; On the choice of fusion mode , Yes : Early integration (early fusion)、 Later Integration (late fusion)、 Deep integration (deep fusion), The comparison of the three methods is shown in the following figure :
Link to the original paper
3.2 AVOD
AVOD It is a kind of fusion of LIDAR point cloud data and RGB 3D object detection algorithm based on image information , Its input is only the aerial view generated by lidar (Bird’s Eye View,BEV)Map And the camera RGB Images , The laser radar forward graph is discarded (Front View,FV) and BEV Density characteristics in (intensity feature), As shown in the figure below :
For input data ,AVOD First, feature extraction , Get two full resolution feature maps , Input to RPN Generate suggestions for areas without orientation , Finally, select the appropriate proposed candidates and send them to the detection network to generate a three-dimensional bounding box with orientation , Complete the target detection task ;AVOD There are two sensor data fusion : Feature fusion and region suggestion fusion .
The illustration above : Above, AVOD Feature extraction network , Using an encoder - decoder (encoder-decoder) structure , Each decoder first samples the input , Then it is connected in series with the output of the corresponding encoder , Finally through a 3×3 The convolution of ; This structure can extract the feature map of the resolution , It effectively avoids that small target objects occupy insufficient pixels in the output feature mapping due to down sampling 1 The problem of , The final output feature mapping contains both the underlying details , It also integrates high-level semantic information , It can effectively improve the detection results of small target objects .
The illustration above : The above figure shows three bounding box coding methods , From left to right :MV3D、 Shaft alignment, 、AVOD 3D bounding box coding method , And MV3D The encoding method of specifying eight vertex coordinates is compared ,AVOD The shape of the three-dimensional boundary box is constrained by a bottom and height , And only one 10 The vector representation of dimension is sufficient ,MV3D need 24 The vector representation of dimensions .
3.3 F-PointNet
F-PointNet Combine the mature target detection methods in two-dimensional images to locate the target , Get the visual cone in the corresponding 3D point cloud data (frustum), And perform bounding box regression to complete the detection task , As shown in the figure below :
F-PointNet The whole network structure consists of three parts : Cone of vision (frustum proposal)、 3D instance segmentation (3D instance segmentation)、 3D bounding box regression (amodal 3D box estimation); The network structure is shown in the figure below :
F-PointNet utilize RGB The advantage of high image resolution , The adoption is based on FPN The detection model first obtains the boundary box of the target on the two-dimensional image , Then according to the known camera projection matrix , Lift the two-dimensional bounding box to the visual cone that defines the three-dimensional search space of the target , And collect all points in the truncated body to form a cone point cloud ;
The illustration above : chart (a) Is the camera coordinate system , chart (b) Is the cone coordinate system , chart ( c ) It is a three-dimensional mask local coordinate system , chart (d) yes T-Net Predicted 3D Target coordinate system ; To avoid occlusion and blurring , For cone point cloud data ,F-PointNet Use PointNet( or PointNet++) Model for instance segmentation ; In 3D space , Objects are mostly separated , 3D segmentation is more reliable ; Split by instance , You can get the three-dimensional mask of the target object ( That is, all point clouds belonging to the target ), And calculate its centroid as the new coordinate origin , Pictured ( c ) Shown , Convert to local coordinate system , To improve the translation invariance of the algorithm ; Last , For target point cloud data ,F-PointNet By using with T-Net Of PointNet( or PointNet++) Regression operation of the model , Predict the center of the target 3D bounding box 、 Size and orientation , Pictured (d) Shown , Finally complete the detection task ;T-Net The function of is to predict the distance from the real center of the three-dimensional boundary box of the target to the centroid of the target , Then take the prediction center as the origin , Get the target coordinate system .
Summary :
F-PointNet In order to ensure the invariance of the point cloud data in each step and finally more accurately return to the 3D boundary box , A total of three coordinate system transformations are required , They are visual cone conversion 、 Mask centroid conversion 、T-Net forecast .
Link to the original paper
边栏推荐
- Data visualization in medical and healthcare applications
- ACM tutorial - quick sort (regular + tail recursion + random benchmark)
- How to extract login cookies when JMeter performs interface testing
- How to reflect and solve the problem of bird flight? Why are planes afraid of birds?
- Sql--- related transactions
- 970 golang realizes the communication between multithreaded server and client
- XMIND mind map
- Global and Chinese markets of digital crosspoint switches and mux/demux 2022-2028: Research Report on technology, participants, trends, market size and share
- The author is more willing to regard industrial Internet as a concept much richer than consumer Internet
- cookie、session、tooken
猜你喜欢

测试员8年工资变动,令网友羡慕不已:你一个月顶我一年工资

Appium inspector can directly locate the WebView page. Does anyone know the principle

Recommend an online interface mock tool usemock

AIX存储管理之逻辑卷的创建及属性的查看和修改

gradle
![[IVX junior engineer training course 10 papers to get certificates] 01 learn about IVX and complete the New Year greeting card](/img/99/53b0ae47bada8b0d4db30d0517fe3d.jpg)
[IVX junior engineer training course 10 papers to get certificates] 01 learn about IVX and complete the New Year greeting card

AIX存储管理之卷组的创建(一)

Picture puzzle wechat applet source code_ Support multi template production and traffic master

Single chip microcomputer -- hlk-w801 transplant NES simulator (III)

ACM tutorial - quick sort (regular + tail recursion + random benchmark)
随机推荐
Liteos learning - first knowledge of development environment
8.8.4-PointersOnC-20220215
[IVX junior engineer training course 10 papers] 04 canvas and a group photo of IVX and me
笔者更加愿意将产业互联网看成是一个比消费互联网要丰富得多的概念
Edge extraction edges based on Halcon learning_ image. Hdev routine
测试员8年工资变动,令网友羡慕不已:你一个月顶我一年工资
[Chongqing Guangdong education] Tianshui Normal University universe exploration reference
Global and Chinese markets of edge AI software 2022-2028: Research Report on technology, participants, trends, market size and share
AIX存储管理之总结篇
Otaku wallpaper Daquan wechat applet source code - with dynamic wallpaper to support a variety of traffic owners
Study note 2 -- definition and value of high-precision map
How to determine whether the current script is in the node environment or the browser environment?
学习笔记24--多传感器后融合技术
Global and Chinese markets for context and location-based services 2022-2028: Research Report on technology, participants, trends, market size and share
Shell Function
[eight sorts ①] insert sort (direct insert sort, Hill sort)
站在新的角度来看待产业互联网,并且去寻求产业互联网的正确方式和方法
Iclr2022 | spherenet and g-spherenet: autoregressive flow model for 3D molecular graph representation and molecular geometry generation
970 golang realizes the communication between multithreaded server and client
Global and Chinese market of collaborative applications 2022-2028: Research Report on technology, participants, trends, market size and share