Learning Notes 25: Multi-Sensor Pre-Fusion Technology
2022-07-02 01:17:00 【FUXI_Willard】
This blog series comprises six columns: "Overview of Autonomous Driving Technology", "Technical Foundations of the Autonomous Vehicle Platform", "Autonomous Driving Positioning Technology", "Environment Perception for Autonomous Vehicles", "Decision and Control of Autonomous Vehicles", and "Design and Application of Autonomous Driving Systems". The author is not an expert in autonomous driving, just a beginner exploring the field, thinking and summarizing while reading. Friends are welcome to leave suggestions in the comments and help the author catch mistakes, thank you!
This column contains reading notes on the book "Environment Perception for Autonomous Vehicles".
3. Multi-Sensor Pre-Fusion Technology
Pre-fusion technology fuses the data from all sensors directly at the raw-data level, then performs the perception task on the fused data and outputs detected targets at the result level.
Common neural-network-based fusion methods include MV3D (Multi-View 3D Object Detection), AVOD (Aggregate View Object Detection), and F-PointNet (Frustum PointNets for 3D Object Detection).
3.1 MV3D
MV3D fuses LiDAR point cloud data with RGB images captured by a visible-light camera. Its inputs are the LiDAR bird's-eye-view projection (LIDAR bird view), the LiDAR front view, and the 2D RGB image. The network consists mainly of a 3D proposal network and a region-based fusion network, and combines features in a deep-fusion manner, as shown in the figure below:
A LiDAR point cloud is an unordered collection of data points. Before processing it with the designed neural network model, in order to retain the information in the 3D point cloud more effectively and make it easier to handle, MV3D projects the point cloud onto specific 2D planes to obtain the bird's-eye view and the front view.
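The projection step above can be sketched as follows. This is a minimal illustration, not MV3D's actual code: the ranges, grid resolution, and the choice of a single max-height channel are assumptions (MV3D's real BEV also encodes density and intensity channels).

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                      resolution=0.1):
    """Project an unordered (N, 4) LiDAR cloud [x, y, z, intensity] onto a
    2D bird's-eye-view grid, keeping the maximum height per cell."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # keep only points inside the chosen field of view
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]))
    x, y, z = x[mask], y[mask], z[mask]
    # discretise metric coordinates into grid indices
    xi = ((x - x_range[0]) / resolution).astype(int)
    yi = ((y - y_range[0]) / resolution).astype(int)
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    bev = np.full((h, w), -np.inf)
    # unbuffered per-cell maximum: height map of the scene
    np.maximum.at(bev, (xi, yi), z)
    bev[bev == -np.inf] = 0.0
    return bev
```

A front view is built the same way, except the points are projected onto a cylindrical plane indexed by azimuth and elevation instead of a ground-plane grid.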
The 3D proposal network is similar to the Region Proposal Network (RPN) used in the Faster R-CNN detection model, generalized to three dimensions; its function is to generate 3D candidate boxes for targets. This step is performed on the bird's-eye view, where targets occlude each other least, so candidate-box extraction is most effective there.
After extraction, the candidate boxes are mapped back into the three views to obtain their respective regions of interest (ROIs), which are fed into the region-based fusion network. For the fusion scheme there are three options: early fusion, late fusion, and deep fusion; the three approaches are compared in the figure below:
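The three schemes can be contrasted with a toy numpy sketch. The layer sizes, the two-layer depth, and the element-wise mean as the join operation are illustrative assumptions, not MV3D's actual architecture; the point is only *where* the views are merged.

```python
import numpy as np

def layer(x, w):
    """One toy network layer: linear transform followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def early_fusion(views, weights):
    """Concatenate all views once at the input, then run a single network."""
    x = np.concatenate(views, axis=-1)
    for w in weights:
        x = layer(x, w)
    return x

def late_fusion(views, weight_sets, w_head):
    """Run an independent network per view; merge only at the very end."""
    feats = []
    for x, ws in zip(views, weight_sets):
        for w in ws:
            x = layer(x, w)
        feats.append(x)
    return layer(np.concatenate(feats, axis=-1), w_head)

def deep_fusion(views, weight_sets):
    """Exchange information after every layer: transform each view, fuse
    with an element-wise mean, and feed the joint feature back to all
    branches (the pattern MV3D's deep fusion follows)."""
    xs = list(views)
    for step in range(len(weight_sets[0])):
        xs = [layer(x, ws[step]) for x, ws in zip(xs, weight_sets)]
        fused = np.mean(xs, axis=0)
        xs = [fused] * len(xs)
    return fused
```

Deep fusion lets the branches interact at every depth while still keeping per-view weights, which is why MV3D prefers it over the two single-join schemes.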
Link to the original paper
3.2 AVOD
AVOD is a 3D object detection algorithm that fuses LiDAR point cloud data with RGB image information. Its inputs are only the bird's-eye-view (BEV) map generated from the LiDAR and the camera RGB image; it discards the LiDAR front view (FV) and the intensity feature in the BEV, as shown in the figure below:
AVOD first extracts features from the input data to obtain two full-resolution feature maps, which are fed into an RPN to generate non-oriented region proposals. Suitable proposals are then selected and passed to the detection network, which produces oriented 3D bounding boxes and completes the detection task. AVOD fuses the sensor data twice: feature fusion and region-proposal fusion.
Figure above: AVOD's feature extraction network uses an encoder-decoder structure. Each decoder stage first upsamples its input, then concatenates it with the output of the corresponding encoder stage, and finally passes the result through a 3×3 convolution. This structure yields full-resolution feature maps and effectively avoids the problem of small objects occupying fewer than one pixel in the output feature map after downsampling. The final feature map contains both low-level details and high-level semantic information, which markedly improves detection of small objects.
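One decoder step of such an encoder-decoder can be sketched as below; `upsample2x` and `decoder_step` are hypothetical helper names, feature maps are laid out channels-first, and the 3×3 convolution that follows the concatenation is omitted.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(decoder_in, encoder_skip):
    """One decoder stage: upsample the coarse feature map, then
    concatenate it with the matching encoder output along the channel
    axis (the subsequent 3x3 convolution is left out of this sketch)."""
    up = upsample2x(decoder_in)
    return np.concatenate([up, encoder_skip], axis=0)
```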
Figure above: three bounding-box encodings, from left to right: MV3D, axis-aligned, and AVOD. Compared with MV3D's encoding, which specifies the 3D coordinates of all eight vertices, AVOD constrains the shape of the 3D bounding box with a ground-plane rectangle plus two height offsets, so a 10-dimensional vector suffices where MV3D needs a 24-dimensional one.
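The difference in encoding size can be made concrete with a small sketch; the function names are hypothetical, but the dimension counts (8 vertices × 3 coordinates = 24, vs. 4 ground-plane corners × 2 coordinates + 2 heights = 10) follow the comparison above.

```python
import numpy as np

def encode_box_mv3d(corners):
    """MV3D encoding: 3D coordinates of all 8 vertices -> 24-dim vector."""
    assert corners.shape == (8, 3)
    return corners.reshape(-1)

def encode_box_avod(ground_corners, h1, h2):
    """AVOD encoding: 4 ground-plane corners (x, y) plus two height
    offsets from the ground plane -> 10-dim vector."""
    assert ground_corners.shape == (4, 2)
    return np.concatenate([ground_corners.reshape(-1), [h1, h2]])
```

The lower-dimensional AVOD encoding builds the planar-box constraint into the representation itself, so the network cannot regress a geometrically invalid set of vertices.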
3.3 F-PointNet
F-PointNet uses mature 2D object detection methods on the image to localize the target, obtains the corresponding frustum in the 3D point cloud, and performs bounding-box regression to complete the detection task, as shown in the figure below:
The overall F-PointNet network consists of three parts: frustum proposal, 3D instance segmentation, and amodal 3D box estimation; the network structure is shown in the figure below:
F-PointNet exploits the high resolution of RGB images: an FPN-based detection model first obtains the target's bounding box in the 2D image; then, using the known camera projection matrix, the 2D bounding box is lifted to a frustum that defines the target's 3D search space, and all points inside the frustum are collected to form the frustum point cloud.
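Lifting a 2D box to a frustum amounts to keeping the LiDAR points whose camera projection lands inside the box. A minimal sketch, assuming the points are already expressed in the camera frame and a standard 3×4 projection matrix is available:

```python
import numpy as np

def points_in_frustum(points, P, box2d):
    """Collect the points whose projection falls inside a 2D detection
    box -- the frustum point cloud used by F-PointNet.

    points : (N, 3) points in the camera frame
    P      : (3, 4) camera projection matrix
    box2d  : (xmin, ymin, xmax, ymax) in pixels
    """
    n = points.shape[0]
    hom = np.hstack([points, np.ones((n, 1))])   # homogeneous coordinates
    uvw = hom @ P.T
    front = uvw[:, 2] > 0                        # only points ahead of the camera
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    xmin, ymin, xmax, ymax = box2d
    inside = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return points[front & inside]
```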
Figure above: (a) is the camera coordinate system, (b) the frustum coordinate system, (c) the 3D-mask local coordinate system, and (d) the 3D object coordinate system predicted by T-Net. To cope with occlusion and clutter, F-PointNet applies a PointNet (or PointNet++) model to the frustum point cloud for instance segmentation; in 3D space objects are mostly well separated, so segmentation in 3D is more reliable. Instance segmentation yields the target's 3D mask (all points belonging to the target); its centroid is computed and taken as the new coordinate origin, converting to the local coordinate system shown in (c) and improving the translation invariance of the algorithm. Finally, on the target point cloud, F-PointNet uses a PointNet (or PointNet++) model with a T-Net to regress the center, size, and orientation of the target's 3D bounding box, as shown in (d), completing the detection task. The role of T-Net is to predict the offset from the target's centroid to the true center of its 3D bounding box; the predicted center then becomes the origin of the object coordinate system.
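The mask-centroid step and the T-Net step are both simple translations of the point cloud; a minimal sketch with hypothetical helper names (the T-Net offset here is given, whereas in F-PointNet it is predicted by the network):

```python
import numpy as np

def to_mask_frame(object_points):
    """Mask-centroid transform: shift the segmented object points so their
    centroid becomes the origin, improving translation invariance."""
    centroid = object_points.mean(axis=0)
    return object_points - centroid, centroid

def to_object_frame(local_points, tnet_delta):
    """T-Net step: shift again by the predicted offset from the mask
    centroid to the true box center, giving the object coordinate system."""
    return local_points - tnet_delta
```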
Summary: to keep the point cloud data invariant at each step and finally regress the 3D bounding box more accurately, F-PointNet performs three coordinate transformations in total: the frustum transform, the mask-centroid transform, and the T-Net prediction.
Link to the original paper