YOLO3D: End-to-end real-time 3D detection from point cloud input
2022-07-02 03:27:00 【Fireworks at dawn in the city】
Preface
YOLO3D applies YOLO to 3D object detection on point clouds. It is similar to Complex-YOLO; the difference is that YOLO3D extends the YOLO v2 loss function to include the yaw angle, the 3D box center in Cartesian coordinates, and direct regression of the box height.
Paper: https://arxiv.org/abs/1808.02350
Algorithm analysis
Model input
The paper projects the 3D point cloud into a bird's-eye-view grid and creates two grid maps from it, as illustrated in the paper.
The first map holds the maximum height: the value of each grid cell (pixel) is the height of the highest point associated with that cell. The second map holds the point density, computed in the same way as in MV3D.
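The two maps can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `bev_maps`, its parameters, and the value used for empty cells are assumptions, and the density formula `min(1, log(N + 1) / log(64))` is the normalization described for MV3D.

```python
import numpy as np

def bev_maps(points, x_range, y_range, resolution, empty_height=0.0):
    """Build a max-height map and a density map from an (N, 3) array of x, y, z points.

    empty_height is a hypothetical fill value for cells that contain no points.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int(round((x_max - x_min) / resolution))
    cols = int(round((y_max - y_min) / resolution))

    # Keep only the points that fall inside the grid.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    inside = (x >= x_min) & (x < x_max) & (y >= y_min) & (y < y_max)
    x, y, z = x[inside], y[inside], z[inside]

    # Cell index of every remaining point (clipped to guard against rounding).
    r = np.clip(((x - x_min) / resolution).astype(int), 0, rows - 1)
    c = np.clip(((y - y_min) / resolution).astype(int), 0, cols - 1)

    # Channel 1: height of the highest point in each cell.
    height = np.full((rows, cols), -np.inf)
    np.maximum.at(height, (r, c), z)
    height[np.isinf(height)] = empty_height

    # Channel 2: point count per cell, normalized as in MV3D (assumed formula).
    counts = np.zeros((rows, cols))
    np.add.at(counts, (r, c), 1)
    density = np.minimum(1.0, np.log(counts + 1) / np.log(64))
    return height, density
```

`np.maximum.at` and `np.add.at` are used so that several points landing in the same cell are all accounted for, which plain fancy-index assignment would not do.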
Network structure
The network architecture is based on YOLO v2, with some changes:
- One max-pooling layer was modified to reduce the downsampling factor from 32 to 16. This yields a larger output grid, which helps detect small objects such as pedestrians and cyclists.
- The skip connection was removed from the model because it led to less accurate results.
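To see why the smaller downsampling factor helps, it is useful to work out the output grid for the 608x608 input at 0.1 m per pixel described in the dataset section. The helper below is only an arithmetic illustration; `output_grid` is a hypothetical name.

```python
def output_grid(input_size, stride, resolution_m):
    """Return (cells per side, metres covered by one output cell)."""
    cells = input_size // stride
    cell_size_m = stride * resolution_m
    return cells, cell_size_m

for stride in (32, 16):
    cells, cell_m = output_grid(608, stride, 0.1)
    print(f"stride {stride}: {cells}x{cells} grid, {cell_m:.1f} m per cell")
```

With a factor of 32 the output grid is 19x19 and each cell covers about 3.2 m; with a factor of 16 it is 38x38 and each cell covers about 1.6 m, which is closer to the footprint of a pedestrian or cyclist.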
Regression loss
3D box regression
The paper adds two regression terms to the original YOLO v2 in order to produce 3D bounding boxes: the z coordinate of the box center and the box height. The z coordinate is regressed in the same way as x and y, through a sigmoid activation.
Note that x and y are regressed by predicting a value between 0 and 1 that locates the point within its grid cell, whereas z is mapped to a single vertical grid cell, as illustrated in the paper. The reason z is mapped to one cell while x and y are mapped across multiple cells is that values along the z dimension vary much less than those along x and y (most objects have very similar box elevations).
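The decoding of the center can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: `decode_center` and its parameters are hypothetical, the 1.6 m cell size assumes a downsampling factor of 16 at 0.1 m per pixel, and the single vertical cell is assumed to span the clipped height range of -2 m to +2 m.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_center(tx, ty, tz, col, row, cell_size_m=1.6, z_min=-2.0, z_max=2.0):
    """Turn raw network outputs into a box center in metres (grid-relative).

    x and y: sigmoid offset inside the (col, row) cell, many cells per axis.
    z: sigmoid offset inside one vertical cell spanning [z_min, z_max] (assumed).
    """
    x = (col + sigmoid(tx)) * cell_size_m
    y = (row + sigmoid(ty)) * cell_size_m
    z = z_min + sigmoid(tz) * (z_max - z_min)
    return x, y, z

# A raw output of 0 lands in the middle of the cell on every axis.
print(decode_center(0.0, 0.0, 0.0, col=2, row=3))
```

The x and y coordinates depend on which cell made the prediction, while z only depends on the network output, which reflects the single vertical cell.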
Yaw angle regression
The paper defines the orientation of a bounding box in the range -π to π. This range is normalized to -1 to 1, and the model is adapted to predict the orientation directly as a single regression number. In the loss function, the yaw term is the mean squared error between the ground truth angle and the predicted angle.
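A minimal sketch of this normalization and error term, assuming angles are given in radians; the function names are hypothetical and the snippet only illustrates the idea, not the paper's exact formula.

```python
import math

def normalize_yaw(yaw_rad):
    """Map an angle in [-pi, pi] to the range [-1, 1]."""
    return yaw_rad / math.pi

def yaw_loss(true_yaw_rad, predicted):
    """Mean squared error between normalized ground truth angles and predictions in [-1, 1]."""
    errors = [(normalize_yaw(t) - p) ** 2 for t, p in zip(true_yaw_rad, predicted)]
    return sum(errors) / len(errors)

print(yaw_loss([math.pi / 2, -math.pi], [0.5, -1.0]))
```

One consequence of regressing a single number is that -π and π describe the same heading but sit at opposite ends of the target range, so a plain squared error treats them as far apart.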
Bounding box loss function
The 3D box loss extends the original YOLO loss for 2D boxes. The yaw term is computed as described above. The height loss extends the width and length losses; similarly, the z coordinate loss extends the x and y coordinate losses. The loss uses the following symbols:
$\lambda_{coor}$: weight assigned to the coordinate loss,
$\lambda_{conf}$: weight assigned to the prediction confidence loss,
$\lambda_{yaw}$: weight assigned to the orientation (yaw) loss,
$\lambda_{classes}$: weight assigned to the class probability loss,
$L^{obj}_{ij}$: an indicator that takes the value 0 or 1 depending on whether there is a ground truth box at the $i$-th and $j$-th location; it is 1 if there is a box and 0 otherwise,
$L^{noobj}_{ij}$: the opposite of the previous variable; it is 1 if there is no object and 0 otherwise,
$x_i, y_i, z_i$: ground truth coordinates,
$\hat{x}_i, \hat{y}_i, \hat{z}_i$: predicted coordinates,
$\varphi_i, \hat{\varphi}_i$: ground truth and predicted orientation angle,
$C_i, \hat{C}_i$: ground truth and predicted confidence,
$w_i, l_i, h_i$: ground truth box width, length, and height,
$\hat{w}_i, \hat{l}_i, \hat{h}_i$: predicted box width, length, and height,
$p_i(c), \hat{p}_i(c)$: ground truth and predicted class probabilities.
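To make the role of these symbols concrete, here is a rough sketch of how the box-related terms could be combined. It is not the paper's loss: it uses plain squared errors, applies the coordinate weight to the size terms as well, leaves out the confidence and class terms, and the default weights are placeholders. The name `box_loss` and the dictionary layout are hypothetical.

```python
import numpy as np

BOX_KEYS = ("x", "y", "z", "w", "l", "h", "yaw")

def box_loss(truth, pred, obj_mask, lambda_coor=1.0, lambda_yaw=1.0):
    """Weighted sum of squared errors over locations that contain an object.

    truth, pred: dicts mapping each key in BOX_KEYS to an array of values.
    obj_mask: array of 0/1 values playing the role of the object indicator.
    """
    def masked_sq(key):
        return np.sum(obj_mask * (truth[key] - pred[key]) ** 2)

    coord = sum(masked_sq(k) for k in ("x", "y", "z"))  # x, y extended with z
    size = sum(masked_sq(k) for k in ("w", "l", "h"))   # w, l extended with h
    yaw = masked_sq("yaw")                              # normalized yaw term
    return lambda_coor * (coord + size) + lambda_yaw * yaw
```

The mask makes sure that locations without a ground truth box contribute nothing to the box terms, which is the job of the object indicator in the symbol list above.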
Dataset processing
The paper uses the KITTI benchmark dataset. The point cloud is projected into 2D space as a bird's-eye-view grid at a resolution of 0.1 m per pixel, the same resolution used by MV3D.
The grid map covers a range in LiDAR space of 30.4 m to the right, 30.4 m to the left, and 60.8 m forward. At the 0.1 m resolution above, this range gives an input shape of 608x608 for each channel.
Heights in LiDAR space are clipped to the range between -2 m and +2 m and scaled to 0 to 255 so that they can be stored as pixel values in the maximum height channel.
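The numbers above can be checked with a few lines of arithmetic. The sketch below assumes a linear scaling of the clipped height and rounding to the nearest integer, neither of which is spelled out in the text; the constant and function names are hypothetical.

```python
import numpy as np

RESOLUTION_M = 0.1      # metres per pixel
SIDE_RANGE_M = 30.4     # to the left and to the right
FORWARD_RANGE_M = 60.8  # in front of the sensor
Z_MIN, Z_MAX = -2.0, 2.0

def grid_shape():
    """Number of pixels along the forward and lateral axes."""
    rows = int(round(FORWARD_RANGE_M / RESOLUTION_M))
    cols = int(round(2 * SIDE_RANGE_M / RESOLUTION_M))
    return rows, cols

def height_to_pixel(z):
    """Clip heights to [Z_MIN, Z_MAX] and scale them linearly to 0..255 (assumed mapping)."""
    z = np.clip(np.asarray(z, dtype=float), Z_MIN, Z_MAX)
    scaled = (z - Z_MIN) / (Z_MAX - Z_MIN) * 255.0
    return np.round(scaled).astype(np.uint8)

print(grid_shape())
print(height_to_pixel([-5.0, -2.0, 2.0, 9.0]))
```

`round` is used before converting to `int` because a division such as 60.8 / 0.1 may not be exactly 608 in floating point.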
Training
The network is trained end to end using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0005. It is trained for 150 epochs with a batch size of 4.
For the first few epochs, the learning rate is slowly increased from 0.00001 to 0.0001. If training starts with a high learning rate, the model usually diverges because of unstable gradients. Training then continues at 0.0001 for 90 epochs, at 0.0005 for 30 epochs, and finally at 0.00005 for the last 20 epochs.
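The schedule can be written as a small function. Two details are assumptions: the warm-up is taken to last 10 epochs (the 150 total minus the 90 + 30 + 20 listed), and the ramp is taken to be linear. `learning_rate` is a hypothetical helper, not code from the paper.

```python
def learning_rate(epoch, warmup_epochs=10):
    """Learning rate for a zero-based epoch index under the schedule described above."""
    if epoch < warmup_epochs:
        # Assumed linear ramp from 1e-5 up to 1e-4.
        fraction = epoch / warmup_epochs
        return 1e-5 + fraction * (1e-4 - 1e-5)
    if epoch < warmup_epochs + 90:
        return 1e-4
    if epoch < warmup_epochs + 90 + 30:
        return 5e-4
    return 5e-5

for epoch in (0, 5, 10, 99, 100, 129, 130, 149):
    print(epoch, learning_rate(epoch))
```

Most training frameworks accept a function like this as a per-epoch schedule, so the same logic can be reused with whichever optimizer implementation is at hand.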
Results
References:
Paper reading 《YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud》