YOLO3D: End-to-End Real-Time 3D Detection from Point Cloud Input
2022-07-02 03:27:00 【Fireworks at dawn in the city】
Preface
YOLO3D applies YOLO to 3D object detection on point clouds. It is similar to Complex-YOLO; the difference is that it extends the YOLO v2 loss function to include the yaw angle and the 3D box in Cartesian coordinates, and regresses the box height directly.
Paper: https://arxiv.org/abs/1808.02350
Algorithm analysis
Model input
The paper projects the 3D point cloud onto a bird's-eye-view grid and creates two grid maps from it, as shown in the figure.
The first map contains the maximum height: the value of each grid cell (pixel) is the height of the highest point associated with that cell. The second map holds the point density, which is computed as in MV3D.
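The two input channels described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `bev_maps`, the axis-to-row/column convention, and the use of `None` for empty cells are assumptions made here. The density normalization `min(1, log(N + 1) / log(64))` is the formula given in the MV3D paper, which the text says the density follows.

```python
import math


def bev_maps(points, x_range, y_range, resolution):
    """Project (x, y, z) points onto a bird's-eye-view grid.

    Returns two 2D lists: the maximum height per cell (None for empty
    cells, an assumption of this sketch) and the normalized point density.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int(round((x_max - x_min) / resolution))
    cols = int(round((y_max - y_min) / resolution))
    max_height = [[None] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]

    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # the point lies outside the grid
        # Clamp to guard against floating-point round-up at the far edge.
        r = min(int((x - x_min) / resolution), rows - 1)
        c = min(int((y - y_min) / resolution), cols - 1)
        counts[r][c] += 1
        if max_height[r][c] is None or z > max_height[r][c]:
            max_height[r][c] = z

    # MV3D-style density: min(1, log(N + 1) / log(64)) for N points in a cell.
    density = [
        [min(1.0, math.log(n + 1) / math.log(64)) for n in row]
        for row in counts
    ]
    return max_height, density
```

Calling `bev_maps(points, (0.0, 1.0), (0.0, 1.0), 0.5)` on a handful of points gives a 2x2 grid, which is convenient for checking the behavior by hand before moving to a full-size map.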
Network structure
The network is based on the YOLO v2 architecture, with some changes:
- One max-pooling layer was modified so that the downsampling factor drops from 32 to 16. The resulting larger (finer) output grid helps detect small objects such as pedestrians and cyclists.
- The skip connection was removed from the model, because it led to less accurate results.
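To see what the change in downsampling means in practice, the arithmetic can be worked through with the 608x608 input and the 0.1 m per pixel resolution quoted in the dataset section. The helper name `output_grid` is hypothetical and only serves this illustration.

```python
def output_grid(input_size, downsample):
    """Number of output cells per side for a square input."""
    if input_size % downsample != 0:
        raise ValueError("input size must be divisible by the downsample factor")
    return input_size // downsample


INPUT_SIZE = 608    # pixels per side, as stated in the dataset section
RESOLUTION_M = 0.1  # meters per pixel, as stated in the dataset section

for factor in (32, 16):
    cells = output_grid(INPUT_SIZE, factor)
    cell_m = factor * RESOLUTION_M
    print(f"downsample {factor}: {cells}x{cells} cells, {cell_m:.1f} m per cell")
```

With a factor of 32 each output cell covers 3.2 m on the ground; with 16 it covers 1.6 m, which is closer to the footprint of a pedestrian or cyclist.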
Regression loss
3D box regression
The paper adds two regression terms to the original YOLO v2 to produce 3D bounding boxes: the z coordinate of the box center and the height of the box. The z coordinate is regressed in the same way as x and y, by passing the network output through a sigmoid activation.
Note one difference. x and y are regressed by predicting a value between 0 and 1 that locates the point inside its grid cell, whereas z is mapped to a single vertical grid cell, as shown in the figure below. z is mapped to only one cell, while x and y are spread over multiple grid cells, because values in the z dimension vary much less than values in x and y (most objects have very similar box elevations).
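The decoding step described above might look like the following sketch. It assumes the usual YOLO v2 convention (cell index plus sigmoid offset) for x and y, and it assumes that the single vertical cell spans a fixed height range `[z_min, z_max]`; the function name, the parameter names, and that range are illustrative choices, not values confirmed by the paper.

```python
import math


def sigmoid(t):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_center(t_x, t_y, t_z, col, row, cell_size_m, z_min, z_max):
    """Turn raw network outputs into a box center in meters.

    x, y: sigmoid offset inside grid cell (col, row), scaled by the cell size.
    z:    sigmoid offset inside one vertical cell spanning [z_min, z_max]
          (the extent of that cell is an assumption of this sketch).
    """
    x = (col + sigmoid(t_x)) * cell_size_m
    y = (row + sigmoid(t_y)) * cell_size_m
    z = z_min + sigmoid(t_z) * (z_max - z_min)
    return x, y, z


# Raw outputs of 0 land in the middle of the cell on every axis.
print(decode_center(0.0, 0.0, 0.0, col=3, row=5,
                    cell_size_m=1.6, z_min=-2.0, z_max=2.0))
```

Because there is only one cell along z, no cell index is added for that axis; the sigmoid output alone places the center within the vertical range.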
Yaw angle regression
The paper defines the bounding-box orientation over the range -π to π. This range is normalized to -1 to 1, and the model predicts the orientation directly as a single regressed number. In the loss function, the mean squared error is computed between the ground-truth angle and the predicted angle.
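The normalization and the error term just described can be illustrated as follows. The function names are hypothetical, and averaging over a list of boxes is an assumption made here for the example.

```python
import math


def normalize_yaw(yaw_rad):
    """Map a yaw angle in [-pi, pi] to the regression target range [-1, 1]."""
    if not -math.pi <= yaw_rad <= math.pi:
        raise ValueError("yaw must lie in [-pi, pi]")
    return yaw_rad / math.pi


def yaw_mse(true_yaws_rad, predicted):
    """Mean squared error between normalized ground-truth yaws and predictions.

    `predicted` holds the network outputs, already in the [-1, 1] range.
    """
    targets = [normalize_yaw(y) for y in true_yaws_rad]
    return sum((t - p) ** 2 for t, p in zip(targets, predicted)) / len(targets)


print(normalize_yaw(math.pi / 2))
print(yaw_mse([0.0, math.pi / 2], [0.0, 0.0]))
```

One property of regressing a single number is worth keeping in mind: the targets for -π and π are -1 and 1, so the squared error treats them as far apart even though they describe the same orientation.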
Bounding box loss function
The 3D box loss extends the original YOLO loss for 2D boxes. The yaw term is computed as described above. The height loss extends the width and length loss, and likewise the z coordinate loss extends the x and y coordinate loss. The symbols used in the loss are:
$\lambda_{coor}$: weight assigned to the coordinate loss,
$\lambda_{conf}$: weight assigned to the prediction confidence loss,
$\lambda_{yaw}$: weight assigned to the yaw (orientation) loss,
$\lambda_{classes}$: weight assigned to the class probability loss,
$L^{obj}_{ij}$: an indicator variable that takes the value 0 or 1 depending on whether there is a ground-truth box at the i-th and j-th positions; it is 1 if there is a box and 0 otherwise,
$L^{noobj}_{ij}$: the opposite of the previous variable; it is 1 if there is no object and 0 otherwise,
$x_i, y_i, z_i$: ground-truth coordinates,
$\hat{x_i}, \hat{y_i}, \hat{z_i}$: predicted coordinates,
$\varphi_i, \hat{\varphi_i}$: ground-truth and predicted orientation,
$C_i, \hat{C_i}$: ground-truth and predicted confidence,
$w_i, l_i, h_i$: ground-truth width, length, and height of the box,
$\hat{w_i}, \hat{l_i}, \hat{h_i}$: predicted width, length, and height of the box,
$p_i(c), \hat{p_i}(c)$: ground-truth and predicted class probabilities.
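Using the symbols listed above, the loss can be written out roughly as follows. This is a reconstruction from the description in this section and the form of the original YOLO loss, not a copy of the paper's equation. Two points are assumptions: the square roots on the box dimensions are carried over from the original YOLO width and height terms, and the sums are taken with $i$ over grid cells and $j$ over the boxes predicted per cell. Both confidence terms share $\lambda_{conf}$ because the list names only one confidence weight.

$$
\begin{aligned}
L ={}& \lambda_{coor} \sum_{i}\sum_{j} L^{obj}_{ij}\left[(x_i-\hat{x_i})^2+(y_i-\hat{y_i})^2+(z_i-\hat{z_i})^2\right] \\
&+ \lambda_{coor} \sum_{i}\sum_{j} L^{obj}_{ij}\left[\left(\sqrt{w_i}-\sqrt{\hat{w_i}}\right)^2+\left(\sqrt{l_i}-\sqrt{\hat{l_i}}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h_i}}\right)^2\right] \\
&+ \lambda_{yaw} \sum_{i}\sum_{j} L^{obj}_{ij}\,(\varphi_i-\hat{\varphi_i})^2 \\
&+ \lambda_{conf} \sum_{i}\sum_{j} L^{obj}_{ij}\,(C_i-\hat{C_i})^2 \\
&+ \lambda_{conf} \sum_{i}\sum_{j} L^{noobj}_{ij}\,(C_i-\hat{C_i})^2 \\
&+ \lambda_{classes} \sum_{i}\sum_{j} L^{obj}_{ij}\sum_{c}\left(p_i(c)-\hat{p_i}(c)\right)^2
\end{aligned}
$$

The first line extends the x and y term with z, the second extends the width and length term with the height, and the third is the yaw term added for oriented boxes.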
Dataset processing
The paper uses the KITTI benchmark dataset. The point cloud is projected into 2D space as a bird's-eye-view grid map at a resolution of 0.1 m per pixel, the same resolution used by MV3D.
The grid map covers a LiDAR range of 30.4 m to the right, 30.4 m to the left, and 60.8 m forward. At the 0.1 m resolution above, this range gives each channel an input shape of 608x608.
Heights in LiDAR space are clipped to between -2 m and +2 m and scaled to 0 to 255, so that they can be stored as pixel values in the maximum-height channel.
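The numbers above can be checked with a short sketch. The function names are hypothetical, and the linear scaling with rounding to the nearest integer is an assumption; the text only states the clipping range and the 0 to 255 target range.

```python
def grid_shape(left_m, right_m, forward_m, resolution_m):
    """Rows and columns of the bird's-eye-view map for a given range."""
    rows = round(forward_m / resolution_m)
    cols = round((left_m + right_m) / resolution_m)
    return rows, cols


def height_to_pixel(z_m, z_min=-2.0, z_max=2.0):
    """Clip a height to [z_min, z_max] and scale it linearly to 0..255."""
    z = min(max(z_m, z_min), z_max)
    return int(round((z - z_min) / (z_max - z_min) * 255))


print(grid_shape(30.4, 30.4, 60.8, 0.1))  # the 608x608 shape from the text
print(height_to_pixel(-3.0), height_to_pixel(-1.0), height_to_pixel(5.0))
```

Points below -2 m or above +2 m saturate at 0 and 255 respectively, so anything outside the clipping range cannot be told apart from a point at the boundary.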
Training
The network is trained end to end using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0005. Training runs for 150 epochs with a batch size of 4.
For the first few epochs, the learning rate is raised slowly from 0.00001 to 0.0001; starting with a high learning rate usually makes the model diverge because of unstable gradients. Training then continues at 0.0001 for 90 epochs, then at 0.0005 for 30 epochs, and finally at 0.00005 for the last 20 epochs.
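The schedule above can be expressed as a small lookup function. The learning rates and stage lengths are taken from the text; the warm-up length of 10 epochs (chosen so that the stages add up to 150) and the linear shape of the warm-up are assumptions, since the text only says "the first few epochs" and "slowly".

```python
def learning_rate(epoch, warmup_epochs=10):
    """Learning rate for a 0-based epoch index.

    warmup_epochs=10 and the linear ramp are assumptions of this sketch.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < warmup_epochs:
        start, end = 1e-5, 1e-4
        return start + (end - start) * epoch / warmup_epochs
    step = epoch - warmup_epochs
    if step < 90:
        return 1e-4   # 90 epochs
    if step < 120:
        return 5e-4   # next 30 epochs
    return 5e-5       # last 20 epochs (and anything beyond)


for e in (0, 5, 10, 99, 100, 129, 130, 149):
    print(e, learning_rate(e))
```

A function like this can be passed to whatever scheduler hook the training framework provides, or simply called at the start of each epoch to set the optimizer's learning rate.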
Results

References:
Paper reading: "YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud"