A close reading of the Stereo R-CNN paper: experimental details and result analysis

2022-07-06 10:57:00 Is it Wei Xiaobai

I used to read only the method section of a paper and then skim its results on the test data. While writing my own thesis recently, I realized that how the experiments are designed matters just as much, so from now on I will pay more attention to this part when reading papers.

1. Experimental details

This section describes the experimental setup in detail.

Network

The network uses anchors with five scales {32, 64, 128, 256, 512} and three aspect ratios {0.5, 1, 2}. The original image is resized so that its shorter side is 600 pixels. Because the left and right feature maps are concatenated, the final classification and regression layers of the Stereo RPN take 1024 input channels instead of the 512 used in the original implementation. Similarly, the R-CNN regression head takes 512 input channels. On a Titan Xp GPU, Stereo R-CNN needs about 0.28 s of inference time per stereo pair.
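To make the anchor setup concrete, here is a minimal sketch of how 5 scales and 3 ratios yield 15 anchor shapes per location, and how the shorter side is resized to 600 pixels. The function names are hypothetical, the convention ratio = height / width is an assumption, and the 1242 x 375 input size is only an illustrative KITTI-like value; this is not the authors' code.

```python
import math

def make_anchors(scales=(32, 64, 128, 256, 512), ratios=(0.5, 1, 2)):
    """Return one (width, height) pair per scale/ratio combination.

    Assumption: ratio = height / width, and each anchor keeps an area of scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors

def resize_shorter_side(width, height, target=600):
    """Scale an image size so that its shorter side equals `target` pixels."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale), scale

anchors = make_anchors()
print(len(anchors))                    # number of anchor shapes per location
print(resize_shorter_side(1242, 375))  # illustrative KITTI-like image size
```

Each spatial location of the feature map then gets 15 anchor shapes, one for every scale and ratio combination.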

Training

This part mainly explains the loss function.

\begin{aligned} L &= w_{cls}^{p} L_{cls}^{p} + w_{reg}^{p} L_{reg}^{p} + w_{cls}^{r} L_{cls}^{r} + w_{box}^{r} L_{box}^{r} + w_{\alpha}^{r} L_{\alpha}^{r} + w_{dim}^{r} L_{dim}^{r} + w_{key}^{r} L_{key}^{r} \end{aligned}

Here (\cdot)^{p} and (\cdot)^{r} denote the RPN and the R-CNN respectively, and the subscripts box, α, dim and key denote the losses of the stereo boxes, the viewpoint, the dimensions and the keypoints respectively.

To augment the training set, the left and right images are flipped and exchanged, with the viewpoint angle and keypoints mirrored accordingly. Each training mini-batch contains one stereo pair and 512 sampled RoIs.
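Why the two images have to be exchanged after flipping is easier to see with boxes. The sketch below is my own illustration, with hypothetical function names and made-up coordinates: mirroring reverses the sign of the disparity, so the flipped right image must take the role of the new left image. It only handles 2D boxes; the viewpoint angle and keypoints would need to be mirrored as well.

```python
def flip_box(box, image_width):
    """Mirror an (x1, y1, x2, y2) box horizontally inside an image of the given width."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)

def flip_and_swap(left_box, right_box, image_width):
    """Flip both boxes, then exchange their roles so the disparity stays non-negative."""
    new_left = flip_box(right_box, image_width)   # flipped right image becomes the new left
    new_right = flip_box(left_box, image_width)   # flipped left image becomes the new right
    return new_left, new_right

left_box = (500, 100, 600, 200)    # made-up box in the left image
right_box = (470, 100, 570, 200)   # same object in the right image, disparity 30 px
new_left, new_right = flip_and_swap(left_box, right_box, image_width=1242)
print(new_left, new_right)
print(new_left[0] - new_right[0])  # disparity of the augmented pair
```

The augmented pair keeps the same disparity as the original one, so it is still a geometrically valid stereo sample.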

Other settings: the network is trained with SGD, using a weight decay of 0.0005 and a momentum of 0.9. The learning rate is initialized to 0.001 and multiplied by 0.1 every 5 epochs. Training runs for 20 epochs in total.
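The step schedule described above can be written down in a few lines. This is a minimal sketch assuming epochs are counted from 0; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.001, gamma=0.1, step=5):
    """Step decay: multiply the learning rate by `gamma` every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

for epoch in range(20):
    print(epoch, learning_rate(epoch))
```

In PyTorch, the same setup would roughly correspond to `torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0005)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)`, although I have not checked that this is exactly what the authors use.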

2. Result analysis

Stereo Recall and Stereo Detection

Stereo R-CNN aims to detect and associate objects in the left and right images simultaneously. Besides evaluating the 2D average recall (AR) and 2D average precision (AP) on the left and right images, the paper also defines stereo AR and stereo AP metrics, where a queried stereo box counts as a true positive (TP) only if it meets all of the following conditions:

1. The maximum IoU of the left box with the left GT boxes is higher than the given threshold;
2. The maximum IoU of the right box with the right GT boxes is higher than the given threshold;
3. The selected left and right GT boxes belong to the same object.
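The three conditions above can be sketched as a small check. This is my own illustration rather than the paper's evaluation code: the function names are hypothetical, the 0.7 threshold is only an example value, and I assume that `left_gts[i]` and `right_gts[i]` describe the same object.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def best_match(box, gt_boxes):
    """Return (index, IoU) of the GT box with the highest IoU, or (None, 0.0)."""
    best_idx, best_iou = None, 0.0
    for i, gt in enumerate(gt_boxes):
        value = iou(box, gt)
        if value > best_iou:
            best_idx, best_iou = i, value
    return best_idx, best_iou

def is_stereo_true_positive(left_box, right_box, left_gts, right_gts, threshold=0.7):
    """Apply the three stereo TP conditions; GT lists are assumed index-aligned per object."""
    left_idx, left_iou = best_match(left_box, left_gts)
    right_idx, right_iou = best_match(right_box, right_gts)
    return (
        left_idx is not None
        and left_iou > threshold      # condition 1
        and right_iou > threshold     # condition 2
        and left_idx == right_idx     # condition 3: same object
    )

left_gts = [(50, 0, 150, 100), (300, 0, 400, 100)]
right_gts = [(30, 0, 130, 100), (280, 0, 380, 100)]
print(is_stereo_true_positive((52, 0, 152, 100), (31, 0, 131, 100), left_gts, right_gts))
print(is_stereo_true_positive((52, 0, 152, 100), (281, 0, 381, 100), left_gts, right_gts))
```

The second query pairs a good left box with a good right box of a different object, so it fails condition 3 even though both IoUs are high.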

As shown in Table 1, Stereo R-CNN achieves proposal recall and detection precision on a single image similar to those of Faster R-CNN, while also producing high-quality data association between the left and right images without additional computation.

Although the stereo AR is slightly lower than the left AR in the RPN, the left, right and stereo AP observed after the R-CNN are almost the same. This shows that the detection performance on the left and right images is consistent, and that almost every true positive box in the left image has a corresponding true positive box in the right image.

In addition, two strategies for fusing the left and right features are tested: element-wise mean and channel concatenation. As reported in Table 1, channel concatenation performs better because it keeps all the information.
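A toy example shows what "keeps all the information" means here. In this sketch a feature map is just a list of channels, each channel a flat list of values; the function names and the numbers are made up for illustration.

```python
def fuse_mean(left, right):
    """Element-wise mean: the output has the same number of channels as each input."""
    return [[(a + b) / 2 for a, b in zip(lc, rc)] for lc, rc in zip(left, right)]

def fuse_concat(left, right):
    """Channel concatenation: the output has twice as many channels."""
    return left + right

left = [[1.0, 2.0], [3.0, 4.0]]    # 2 channels, 2 values each
right = [[3.0, 2.0], [5.0, 0.0]]

print(fuse_mean(left, right))      # which value came from which view is lost
print(len(fuse_concat(left, right)))
print(fuse_mean(left, right) == fuse_mean(right, left))
print(fuse_concat(left, right) == fuse_concat(right, left))
```

The mean is symmetric in its two inputs, so the network can no longer tell the left view from the right one, while concatenation preserves both at the cost of doubling the channel count (512 to 1024 in the Stereo RPN).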

Taken together, these results show that accurate stereo detection and association provide sufficient box-level constraints for the subsequent 3D box estimation.

3D Detection and 3D Localization

3D detection and 3D localization performance are evaluated with the average precision for bird's eye view (APbv) and for 3D boxes (AP3d). The results are reported in Table 2. I will not repeat the detailed comparison here; you can read the paper directly.

It is worth noting that the KITTI 3D detection benchmark is difficult for image-based methods, whose 3D performance tends to drop as the distance to the target object increases. This can be observed intuitively in Figure 7: although the method achieves sub-pixel disparity estimation (error below 0.5 pixel), disparity is inversely proportional to depth, so the depth error grows with object distance. For objects with obvious disparity, the method achieves highly precise depth estimation based on strict geometric constraints. This explains why the higher the IoU threshold and the easier the regime the object belongs to, the larger the improvement over other methods.
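The effect of a fixed disparity error on depth can be checked numerically. With depth z = f·b / d (focal length f in pixels, baseline b in meters, disparity d in pixels), a small disparity error Δd gives a depth error of roughly z² / (f·b) · Δd, which grows quadratically with distance. The sketch below uses f = 721 px and b = 0.54 m as approximate KITTI-like values; these numbers and the function names are my assumptions, not taken from the paper.

```python
def depth_from_disparity(disparity_px, focal_px=721.0, baseline_m=0.54):
    """Depth of a point in a rectified stereo pair: z = f * b / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(depth_m, disparity_error_px=0.5, focal_px=721.0, baseline_m=0.54):
    """Depth overestimate when the disparity is underestimated by a fixed pixel error."""
    true_disparity = focal_px * baseline_m / depth_m
    if true_disparity <= disparity_error_px:
        raise ValueError("disparity error is not smaller than the disparity itself")
    wrong_depth = depth_from_disparity(true_disparity - disparity_error_px, focal_px, baseline_m)
    return wrong_depth - depth_m

for z in (10, 30, 50):
    print(z, round(depth_error(z), 2))  # depth error in meters for a 0.5 px disparity error
```

Under these assumed camera parameters, the same half-pixel error costs on the order of a decimeter at 10 m but several meters at 50 m, which matches the trend described above.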

 
