Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)

2022-07-28 14:12:00 【51CTO】

Source: https://zhuanlan.zhihu.com/p/91719437

Reprinted with authorization. To reprint, please contact the author.

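It is hard to compare object detectors fairly. Beyond the detector type, many other choices affect performance:

Feature extractors (VGG16, ResNet, Inception, MobileNet).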
Output strides for the extractor.
Input image resolutions.
Matching strategy and IoU threshold (how predictions are excluded in calculating loss).
Non-max suppression IoU threshold.
Hard example mining ratio (positive vs. negative anchor ratio).
The number of proposals or predictions.
Bounding box encoding.
Data augmentation.
Training dataset.
Use of multi-scale images in training or testing (with cropping).
Which feature map layer(s) for object detection.
Localization loss function.
Deep learning software platform used.
Training configurations including batch size, input image resize, learning rate, and learning rate decay.
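
Two of the settings above, the matching IoU threshold and the non-max suppression (NMS) IoU threshold, both rest on the same overlap measure. The sketch below is a minimal pure-Python illustration, not the implementation of any paper covered here. The box layout `(x_min, y_min, x_max, y_max)`, the function names and the default threshold of 0.5 are our own assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

A higher NMS threshold keeps more overlapping boxes, which is one reason why numbers from papers that use different thresholds are hard to compare.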

Worst of all, the technology evolves so fast that any comparison quickly becomes obsolete. Here, we summarize the results from the individual papers so you can view and compare them together. Then we present a summary of the survey from Google Research. By presenting multiple viewpoints in one place, we hope to better understand the performance landscape.

Performance results

In this section, we summarize the performance reported by the corresponding papers. Feel free to skim this section.

Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf)

These are the PASCAL VOC 2012 test set results. We are interested in the last 3 rows, which represent the performance of Faster R-CNN. The second column is the number of RoIs produced by the region proposal network (RPN). The third column is the training dataset used. The fourth column is the mean average precision (mAP), which measures accuracy.

mAP：https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
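
Since mAP appears in every table that follows, a small sketch of how average precision (AP) can be computed for one class may help. It assumes each detection has already been marked as a true or false positive by IoU matching, and it uses the area under the interpolated precision-recall curve. Benchmarks differ in the exact interpolation they use, so treat this as an illustration and not as the official evaluation code of VOC or COCO. All function and parameter names are ours.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float],
                      is_true_positive: Sequence[bool],
                      num_ground_truth: int) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at a recall level becomes the best precision
    # reached at that recall level or any higher one.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the interpolated curve.
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

For example, four detections ranked as hit, miss, hit, miss against two ground-truth objects give an AP of 5/6 under this definition.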

PASCAL VOC 2012 test set results

VOC 2012 for Faster R-CNN

Results on MS COCO

COCO for Faster R-CNN

Timing in milliseconds on a K40 GPU, using the PASCAL VOC 2007 test set.

R-FCN (https://arxiv.org/pdf/1605.06409.pdf)

PASCAL VOC 2012 test set results

VOC 2012 for R-FCN

(Multi-scale training and testing are used for some results.)

Results on MS COCO

COCO for R-FCN

SSD (https://arxiv.org/pdf/1512.02325.pdf)

These are the results on PASCAL VOC 2007, 2012 and MS COCO using 300 × 300 and 512 × 512 input images.

SSD

(SSD300* and SSD512* apply data augmentation for small objects to improve mAP.)

Performance:

Speed is measured with a batch size of 1 or 8 during inference.

(YOLO here refers to v1, which is slower than YOLOv2 or YOLOv3.)

MS COCO results:

COCO for SSD

YOLO (https://arxiv.org/pdf/1612.08242.pdf)

PASCAL VOC 2007 test set results.

VOC 2007 for YOLOv2

(We include the VOC 2007 test here because it has results for different image resolutions.)

PASCAL VOC 2012 test set results.

VOC 2012 for YOLOv2

Results on MS COCO.

COCO for YOLOv2

YOLOv3 (https://pjreddie.com/media/files/papers/YOLOv3.pdf)

Results on MS COCO

COCO for YOLOv3

Performance of YOLOv3

Performance on COCO

FPN (https://arxiv.org/pdf/1612.03144.pdf)

Results on MS COCO.

COCO for FPN

RetinaNet (https://arxiv.org/pdf/1708.02002.pdf)

Results on MS COCO

COCO for RetinaNet

Speed (ms) versus accuracy (AP) on MS COCO test-dev.

COCO for RetinaNet

Comparing paper results

It is unwise to compare results from different papers side by side, because the experiments are done under different settings. Nevertheless, we decided to plot them together so that you at least get a rough picture of where each detector stands. But be warned that these numbers should never be compared directly.

For the results presented below, the models are trained with both PASCAL VOC 2007 and 2012 data. The mAP is measured on the PASCAL VOC 2012 test set. For SSD, the chart shows results for 300 × 300 and 512 × 512 input images. For YOLO, it shows results for 288 × 288, 416 × 416 and 544 × 544 images. Higher-resolution images give better mAP for the same model, but are slower to process.

* indicates that small-object data augmentation is applied.

** indicates that the results are measured on the VOC 2007 test set. We include them because the YOLO paper reports few VOC 2012 test results. Since VOC 2007 results are generally better than 2012 results, we add the R-FCN VOC 2007 result as a cross reference.

Input image resolution and feature extractor both affect speed. Below are the highest and lowest FPS reported by the corresponding papers. Yet these results can be highly biased, in particular because they are measured at different mAP.

Results on the COCO dataset

In recent years, many results are measured exclusively on the COCO object detection dataset. The COCO dataset is harder for object detection, and detectors usually achieve a much lower mAP on it. Here is a comparison of some key detectors.

FPN and Faster R-CNN* (using ResNet as the feature extractor) have the highest accuracy (mAP@[.5:.95]). RetinaNet builds on top of FPN using ResNet. So the high mAP achieved by RetinaNet is the combined effect of pyramid features, the complexity of the feature extractor and the focal loss. Yet, be warned that this is not an apple-to-apple comparison. We will present the Google survey later for a better comparison. But it is best to look at the claims of each model first.
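
The notation mAP@[.5:.95] means that mAP is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, that is, over ten thresholds. The sketch below shows only this averaging step. The callback `map_at_iou` is a hypothetical stand-in for a full evaluation at one threshold and is not part of any real library.

```python
from typing import Callable, List


def coco_iou_thresholds() -> List[float]:
    """IoU thresholds 0.50, 0.55, ..., 0.95 (ten values)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def map_over_iou_range(map_at_iou: Callable[[float], float]) -> float:
    """Average the mAP obtained at each IoU threshold.

    `map_at_iou` is a hypothetical callback that evaluates the detector
    at a single IoU threshold and returns its mAP.
    """
    thresholds = coco_iou_thresholds()
    return sum(map_at_iou(t) for t in thresholds) / len(thresholds)
```

Because the strict thresholds near 0.95 reward tight localization, a detector can rank differently under this metric than under mAP at a single threshold of 0.5.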

Takeaway so far

Single shot detectors achieve an impressive frame rate (FPS) with lower-resolution images, at the cost of accuracy. Their papers try to prove that they can beat the accuracy of region based detectors. However, this is less conclusive, since higher-resolution images are often used for such claims, so the scenario being compared shifts. In addition, different optimization techniques are applied, which makes it hard to isolate the merit of each model. In fact, single shot and region based detectors are becoming more and more similar in design and implementation. But with some reservations, we can say:

Region based detectors such as Faster R-CNN show a small accuracy advantage if real-time speed is not needed.
Single shot detectors are the choice for real-time processing. But applications need to verify whether they meet their accuracy requirements.

Comparing SSD MobileNet, YOLOv2, YOLO9000 and Faster R-CNN

The test video has been uploaded to Bilibili (30 minutes): https://www.bilibili.com/video/av75557343/

Report by Google Research (https://arxiv.org/pdf/1611.10012.pdf)

Google Research offers a survey paper that studies the tradeoff between speed and accuracy for Faster R-CNN, R-FCN and SSD. (YOLO is not covered by the paper.) It re-implements these models in TensorFlow and trains them on the MS COCO dataset. This establishes a more controlled environment and makes the tradeoff comparison easier. It also introduces MobileNet, which achieves high accuracy with much lower complexity.

Speed vs. accuracy

The most important question is not which detector is the best. That may be impossible to answer. The real question is which detector and which configuration give the best balance of speed and accuracy that your application needs. Below is the comparison of accuracy vs. speed (time measured in milliseconds).

In general, Faster R-CNN is more accurate, while R-FCN and SSD are faster.

Faster R-CNN using Inception Resnet with 300 proposals gives the highest accuracy of all the tested cases, at 1 FPS.
SSD on MobileNet has the highest mAP among the models targeted at real-time processing.

The chart also helps us locate sweet spots, where a small loss of accuracy buys a good return in speed.

R-FCN models using Residual Network strike a good balance between accuracy and speed.
Faster R-CNN with Resnet can attain similar performance if we restrict the number of proposals to 50.
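
One way to look for such sweet spots programmatically is to keep only the configurations that no other configuration beats on both speed and accuracy, the so-called Pareto front. The sketch below is our own illustration and does not come from the Google Research paper. The `Result` type and its field names are hypothetical, and any numbers fed into it should be real measurements, not the placeholder values used for testing.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One measured configuration (hypothetical container type)."""
    name: str
    gpu_time_ms: float
    map_score: float


def pareto_front(results: List[Result]) -> List[Result]:
    """Return the configurations not dominated in both speed and accuracy.

    Scanning from fastest to slowest, a configuration is kept only if it
    is more accurate than every faster configuration seen so far.
    """
    front: List[Result] = []
    best_map = float("-inf")
    # Fastest first; at equal speed, the more accurate one comes first.
    for r in sorted(results, key=lambda r: (r.gpu_time_ms, -r.map_score)):
        if r.map_score > best_map:
            front.append(r)
            best_map = r.map_score
    return front
```

Any configuration that is not on the front is slower than some alternative without being more accurate, so it can be dropped from consideration whatever the application's speed budget is.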

Feature extractor

The paper studies how the accuracy of the feature extractor affects the accuracy of the detector. Both Faster R-CNN and R-FCN can take advantage of a better feature extractor, but the effect is less significant for SSD.

(The x-axis is the top-1 classification accuracy of each feature extractor.)

Target size

For large objects, SSD performs well even with a simple extractor. With a better extractor, SSD can even match the accuracy of the other detectors. But SSD performs much worse on small objects than the other methods.

For example, SSD has trouble detecting the bottles on the table in the example image, while the other methods can detect them.

Input image resolution

Higher resolution significantly improves object detection for small objects, and it also helps for large objects. When the resolution is reduced by a factor of two in both dimensions, accuracy drops by 15.88% on average, but inference time also drops by 27.4% on average.

Number of proposals

The number of proposals generated can significantly affect the speed of Faster R-CNN (FRCNN) without a major decrease in accuracy. For example, with Inception Resnet, Faster R-CNN can run 3 times faster when using 50 proposals instead of 300, while accuracy drops by only 4%. Because R-FCN does much less work per RoI, its speed improvement is far less significant.

GPU time

Here is the GPU time for different models using different feature extractors.

Many papers use FLOPS (the number of floating point operations) to measure complexity, but it does not necessarily reflect the actual speed. The density of a model (sparse vs. dense) affects how long it takes. Ironically, a less dense model usually takes longer on average to finish each floating point operation. In the paper's plot, the slope (FLOPS and GPU time ratio) is greater than or equal to 1 for most dense models, while it is less than 1 for the lighter models. That is, less dense models are less efficient even though their overall execution time is shorter. However, the paper does not fully investigate the reason.

Memory

MobileNet has the smallest footprint. It requires less than 1 GB (total) of memory.

2016 COCO object detection challenge

The winning entry of the 2016 COCO object detection challenge is an ensemble of five Faster R-CNN models using Resnet and Inception ResNet. It achieves 41.3% mAP@[.5, .95] on the COCO test set and significantly improves the localization of small objects.

Lessons learned

Some key findings from the Google Research paper:

R-FCN and SSD models are faster on average, but cannot beat Faster R-CNN in accuracy if speed is not a concern.
Faster R-CNN requires at least 100 ms per image.
Using only low-resolution feature maps for detection hurts accuracy badly.
Input image resolution affects accuracy significantly. Reducing the image width and height by half lowers accuracy by 15.88% on average, but also reduces inference time by 27.4% on average.
The choice of feature extractor affects the detection accuracy of Faster R-CNN and R-FCN, but SSD is less dependent on it.
Post-processing includes non-max suppression (which only runs on the CPU). For the fastest models it takes about 40 ms, which caps the speed at 25 FPS.
If mAP is calculated with a single IoU only, use mAP@IoU=0.75.
With Inception ResNet as the feature extractor, using a stride of 8 instead of 16 improves mAP by 5%, but increases running time by 63%.
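
Several of these findings are simple latency arithmetic. The sketch below makes the conversions explicit: per-image latency in milliseconds to frames per second, and a relative change in running time. The helper names are ours, and the 100 ms baseline in the note that follows is only a convenient example value.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second achievable at a given per-image latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def scaled_latency_ms(latency_ms: float, relative_change: float) -> float:
    """Latency after a relative change, e.g. +0.63 or -0.274."""
    return latency_ms * (1.0 + relative_change)
```

With these helpers, 40 ms of post-processing gives `fps_from_latency_ms(40)`, which is 25 FPS, and the 100 ms minimum for Faster R-CNN corresponds to at most 10 FPS. Starting from 100 ms, a 63% increase gives about 163 ms, and the 27.4% reduction from halving the resolution gives about 72.6 ms.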

Most accurate

The most accurate single model is Faster R-CNN using Inception ResNet with 300 proposals. It runs at 1 second per image.
The most accurate model overall is an ensemble with multi-crop inference. It achieved state-of-the-art detection accuracy in the 2016 COCO challenge. It uses the vector of average precisions to select the five most different models.
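
The text only says that the ensemble picks the five most different models based on their average precision vectors. One plausible reading, sketched below, is a greedy selection that ranks models by mean AP and skips any model whose per-category AP vector is too similar to one already chosen. The cosine similarity measure, the 0.999 cutoff and all names here are our assumptions for illustration, not details confirmed by the article.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def select_diverse_models(ap_vectors: Dict[str, Sequence[float]],
                          max_models: int = 5,
                          max_similarity: float = 0.999) -> List[str]:
    """Greedily pick up to `max_models` models, best mean AP first,
    skipping any model too similar to one that is already chosen.

    `ap_vectors` maps a model name to its non-empty per-category AP
    vector. The similarity cutoff is an assumed value.
    """
    ranked = sorted(
        ap_vectors,
        key=lambda name: sum(ap_vectors[name]) / len(ap_vectors[name]),
        reverse=True,
    )
    chosen: List[str] = []
    for name in ranked:
        if len(chosen) == max_models:
            break
        if all(cosine_similarity(ap_vectors[name], ap_vectors[c]) < max_similarity
               for c in chosen):
            chosen.append(name)
    return chosen
```

The idea behind favoring diversity is that models with different per-category strengths make different mistakes, so their combined predictions can be better than those of five near-identical models.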

Fastest

SSD with MobileNet provides the best accuracy tradeoff among the fastest detectors.
SSD is fast, but performs worse on small objects than the other detectors.
For large objects, SSD can outperform Faster R-CNN and R-FCN in accuracy while using lighter and faster extractors.

Good balance between accuracy and speed

If we reduce the number of proposals to 50, Faster R-CNN can match the speed of R-FCN and SSD at 32 mAP.

This article was translated by AI Algorithm and Image Processing.

Original site

Copyright notice
This article was created by [51CTO]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/209/202207281255102454.html