Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)
2022-07-28 14:12:00 【51CTO】
Source: https://zhuanlan.zhihu.com/p/91719437
Reprinted with permission. To reprint, please contact the author.
- Feature extractors (VGG16, ResNet, Inception, MobileNet).
- Output strides for the extractor.
- Input image resolutions.
- Matching strategy and IoU threshold (how predictions are excluded in calculating loss).
- Non-max suppression IoU threshold.
- Hard example mining ratio (positive vs. negative anchor ratio).
- The number of proposals or predictions.
- Bounding box encoding.
- Data augmentation.
- Training dataset.
- Use of multi-scale images in training or testing (with cropping).
- Which feature map layer(s) for object detection.
- Localization loss function.
- Deep learning software platform used.
- Training configurations including batch size, input image resize, learning rate, and learning rate decay.
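Two of the factors above, the matching IoU threshold and the non-max suppression (NMS) IoU threshold, both rest on the same overlap measure. The following is a minimal sketch of IoU and greedy NMS in plain Python. The box layout `(x_min, y_min, x_max, y_max)`, the function names and the default threshold of 0.5 are assumptions chosen for illustration, not the settings of any paper discussed here.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap any already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Raising `iou_threshold` keeps more overlapping boxes, which is one reason why two papers with different NMS settings are not directly comparable.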
Worse still, the technology evolves so fast that any comparison quickly becomes obsolete. Here, we summarize the results from the individual papers so you can view and compare them side by side. Then we close with a summary of the survey from Google Research. By presenting multiple viewpoints in one place, we hope to give a better picture of the performance landscape.
Performance results
In this section, we summarize the performance reported by the corresponding papers. Feel free to browse through this section quickly.
Faster R-CNN(https://arxiv.org/pdf/1506.01497.pdf)
These are the results on the PASCAL VOC 2012 test set. We are interested in the last 3 rows, which represent the Faster R-CNN performance. The second column is the number of RoIs produced by the RPN network. The third column is the training dataset used. The fourth column is the mean average precision (mAP), which measures accuracy.
mAP: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
Results on the PASCAL VOC 2012 test set

VOC 2012 for Faster R-CNN
Results on MS COCO

COCO for Faster R-CNN
Timing on a K40 GPU, in milliseconds, using the PASCAL VOC 2007 test set.

R-FCN (https://arxiv.org/pdf/1605.06409.pdf)
Results on the PASCAL VOC 2012 test set

VOC 2012 for R-FCN
(Multi-scale training and testing are used for some results.)
Results on MS COCO

COCO for R-FCN
SSD (https://arxiv.org/pdf/1512.02325.pdf)
These are the results on PASCAL VOC 2007, 2012 and MS COCO using 300×300 and 512×512 input images.

SSD
(SSD300* and SSD512* apply data augmentation for small objects to improve mAP.)
Performance:

Speed is measured with a batch size of 1 or 8 during inference
(YOLO here refers to v1, which is slower than YOLOv2 or YOLOv3.)
Results on MS COCO:

COCO for SSD
YOLOv2 (https://arxiv.org/pdf/1612.08242.pdf)
Results on the PASCAL VOC 2007 test set.

VOC 2007 for YOLOv2
(We add the VOC 2007 test here because it has results for different image resolutions.)
Results on the PASCAL VOC 2012 test set.

VOC 2012 for YOLOv2
Results on MS COCO.

COCO for YOLOv2
YOLOv3 (https://pjreddie.com/media/files/papers/YOLOv3.pdf)
Results on MS COCO

COCO for YOLOv3
Performance of YOLOv3

YOLOv3 performance on COCO
FPN (https://arxiv.org/pdf/1612.03144.pdf)
Results on MS COCO.

COCO for FPN
RetinaNet (https://arxiv.org/pdf/1708.02002.pdf)
Results on MS COCO

COCO for RetinaNet
Speed (ms) versus accuracy (AP) on MS COCO test-dev.

COCO for RetinaNet
Comparing paper results
Comparing results from different papers side by side is unwise. Those experiments were done in different settings. Nevertheless, we decided to plot them together so that you have at least a rough picture of where they stand. But be warned that these numbers should never be compared directly.
For the results presented below, the models are trained with both PASCAL VOC 2007 and 2012 data. The mAP is measured on the PASCAL VOC 2012 test set. For SSD, the chart shows results for 300×300 and 512×512 input images. For YOLO, it shows results for 288×288, 416×416 and 544×544 images. Higher-resolution images for the same model give better mAP but are slower to process.

* indicates that small-object data augmentation is applied.
** indicates that the results are measured on the VOC 2007 test set. We include them because the YOLO paper lacks many VOC 2012 test results. Since VOC 2007 results are generally better than 2012 results, we add the R-FCN VOC 2007 result as a cross reference.
Input image resolution and feature extractor affect speed. Below are the highest and lowest FPS reported by the corresponding papers. However, the results below can be highly biased, in particular because they are measured at different mAP.

Results on the COCO dataset
In recent years, many results are measured exclusively on the COCO object detection dataset. COCO is a harder dataset for object detection, and detectors usually achieve a much lower mAP on it. Here is a comparison of some key detectors.

FPN and Faster R-CNN* (using ResNet as the feature extractor) have the highest accuracy (mAP@[.5:.95]). RetinaNet builds on top of FPN using ResNet. So the high mAP achieved by RetinaNet is the combined effect of pyramid features, the complexity of the feature extractor and focal loss. Note, however, that this is not an apples-to-apples comparison. We will present the Google survey later for a better comparison. But it is best to view each model's claims first.
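The mAP@[.5:.95] metric averages AP over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. Here is a minimal sketch of that averaging step only. The function names are hypothetical, and the per-threshold AP values must come from an actual evaluation; nothing here reproduces a number from the papers.

```python
from typing import List, Sequence


def coco_iou_thresholds() -> List[float]:
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map_over_thresholds(ap_per_threshold: Sequence[float]) -> float:
    """Average AP values given in the same order as coco_iou_thresholds()."""
    thresholds = coco_iou_thresholds()
    if len(ap_per_threshold) != len(thresholds):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(thresholds)
```

Because AP falls as the IoU threshold rises, this averaged figure is usually well below the mAP@0.5 reported for PASCAL VOC, which is part of why COCO numbers look so much lower.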
Takeaway so far
Single shot detectors achieve an impressive frames-per-second (FPS) rate using lower-resolution images, at the cost of accuracy. Those papers try to show that they can beat the accuracy of region-based detectors. However, that is less conclusive, since higher-resolution images are often used for such claims. Hence, their scenarios are shifting. In addition, different optimization techniques are applied, which makes it hard to isolate the merit of each model. In fact, single shot and region-based detectors are becoming increasingly similar in design and implementation. But with some reservation, we can say:
- Region-based detectors such as Faster R-CNN show a small accuracy advantage if real-time speed is not needed.
- Single shot detectors are here for real-time processing. But applications need to verify whether they meet their accuracy requirements.
Comparison of SSD MobileNet, YOLOv2, YOLO9000 and Faster R-CNN
A video of the measured comparison (30 minutes) has been uploaded to Bilibili: https://www.bilibili.com/video/av75557343/
Report by Google Research (https://arxiv.org/pdf/1611.10012.pdf)
Google Research offers a survey paper studying the trade-off between speed and accuracy for Faster R-CNN, R-FCN and SSD. (YOLO is not covered by the paper.) It reimplements those models in TensorFlow, using the MS COCO dataset for training. This establishes a more controlled environment and makes trade-off comparisons easier. It also introduces MobileNet, which achieves high accuracy with much lower complexity.
Speed vs. accuracy
The most important question is not which detector is the best. It may not be possible to answer. The real question is which detector and which configuration give the best balance of speed and accuracy that your application needs. Below is the accuracy vs. speed trade-off (time measured in milliseconds).

In general, Faster R-CNN is more accurate, while R-FCN and SSD are faster.
- Faster R-CNN using Inception ResNet with 300 proposals gives the highest accuracy at 1 FPS for all the tested cases.
- SSD on MobileNet has the highest mAP among the models targeted for real-time processing.
The graph also helps us locate sweet spots that trade accuracy for a good speed return.
- R-FCN models using Residual Network strike a good balance between accuracy and speed,
- Faster R-CNN with ResNet can attain similar performance if we restrict the number of proposals to 50.
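One way to read a speed vs. accuracy scatter plot is to keep only the configurations that no other configuration beats on both axes (the Pareto front) and then pick from that shortlist according to the latency budget. The sketch below assumes each configuration is described by a GPU time in milliseconds and an mAP; the `Result` type and `pareto_front` function are hypothetical names, and any values passed in are the caller's own measurements rather than figures from the paper.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    gpu_time_ms: float  # lower is better
    map_score: float    # higher is better


def pareto_front(results: List[Result]) -> List[Result]:
    """Return the configurations that are not dominated, i.e. no other
    configuration is at least as fast and strictly more accurate."""
    front: List[Result] = []
    best_map = float("-inf")
    # Walk from fastest to slowest; on equal time, the more accurate one first.
    for r in sorted(results, key=lambda r: (r.gpu_time_ms, -r.map_score)):
        if r.map_score > best_map:
            front.append(r)
            best_map = r.map_score
    return front
```

A slower configuration only stays on the front if it buys extra accuracy, which mirrors the sweet-spot reasoning above.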

Feature extractor
The paper studies how the accuracy of the feature extractor impacts the accuracy of the detector. Faster R-CNN and R-FCN can take advantage of a better feature extractor, but it matters less for SSD.

(The x-axis is the top-1 classification accuracy of each feature extractor.)
Object size
For large objects, SSD performs quite well even with a simple extractor. SSD can even match other detectors' accuracy using a better extractor. But SSD performs much worse on small objects compared with other methods.

For example, SSD has trouble detecting the bottles in the example below, while other methods can.

Input image resolution
Higher resolution significantly improves detection of small objects, and it also helps with large objects. When the resolution is reduced by a factor of two in both dimensions, accuracy drops by 15.88% on average, while inference time also drops by 27.4% on average.
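To make the arithmetic concrete, here is a small sketch that applies those two averages to a single configuration. It assumes both percentages are relative changes, which the text does not state explicitly, and the function name and dictionary keys are hypothetical. Note that halving both dimensions leaves only a quarter of the pixels, yet the average time saving is far smaller than 75%.

```python
def halve_resolution(width: int, height: int, map_score: float, time_ms: float,
                     rel_map_drop: float = 0.1588,
                     rel_time_drop: float = 0.274) -> dict:
    """Estimate the effect of halving width and height.

    Assumption: the average drops (15.88% accuracy, 27.4% time) are
    relative changes applied to the starting values.
    """
    new_w, new_h = width // 2, height // 2
    return {
        "width": new_w,
        "height": new_h,
        # Fraction of the original pixel count that remains
        "pixel_ratio": (new_w * new_h) / (width * height),
        "map_score": map_score * (1.0 - rel_map_drop),
        "time_ms": time_ms * (1.0 - rel_time_drop),
    }
```

These are averages across many models, so an individual detector can deviate from the estimate in either direction.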

Number of proposals
The number of proposals generated can significantly affect the speed of Faster R-CNN (FRCNN) without a major decrease in accuracy. For example, with Inception ResNet, Faster R-CNN runs 3 times faster when using 50 proposals instead of 300. The drop in accuracy is only 4%. Because R-FCN has much less work per RoI, its speed improvement is far less significant.

GPU Time
Here is the GPU time for different models using different feature extractors.

Although many papers use FLOPS (the number of floating-point operations) to measure complexity, it does not necessarily reflect actual speed. The density of a model (sparse vs. dense) affects how long it takes. Ironically, less dense models usually take longer on average to finish each floating-point operation. In the figure below, the slope (ratio of FLOPS to GPU time) is greater than or equal to 1 for most dense models, while for the lighter models it is less than 1. That is, the less dense models are less efficient even though their overall execution time is shorter. However, the paper does not fully study the reason.

Memory
MobileNet has the smallest footprint. It requires less than 1 GB (total) of memory.

COCO 2016 object detection challenge
The winning entry for the 2016 COCO object detection challenge is an ensemble of five Faster R-CNN models using ResNet and Inception ResNet. It achieves 41.3% mAP@[.5, .95] on the COCO test set and a significant improvement in locating small objects.

Lessons learned
Some key findings from the Google Research paper:
- R-FCN and SSD models are faster on average, but cannot beat Faster R-CNN in accuracy if speed is not a concern.
- Faster R-CNN requires at least 100 ms per image.
- Using only low-resolution feature maps for detection hurts accuracy badly.
- Input image resolution impacts accuracy significantly. Reducing image width and height by half lowers accuracy by 15.88% on average, but also reduces inference time by 27.4% on average.
- The choice of feature extractor impacts detection accuracy for Faster R-CNN and R-FCN, but SSD is less reliant on it.
- Post-processing, including non-max suppression (which only runs on CPU), takes up the bulk of the running time for the fastest models at about 40 ms, which caps speed at 25 FPS.
- If mAP is calculated with one single IoU only, use mAP@IoU=0.75.
- With Inception ResNet as the feature extractor, using stride 8 instead of 16 improves mAP by 5%, but increases running time by 63%.
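The latency figures in the list above translate into frame rates by simple division: 1000 ms divided by the per-image time. The sketch below shows that conversion and a rough end-to-end estimate. The function names are hypothetical, and the end-to-end estimate assumes the network and the post-processing run one after the other with no overlap, which is an assumption of this sketch rather than a claim from the paper.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second when each image takes latency_ms milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def end_to_end_fps(network_ms: float, postprocess_ms: float = 40.0) -> float:
    """Frame rate when network time and post-processing time simply add up
    (assumed sequential execution, no pipelining)."""
    return fps_from_latency_ms(network_ms + postprocess_ms)
```

With these helpers, `fps_from_latency_ms(40.0)` gives the 25 FPS cap from post-processing, and `fps_from_latency_ms(100.0)` gives an upper bound of 10 FPS for a detector that needs at least 100 ms per image.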
Most accurate
- The most accurate single model uses Faster R-CNN with Inception ResNet and 300 proposals. It runs at 1 second per image.
- The most accurate model overall is an ensemble with multi-crop inference. It achieved state-of-the-art detection accuracy on the 2016 COCO challenge. It uses vectors of average precision to select the five most different models.
Fastest
- SSD with MobileNet provides the best accuracy trade-off among the fastest detectors.
- SSD is fast, but it performs worse on small objects compared with other detectors.
- For large objects, SSD can outperform Faster R-CNN and R-FCN in accuracy with lighter and faster extractors.
Good balance between accuracy and speed
- Faster R-CNN can match the speed of R-FCN and SSD at 32 mAP if we reduce the number of proposals to 50.
This article was translated by AI Algorithm and Image Processing.
