当前位置:网站首页>Important concepts of target detection - IOU, receptive field, hole convolution, mAP
Important concepts of target detection - IOU, receptive field, hole convolution, mAP
2022-08-02 06:32:00 【The romance of cherry blossoms】
1.Calculation of IOU
The performance of the target detection algorithm can be judged by comparing the distance and overlapping area between the ground-truth bounding box and the predicted bounding box.IoU(Intersection over Union) is a criterion for the accuracy of detecting corresponding objects in a specific data set, which can achieve the above purpose.The calculation logic of IoU is very simple, that is, the area of the intersection of two boxes
divided by the area of the combination of the two boxes, as shown in the figure.Where does the so-called truth box come from?For any training dataset or test dataset, there are at least two types of data: one is the image itself; the other contains the coordinates of the object to be detected.Our goal is to build a target detection algorithm and test it on the test set. If the value of IoU is greater than 0.5, the test effect is generally good; if the value of IoU is less than 0.5, then it is obvious that the target detectionThe effect is not ideal.If the value of IoU is around 0.7, it means that the position of the ground truth frame and the prediction frame are very close; if the value of IoU is around 0.9, it means that the target detection algorithm is very effective on the test set.

2. Hole convolution
Common image segmentation algorithms usually use pooling layers and convolutional layers to increase the receptive field, and also reduce the size of the feature map, and then use upsampling to restore the image to the fullest size.Because the process of shrinking and re-enlarging the feature map causes loss of accuracy, it is necessary to keep the size of the feature map unchanged while increasing the receptive field, so as to replace the down-sampling and up-sampling operations.Atrous convolution was born.
Dilated/Atrous Convolation (can be called hole convolution or dilated convolution in Chinese) is to inject holes into the standard convolution map to increase the receptive field, compared with the original normalConvolution, atrous convolution adds an additional hyperparameter: dilation rate, which defines the spacing of each value when the convolution kernel processes the data (for example, the standard convolution dilation rate is 1).

Hollow convolution has the following two functions.
1) Expand the receptive field: In order to increase the receptive field and reduce the amount of computation in the deep network, downsampling (such as using a pooling layer) is always performed, although this can increase the receptive field, but also reduces the spatial resolution.Using atrous convolution can expand the receptive field without losing resolution as much as possible, which is especially important in detection or segmentation tasks.On the one hand, a larger receptive field can detect large targets, and on the other hand, a higher resolution can precisely locate the target.
2) Capture multi-scale context information: When different dilation rates are set, the receptive field will be different, and multi-degree information can be obtained.Multi-scale information is very important in vision tasks.
3.mAP
Suppose the classifier has 90% precision at 10% recall, but 96% precision at 20% recall.There's really no tradeoff here: It's more reasonable to use a classifier with 20% recall (instead of 10%) because you'll get higher recall and higher precision.Therefore, we should not be looking at 10% recall, but at the maximum precision that the classifier can provide at least 10% recall.This is 96%, not 90%.So one way to get a reasonable idea about the performance of the model is to calculate the maximum precision that can be achieved when the recall is at least 0% (then 10%, 20%, and so on, up to 100%) and then calculateThe average of these maximum accuracies.This is called the Average Precision (AP) metric.When there are more than two classes, we can compute AP for each class and then compute the mean AP (mAP).
In an object detection system, there is another layer of complexity: what if the system detects the correct class but in the wrong location (i.e. the bounding box has no objects at all)?Of course, we should not take this as a positive forecast.The second way is to define an IOU threshold: for example, we can consider the prediction to be correct only if the IOU is too large for 05 and the predicted class is correct.The corresponding mAP is usually labeled as [email protected] (or [email protected]%, or sometimes AP50).This is done in some competitions (such as the PASCAL VOC Challenge).In other cases (such as COCO competition), mAP is calculated for different IOU readings (0.50, 0.55, 0.60, ..., 0.95), and the final metric is the average of all these mAPs (denoted as AP @[ .50:.95] or [email protected][.50:0.05:.95]).This is the average of the average.
边栏推荐
- 关于鸿蒙系统 JS UI 框架源码的分析
- goroutine (coroutine) in go language
- 上海交大牵手淘宝成立媒体计算实验室:推动视频超分等关键技术发展
- TikTok平台的两种账户有什么区别?
- The Go language learning notes - dealing with timeout - use the language from scratch from Context
- CPU使用率和负载区别及分析
- Mysql common commands
- Introduction to Grid Layout
- LeetCode刷题系列 -- 10. 正则表达式匹配
- Mysql implements optimistic locking
猜你喜欢
随机推荐
分布式文件存储服务器之Minio对象存储技术参考指南
回文串求解的进阶方法
关于 VS Code 优化启动性能的实践
Review: image saturation calculation formula and image signal-to-noise (PSNR) ratio calculation formula
5年在职经验之谈:2年功能测试、3年自动化测试,从入门到不可自拔...
【C语言】LeetCode26.删除有序数组中的重复项&&LeetCode88.合并两个有序数组
classSR论文阅读笔记
18 years of programmer career, read more than 200 programming books, pick out some essence to share with you
Alluxio为Presto赋能跨云的自助服务能力
Cyber Security Learning - Intranet Penetration 4
Constructors, member variables, local variables
How H5 realizes evoking APP
Meta公司内部项目-RaptorX:将Presto性能提升10倍
swinIR论文阅读笔记
为什么4个字节的float要比8个字节的long大呢?
MySql copies data from one table to another table
上海交大牵手淘宝成立媒体计算实验室:推动视频超分等关键技术发展
腾讯大咖分享 | 腾讯Alluxio(DOP)在金融场景的落地与优化实践
Use the browser's local storage to realize the function of remembering the user name
双重for循环案例(用js打印九九乘法表)
![[PSQL] window function, GROUPING operator](/img/95/5c9dc06539330db907d22f84544370.png)








