[Paper Intensive Reading] The relationship between Precision-Recall and ROC curves
2022-08-05 06:10:00 【takedachia】
While recently refreshing some machine learning fundamentals, I did a close reading of a paper on the relationship between ROC curves and PR curves.
The paper, titled "The Relationship Between Precision-Recall and ROC Curves", was published in 2006 by Jesse Davis and Mark Goadrich of the University of Wisconsin-Madison. It is an early paper, but knowledge about the cornerstones of machine learning never goes out of style.
My reading notes and takeaways follow.
- 1 Confusion matrices as mappings to the ROC and PR curves
- 2 ROC and PR curves can be converted into each other, with a one-to-one correspondence between their points
- 3 Curve A dominates curve B in ROC space ⇔ curve A dominates curve B in PR space
- 4 Building the convex hull to draw the maximum realisable ROC curve
- 5 Optimizing the area under the ROC curve alone does not optimize the area under the PR curve
- Summary
1 Confusion matrices as mappings to the ROC and PR curves
A brief look at the confusion matrix
When we evaluate a binary classification model, we use evaluation metrics to measure its performance on the test set.
The first metric that comes to mind is accuracy, the proportion of all samples that are classified correctly. Sometimes we want more detail. In a clinical screening problem, for example, we care about how many of the positives are detected and do not want to miss any positive patient, so we focus on Recall (also called sensitivity).
Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.
The classification result of a binary model on a given test set can be represented as a 2×2 table called a confusion matrix. Its four entries are TP (true positives), FN (false negatives), FP (false positives), and TN (true negatives), as shown in the figure below.
Note the position of each entry in the confusion matrix in (a). The first column in the figure shows how the actual positives were predicted, and the second column shows how the actual negatives were predicted.
Different sources often swap the positions of actual and predicted, or of positive and negative, but the content is the same.
These four values are used to define evaluation metrics for classification results. Common ones, shown in (b), include Recall (sensitivity), Precision, True Positive Rate, and False Positive Rate.
Note that Recall = True Positive Rate.
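To make the four metrics concrete, here is a minimal Python sketch that computes them from the entries of a confusion matrix. The class name `ConfusionMatrix` and the example counts are my own illustrative choices, not something taken from the paper, and the sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four entries of a 2x2 confusion matrix."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    def recall(self) -> float:
        # Recall = TP / (TP + FN); identical to the True Positive Rate
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Precision = TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def true_positive_rate(self) -> float:
        return self.recall()

    def false_positive_rate(self) -> float:
        # FPR = FP / (FP + TN)
        return self.fp / (self.fp + self.tn)


# Illustrative counts only (not from the paper).
cm = ConfusionMatrix(tp=8, fn=2, fp=4, tn=86)
print(cm.recall(), cm.precision(), cm.false_positive_rate())
```

Recall and True Positive Rate share one formula here, which is the identity stated above.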
ROC and PR curves are the sets of points that confusion matrices map to in their respective spaces
We know that the result of a binary classification model on a given test set yields a confusion matrix.
The output of a binary classifier can usually be expressed as predicted class probabilities. For example, the output [0.28, 0.72] means the first class has probability 0.28 and the second class (taken as the positive class) has probability 0.72. By default, a probability above 0.5 is treated as a prediction of that class, and this 0.5 is the threshold.
With a threshold of 0.5 we get one classification result, that is, one confusion matrix. With a threshold of 0.7 we get another confusion matrix. With a threshold of 0.8 we get yet another (now 0.72 < 0.8, so the sample is predicted negative).
When the confusion matrix changes, the False Positive Rate, True Positive Rate, Recall, and Precision change with it.
Take the False Positive Rate and the True Positive Rate as the horizontal and vertical axes of a two-dimensional coordinate system, and plot the point corresponding to each confusion matrix. These many points make up the ROC curve.
See figure (a) below. Two different classification models each have their own ROC curve.
In the same way, take Recall and Precision as the horizontal and vertical axes and plot the point corresponding to each confusion matrix. These points make up the PR curve, as in figure (b) above. Different models have different PR curves.
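The threshold sweep described above can be sketched in a few lines of Python. The scores, labels, and function names below are hypothetical and chosen only for illustration. A sample is predicted positive when its score is strictly greater than the threshold, matching the ">0.5" convention in the text.

```python
def confusion_at(scores, labels, threshold):
    """Count (TP, FN, FP, TN) when predicting positive for score > threshold."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_and_pr_points(scores, labels, thresholds):
    """Map each threshold's confusion matrix to one ROC point and one PR point."""
    roc, pr = [], []
    for t in thresholds:
        tp, fn, fp, tn = confusion_at(scores, labels, t)
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        roc.append((fpr, tpr))
        if tp + fp > 0:  # Precision is undefined when nothing is predicted positive
            pr.append((tpr, tp / (tp + fp)))
    return roc, pr


# Hypothetical positive-class scores and true labels (1 = positive).
scores = [0.72, 0.90, 0.40, 0.60, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
roc, pr = roc_and_pr_points(scores, labels, thresholds=[0.5, 0.7, 0.8])
print(roc)
print(pr)
```

Each threshold yields one confusion matrix and therefore one point in each space. Sweeping many thresholds traces out the two curves.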
Here I describe the correspondence between a confusion matrix and a point in each curve's space as a function: different confusion matrices are mapped to different points in ROC space and in PR space.
As shown in the figure, the result of a binary classifier is represented as a confusion matrix $A$.
$f_1(A)$ maps $A$ to a point $(FPR, TPR)$ in ROC space.
$f_2(A)$ maps $A$ to a point $(Recall, Precision)$ in PR space.
The mapping from $A$ to a point in each space is one-to-one, and all the points form a curve in their respective space: the ROC curve and the PR curve. (This description helps with the later proof of the one-to-one correspondence between ROC and PR curves.)
The authors also mention that FPR, TPR, Recall, and Precision can be defined as functions of $A$, for example:
$RECALL(A) = TP / (TP + FN)$.
Why the PR curve reflects class imbalance better than the ROC curve
An ideal binary classifier classifies a dataset perfectly. Its confusion matrix would look like this (note that this is a class-imbalanced dataset), with nonzero values only on the main diagonal:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 47 (TP) | 0 (FN) |
| Actual Negative | 0 (FP) | 433 (TN) |
The horizontal axis of ROC, FPR, is FP in the second row divided by that row's total. The vertical axis, TPR, is TP in the first row divided by that row's total.
The horizontal axis of PR, Recall, is TP divided by its row total. The vertical axis, Precision, is TP divided by its column total.
So the ROC curve is pulled all the way to the upper-left corner, and the PR curve is pulled all the way to the upper-right corner.
In reality this is unlikely. A more typical confusion matrix looks like this:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 46 (TP) | 1 (FN) |
| Actual Negative | 33 (FP) | 400 (TN) |
In this confusion matrix there are many more false positives (FP) than false negatives (FN). (In a cancer screening model, for instance, we want to find as many positives as possible first, and catching some negatives by mistake matters less.)
In ROC space, FPR = 33/433 ≈ 0.076 is still close to 0 and TPR = 46/47 ≈ 0.979 is still close to 1, so the point remains near the upper-left corner.
In PR space, however, Recall is still excellent but Precision = 46/79 ≈ 0.58 is much lower, so the point drops to a lower position:
We know this is class-imbalanced data. For the minority class, an increase in FP (or FN) causes a sharp drop in Precision (or Recall).
For the PR curve, both axes, Recall and Precision, measure classification performance on the positive (minority) class, so the focus is on one class. (You could also focus on the majority class by setting it as the positive class and drawing the PR curve, but that is of little use.)
For the ROC curve, the axes FPR and TPR each measure performance within one class (FPR within the actual negatives, TPR within the actual positives), so the curve is naturally insensitive to class imbalance. In exchange, ROC looks at both classes at once and reflects the classifier's overall performance.
In practice, then, the ROC curve is a good first view of a classifier's overall performance, and when we need to focus on one class (because of class imbalance, as in cancer screening) we should look at the PR curve.
There are also other performance metrics for binary classifiers, such as the F1-score, that can measure classification performance on class-imbalanced data.
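As a quick check of the arithmetic in this section, the sketch below maps the two confusion matrices shown above (the ideal one and the more realistic one) to their points in ROC space and PR space. The helper names `roc_point` and `pr_point` are my own.

```python
def roc_point(tp, fn, fp, tn):
    """Return (FPR, TPR) for one confusion matrix."""
    return fp / (fp + tn), tp / (tp + fn)


def pr_point(tp, fn, fp, tn):
    """Return (Recall, Precision); assumes tp + fp > 0."""
    return tp / (tp + fn), tp / (tp + fp)


# The two confusion matrices from the tables above.
ideal = dict(tp=47, fn=0, fp=0, tn=433)
realistic = dict(tp=46, fn=1, fp=33, tn=400)

for name, cm in (("ideal", ideal), ("realistic", realistic)):
    fpr, tpr = roc_point(**cm)
    recall, precision = pr_point(**cm)
    print(f"{name}: FPR={fpr:.3f} TPR={tpr:.3f} "
          f"Recall={recall:.3f} Precision={precision:.3f}")
```

The 33 false positives barely move the ROC point, because they are divided by 433 actual negatives, while the same 33 pull Precision down to about 0.58, because they are compared against only 46 true positives.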
2 ROC and PR curves can be converted into each other, with a one-to-one correspondence between their points
We know that for a given dataset both the ROC curve and the PR curve reflect a classifier's performance, each with its own emphasis, and that the PR curve shows more about performance on imbalanced classes. This raises a question: the two curves seem to provide different information, so are ROC and PR two different things?
The answer is that the two curves are in one-to-one correspondence. The authors state a theorem:
For a given binary classification dataset, there is a one-to-one correspondence between a curve in ROC space and a curve in PR space (provided Recall ≠ 0), such that the curves contain exactly the same confusion matrices.
Proof (my description differs slightly from the authors', but the essence is the same):
Let's return to the earlier figure:
The result of a binary classifier is represented as a confusion matrix $A$ with four variables: TP, FP, FN, and TN. Because the dataset is given, the number of actual positives TP + FN and the number of actual negatives FP + TN are both fixed (and so is their sum, the sample count $n$), so the matrix has 2 degrees of freedom.
$f_1(A)$ maps $A$ to a point $(FPR, TPR)$ in ROC space. Looking at the formulas in the figure, with the class totals fixed, TPR determines TP (and hence FN), and FPR determines FP (and hence TN). Each ROC point therefore corresponds to exactly one confusion matrix, so $f_1$ is a one-to-one mapping. In set-theoretic terms it is a bijection (injective and surjective) onto the set of ROC points the dataset can produce.
$f_2(A)$ is similar. It maps $A$ to a point $(Recall, Precision)$ in PR space. Recall determines TP, and then Precision = TP / (TP + FP) determines FP as long as TP ≠ 0. FN and TN follow from the class totals. So $f_2$ is also a one-to-one mapping, a bijection onto its image.
Because both mappings are one-to-one, each point in either space is represented by a uniquely determined confusion matrix. The points in ROC space and PR space are therefore in one-to-one correspondence, and the curves contain exactly the same confusion matrices.
Finally, if Recall = 0, that is, TP = 0, then Precision is 0 whatever the value of FP, so FP cannot be recovered from the PR point and the confusion matrix $A$ cannot be solved for. This is why Recall must not be 0.
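The proof idea can be sketched as code: with the class totals fixed, a point in either space is enough to rebuild the whole confusion matrix. This is my own illustration rather than the authors' construction. The function names are hypothetical, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def matrix_from_roc(fpr, tpr, n_pos, n_neg):
    """Recover (TP, FN, FP, TN) from an ROC point and the fixed class totals."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp, n_pos - tp, fp, n_neg - fp


def matrix_from_pr(recall, precision, n_pos, n_neg):
    """Recover (TP, FN, FP, TN) from a PR point; requires Recall != 0."""
    if recall == 0:
        # TP = 0 makes Precision 0 for any FP > 0, so FP cannot be recovered.
        raise ValueError("Recall must be nonzero to invert a PR point")
    tp = recall * n_pos
    # Precision = TP / (TP + FP)  =>  FP = TP * (1 - Precision) / Precision
    fp = tp * (1 - precision) / precision
    return tp, n_pos - tp, fp, n_neg - fp


# Class totals and points taken from the second confusion matrix in Section 1.
n_pos, n_neg = 47, 433
from_roc = matrix_from_roc(Fraction(33, 433), Fraction(46, 47), n_pos, n_neg)
from_pr = matrix_from_pr(Fraction(46, 47), Fraction(46, 79), n_pos, n_neg)
print(from_roc == from_pr)
```

Both inversions land on the same matrix (TP = 46, FN = 1, FP = 33, TN = 400), which is the sense in which an ROC point and its PR counterpart carry the same confusion matrix.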
What is this theorem good for?
First, I think it tells us that a model's ROC curve and PR curve contain exactly the same information and differ only in which class's performance they emphasize. A poor-looking PR curve does not mean the ROC curve is useless, because the classifier's overall performance (such as accuracy) may still be worth a look.
Second, it lays the theoretical foundation for the paper's later discussion of whether optimal ROC and PR curves can be found.
3 Curve A dominates curve B in ROC space ⇔ curve A dominates curve B in PR space
In the heading, what does it mean for curve A to be better than curve B?
The paper uses the concept of dominance: curve A dominates curve B when curve A always lies on or above curve B (the two may partly coincide).
In ROC space, if curve A dominates curve B, the classifier represented by curve A outperforms the one represented by curve B. The same holds in PR space. A simple way to picture this is in terms of a larger or smaller area under the curve.
① In ROC space, curve A lies above curve B ⇨ in PR space, curve A lies above curve B
② In PR space, curve A lies above curve B ⇨ in ROC space, curve A lies above curve B
Take proposition ① first. Following the authors' idea in the paper, I proved it myself by contradiction. The proof is as follows:
Proposition ② can be proved by contradiction in the same way.
Note that the proof relies on the theorem from Section 2, which ties the earlier material to what follows.
This raises another question: why do the authors introduce the concept of dominance at all? Is it simply about increasing the area under the curve?
4 Building the convex hull to draw the maximum realisable ROC curve
Much of the authors' work here builds on another paper, "Realisable Classifiers: Improving Operating Performance on Variable Cost Problems" (Scott, 1998). That paper introduces the convex hull: by drawing the convex hull of the existing points in ROC space, one obtains an MRROC (maximum realisable ROC curve).
The maximum realisable curve can be understood as an optimal curve. It can guide designers of classification algorithms toward better models and indicate whether the current algorithm is suitable.
In that paper the authors explain: given a classification algorithm (such as a linear model) and the ROC curve it produces, there is an envelope around that ROC curve which forms the convex hull. The convex hull represents a set of realisable classifiers that are always at least as good as the original model and are derived from the initial classifier. Given some existing classifiers, the constructed convex hull describes a maximum realisable ROC.
The convex hull and the maximum realisable curve
First, what is a convex hull?
In practice, the ROC curve we get from real model predictions is usually staircase-shaped, for example:
Anyone who has studied convex optimization will see that this curve is not "convex". How do we make it "convex"?
Setting aside the formal definition in the paper, I borrow a helpful illustration found online, which likens building a convex hull to stretching a rubber band.
Anchor the red rubber band at the bottom-left and top-right points, stretch it toward the upper-left corner, and let go. The band comes to rest in the state shown in the third picture, and the convex hull is the set of points on the band. Every point on the band lies on or above the existing ROC points (curve).
This is also where the concept of dominance in the paper comes from: the convex hull is a maximum realisable ROC curve, and it dominates in this ROC space.
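The rubber-band picture corresponds to computing the upper convex hull of the ROC points. Below is a minimal sketch using the monotone chain idea. It is my own illustration, not code from either paper. The function name and the sample points are hypothetical, and the end points (0, 0) and (1, 1) are added on the assumption that the trivial "always negative" and "always positive" classifiers are always available.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, sorted by FPR.

    `points` is an iterable of (FPR, TPR) pairs.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # from the point before it to p (the turn is not clockwise).
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical staircase-like ROC points.
roc_points = [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9), (0.7, 0.8)]
print(roc_convex_hull(roc_points))
```

In this example the points (0.3, 0.6) and (0.7, 0.8) fall under the band and are dropped, leaving the vertices (0, 0), (0.1, 0.5), (0.4, 0.9), (1, 1).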
This figure from "Realisable Classifiers: Improving Operating Performance on Variable Cost Problems" shows how, given the ROC curves of several classifiers, the maximum realisable ROC curve is drawn by constructing the convex hull:
Building the achievable PR curve
Building on the maximum realisable ROC curve obtained from the convex hull, the authors state a corollary:
In PR space there exists an achievable PR curve. It can be built from the PR curves of existing classifiers, and it dominates in PR space.
The proof uses the theorem from Section 2 and the conclusion of Section 3, and I do not go into it here.
Building the achievable PR curve is not as simple as stretching a rubber band. One can first build the convex hull in ROC space and then convert its points.
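The "convert the points" step can be sketched as follows. For fixed class totals, an ROC point (FPR, TPR) gives TP = TPR × P and FP = FPR × N, and hence Recall and Precision. The function name and the hull vertices below are hypothetical, and the class totals are borrowed from the example tables in Section 1. The sketch converts only the hull's vertices. How to fill in the PR curve between those vertices is treated in the paper and is not shown here.

```python
def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Convert one ROC point to (Recall, Precision) for fixed class totals.

    Returns None when nothing is predicted positive, because Precision
    is undefined in that case.
    """
    tp = tpr * n_pos
    fp = fpr * n_neg
    if tp + fp == 0:
        return None
    return tpr, tp / (tp + fp)


# Hypothetical convex-hull vertices in ROC space, as (FPR, TPR) pairs.
hull_vertices = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]
n_pos, n_neg = 47, 433  # class totals from the Section 1 tables

converted = (roc_to_pr(f, t, n_pos, n_neg) for f, t in hull_vertices)
pr_points = [p for p in converted if p is not None]
print(pr_points)
```

The point (0, 0) has no PR counterpart, which mirrors the Recall ≠ 0 condition in the theorem from Section 2.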
The second half of the paper discusses how to construct the achievable PR curve and how it relates to the area under the curve.
5 Optimizing the area under the ROC curve alone does not optimize the area under the PR curve
We know that optimal curves can be constructed by building the MRROC and the achievable PR curve.
We also know that the area under the ROC curve (AUC-ROC) grows when the MRROC is built (once the rubber band is stretched, the area is larger).
So does optimizing AUC-ROC algorithmically also optimize the area under the PR curve?
The answer is no. The authors give a rather extreme example:
The left panel shows two ROC curves. Curve I has AUC = 0.813 and curve II has AUC = 0.875, so the area under curve I is smaller than the area under curve II.
In PR space, however, curve I and curve II look as shown in the right panel. The ordering is reversed: the area under the PR curve of curve II becomes very small.
From another angle: in ROC space curve II does not dominate curve I (they intersect). It merely has a larger area, so nothing can be concluded about the ordering of their areas in PR space.
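A toy example makes this concrete. The two rankings below are hypothetical and are not the paper's data: 2 positives and 10 negatives, sorted from the highest score to the lowest. I use average precision as a stand-in summary of the PR curve, which is an assumption on my part. The paper's AUC-PR is computed differently, so the numbers are illustrative only.

```python
def auc_roc(ranked_labels):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    `ranked_labels` lists true labels (1 or 0) from highest to lowest score.
    """
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    positives_seen = 0
    correct_pairs = 0
    for label in ranked_labels:
        if label == 1:
            positives_seen += 1
        else:
            # every positive already seen is ranked above this negative
            correct_pairs += positives_seen
    return correct_pairs / (n_pos * n_neg)


def average_precision(ranked_labels):
    """Mean of the precision values measured at the rank of each positive."""
    n_pos = sum(ranked_labels)
    tp = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Ranking I: one positive at the very top, the other at the very bottom.
ranking_1 = [1] + [0] * 10 + [1]
# Ranking II: both positives fairly high, but below three negatives.
ranking_2 = [0, 0, 0, 1, 1] + [0] * 7

for name, ranking in (("I", ranking_1), ("II", ranking_2)):
    print(name, auc_roc(ranking), round(average_precision(ranking), 3))
```

Ranking II has the larger AUC-ROC (0.7 against 0.5), yet ranking I has the larger average precision (about 0.583 against 0.325), so picking the ranking with the better AUC-ROC would pick the worse one by the PR summary.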
Summary
- For any dataset, a classifier's ROC curve and PR curve contain the same points and can be converted into each other. Because the axes in the two spaces measure different things, each curve emphasizes the classifier's performance on different classes.
- It follows that when a curve dominates in ROC space, it also dominates in PR space, and vice versa.
- An optimal curve can be constructed in ROC space by building the convex hull of the existing classifier points, which can be pictured as stretching a rubber band. Transforming the points of this convex hull into PR space gives another optimal curve, called the achievable PR curve. The optimal curve helps algorithm designers understand the weaknesses of current classifiers and design better models.
- Optimizing only the area under the ROC curve is not guaranteed to optimize the area under the PR curve.
Jesse Davis, Mark Goadrich. "The Relationship Between Precision-Recall and ROC Curves." International Conference on Machine Learning, Jun 2006.
Martin J. J. Scott, Mahesan Niranjan, Richard W. Prager. "Realisable Classifiers: Improving Operating Performance on Variable Cost Problems." British Machine Vision Conference, Jan 1998.