Deep understanding of P-R curve, ROC and AUC
2022-07-02 12:07:00 【raelum】
1. Mean Squared Error, Accuracy, and Error Rate
To evaluate a model's generalization performance, we need criteria that measure its generalization ability; these are called performance measures (Performance Measure).
Different performance measures often lead to different judgments when evaluating the same model, which means that a model's "goodness" is relative. Which model is good depends not only on the algorithm and the data, but also on the performance measure.
In a prediction task, we are given a dataset of size $m$
$$D=\{(\boldsymbol{x}_1,y_1),\cdots,(\boldsymbol{x}_m,y_m)\}$$
where $y_i$ is the true label of $\boldsymbol{x}_i$. To evaluate the performance of a model $f$, we compare its predictions $f(\boldsymbol{x})$ with the true labels $y$.
The three simplest performance measures are:
- Mean squared error (MSE): $\displaystyle mse(f;D)=\frac1m \sum_{i=1}^m (f(\boldsymbol{x}_i)-y_i)^2$;
- Accuracy: $\displaystyle acc(f;D)=\frac1m \sum_{i=1}^m \mathbb{I}(f(\boldsymbol{x}_i)=y_i)$;
- Error rate: $\displaystyle err(f;D)=\frac1m \sum_{i=1}^m \mathbb{I}(f(\boldsymbol{x}_i)\neq y_i)$.

Here $\mathbb{I}(\cdot)$ is the indicator function, and accuracy and error rate satisfy
$$acc(f;D)+err(f;D)=1$$
Mean squared error is typically used in regression tasks; accuracy and error rate are typically used in classification tasks.
`sklearn.metrics` provides common performance measures. Mean squared error, accuracy, and error rate are computed as follows:
""" Mean square error """
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
""" precision """
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_true, y_pred)
""" Error rate """
from sklearn.metrics import accuracy_score
err = 1 - accuracy_score(y_true, y_pred)
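For intuition, here is a minimal sketch (with made-up illustrative data) that computes the same three measures directly from their definitions and checks the relation $acc(f;D)+err(f;D)=1$:

```python
import numpy as np

# illustrative classification labels (made up for this sketch)
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])

acc = np.mean(y_true == y_pred)   # average of the indicator I(f(x) = y)
err = np.mean(y_true != y_pred)   # average of the indicator I(f(x) != y)
print(acc, err, acc + err)        # acc + err is always 1.0

# illustrative regression values: MSE is the mean of squared residuals
y_true_r = np.array([3.0, 2.5, 4.0])
y_pred_r = np.array([2.8, 2.7, 3.5])
print(np.mean((y_pred_r - y_true_r) ** 2))
```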
2. Precision, Recall, and $F1$
2.1 Precision and Recall
Error rate and accuracy are commonly used, but they cannot satisfy every task's needs.
Consider the following scenario. Suppose we have trained a model on a watermelon dataset that judges whether a melon is good or bad. Now a new cartload of watermelons arrives, and we use the trained model to classify them. Naturally, the error rate tells us what fraction of the melons were judged incorrectly.
But suppose what we actually care about is:
- Of the melons picked out (those the model judged good), what fraction are truly good?
- Of all the truly good melons, what fraction were picked out (judged good by the model)?
Then the error rate is clearly insufficient, and we need to introduce new performance measures.
The above may sound like a tongue twister, so let's illustrate it with a few pictures.
Suppose a farmer brings in a cart of watermelons (only 6 of them):
(Figure: the six watermelons, each with its number above and its true label below.)
Above each watermelon is its number; below it is its true label. We use the learned model $f$ to judge the six watermelons, with the following results:
(Figure: the model's predicted label for each of the six watermelons.)
As we can see, watermelons 1, 2, and 5 are misjudged, so the error rate is $3/6=0.5$, and the accuracy is likewise $0.5$. Meanwhile:
- The melons picked out (i.e., judged good by the model) are 2, 4, 5, 6; among these four, only 4 and 6 are truly good, a proportion of $0.5$.
- The truly good melons are 1, 4, 6; among these three, only 4 and 6 were picked out (i.e., judged good by the model), a proportion of $0.67$.
We can now define precision and recall, but first we need to introduce the confusion matrix (Confusion Matrix).
In binary classification, each sample falls into one of four categories according to the combination of its true class and the class predicted by the model:
- $TP$ (True Positive): the true label is positive and the predicted label is also positive.
- $FP$ (False Positive): the true label is negative but the predicted label is positive.
- $TN$ (True Negative): the true label is negative and the predicted label is also negative.
- $FN$ (False Negative): the true label is positive but the predicted label is negative.

Obviously $TP+FP+TN+FN=m$. The confusion matrix of the classification results takes the form
$$\begin{bmatrix} TN & FP \\ FN & TP \end{bmatrix}$$
Precision ($P$) and recall ($R$) are then defined as
$$P=\frac{TP}{TP+FP},\quad R=\frac{TP}{TP+FN}$$
For the example given earlier, the precision and recall are
$$P=0.5,\quad R=0.67$$
Now let's compute the confusion matrix:
- Truly good melons that the model judges good: 4 and 6, so $TP=2$;
- Truly bad melons that the model judges good: 2 and 5, so $FP=2$;
- Truly good melons that the model judges bad: only 1, so $FN=1$;
- For the last entry we can apply the formula directly: $TN=6-TP-FP-FN=1$.

So the confusion matrix is
$$\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$$
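As a quick check, the same $P$ and $R$ can be derived directly from the four counts we just obtained:

```python
# counts from the confusion matrix computed above
TP, FP, FN, TN = 2, 2, 1, 1

P = TP / (TP + FP)  # precision: 0.5
R = TP / (TP + FN)  # recall: 0.666...
print(P, R)
```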
As is easy to see, precision and recall apply to classification tasks. The corresponding implementations are as follows:
""" Precision rate """
from sklearn.metrics import precision_score
precision = precision_score(y_true, y_pred)
""" Recall rate """
from sklearn.metrics import recall_score
recall = recall_score(y_true, y_pred)
For the example at the beginning of this section, encode a good melon as $1$ and a bad melon as $0$; then:
```python
from sklearn.metrics import precision_score, recall_score, accuracy_score

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 1]

print(accuracy_score(y_true, y_pred))
# 0.5
print(precision_score(y_true, y_pred))
# 0.5
print(recall_score(y_true, y_pred))
# 0.6666666666666666
```
The results agree with our hand calculations.
2.2 Visualization of confusion matrix
For binary classification the confusion matrix is a $2\times 2$ matrix; more generally, for an $N$-class problem the confusion matrix is an $N\times N$ matrix.
`sklearn.metrics` provides `confusion_matrix()` to compute the confusion matrix. Let's use the example from Section 2.1 again and compute the corresponding confusion matrix:
```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 1]

C = confusion_matrix(y_true, y_pred)
print(C)
# [[1 2]
#  [1 2]]
```
The output matches what we computed in Section 2.1.
For multi-class problems, the confusion matrix $C$ returned by `confusion_matrix()` satisfies: $C_{ij}$ is the number of samples whose true class is $i$ but which the model predicts as class $j$.
To show the confusion matrix more clearly, consider three classes, with `y_true` and `y_pred` set to:
```python
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
```
Now use `ConfusionMatrixDisplay()` to visualize the confusion matrix:
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

C = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(C)
disp.plot()
plt.show()
```
(Figure: the $3\times 3$ confusion matrix rendered as a heatmap.)
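As a sanity check on the $C_{ij}$ convention, here is a minimal manual tally (assuming integer class labels $0,\dots,N-1$) that reproduces the `confusion_matrix()` result for the three-class example:

```python
import numpy as np

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# row index = true class, column index = predicted class
C = np.zeros((3, 3), dtype=int)
for t, p in zip(y_true, y_pred):
    C[t, p] += 1
print(C)
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]
```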
2.3 P-R Curve and BEP
Precision and recall are a pair of conflicting measures. Generally speaking, when precision is high, recall tends to be low, and when recall is high, precision tends to be low. Usually only in some simple tasks can both be high at once.
Returning to the example in Section 2.1: the model we trained on the watermelon dataset is essentially a binary classifier. In fact, many binary classifiers work by scoring each sample and setting a threshold: samples whose score is at or above the threshold are classified as positive, and samples whose score is below it are classified as negative.
For example, with the threshold set to $0.5$, a new sample (watermelon) scoring above $0.5$ is considered a good melon, and otherwise a bad one.
In fact, the accuracy, precision, and recall discussed above all depend on a specific threshold. Sometimes we prefer the threshold not to be fixed, but adjusted according to the actual needs of the task.
Still using the example from Section 2.1, suppose the threshold is $0.5$ and our binary classifier scores the six samples as follows:
(Figure: the classifier's score for each of the six watermelons.)
We sort the six watermelons by score, from high to low:
(Figure: the six watermelons sorted by descending score.)
Now we traverse from top to bottom. For the first row, take its score $0.88$ as the threshold: samples scoring at or above this threshold are predicted positive, and those below it are predicted negative. The result is as follows:
(Figure: predictions with the threshold set to $0.88$.)
The resulting precision and recall are $P=1,\, R=0.33$.
For the second row, take its score $0.76$ as the threshold, again predicting samples at or above the threshold as positive and those below it as negative:
(Figure: predictions with the threshold set to $0.76$.)
The resulting precision and recall are $P=1,\, R=0.67$.
Continuing in this way, we eventually obtain 6 $(R, P)$ pairs. The code implementation is as follows:
```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 0, 0, 1, 0]  # true labels, sorted by descending score
for i in range(len(y_true)):
    # predict the top (i + 1) samples as positive, the rest as negative
    y_pred = [1] * (i + 1) + [0] * (len(y_true) - i - 1)
    P = precision_score(y_true, y_pred)
    R = recall_score(y_true, y_pred)
    print((R, P))
```
Output:
```
(0.3333333333333333, 1.0)
(0.6666666666666666, 1.0)
(0.6666666666666666, 0.6666666666666666)
(0.6666666666666666, 0.5)
(1.0, 0.6)
(1.0, 0.5)
```
Connecting these six points gives a curve:
```python
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt

y_true = [1, 1, 0, 0, 1, 0]
R, P = [], []
for i in range(len(y_true)):
    y_pred = [1] * (i + 1) + [0] * (len(y_true) - i - 1)
    P += [precision_score(y_true, y_pred)]
    R += [recall_score(y_true, y_pred)]

plt.plot(R, P)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()
```
(Figure: the P-R curve through the six $(R,P)$ points.)
The figure above is called a P-R plot, and the curve in it is the P-R curve.
Further discussion of the P-R curve: denote the numbers of samples labeled positive and negative by $m^+$ and $m^-$, i.e.
$$m^+=TP+FN,\quad m^-=TN+FP$$
Precision and recall can then be written as
$$P=\frac{TP}{TP+FP},\quad R=\frac{TP}{m^+}$$
Now consider the more general case. We sort the scores of $m$ watermelons (previously six) from high to low, obtaining an ordered list
$$\mathrm{score}=\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_m \end{bmatrix}$$
Set the threshold to $h$. When $h>h_1$, all watermelons are predicted bad, i.e., no watermelon is predicted good, so $TP=FP=0$ and $P=0/0$ is undefined; the discussion below therefore assumes $h\leq h_1$. In general, we set the threshold $h$ to each sample's score in turn, giving $m$ thresholds.
First take the minimum threshold, $h=h_m$. Then all melons are predicted good, i.e., no melon is predicted bad, so $TN=FN=0$; hence $TP=m^+$ and $FP=m^-$, giving $R=1$ and $P=m^+/(m^++m^-)=m^+/m$. On the P-R curve this corresponds to the last point, with coordinates
$$\Big(1,\frac{m^+}{m}\Big)$$
(For the six-melon example, $m^+=3$ and $m=6$, so the last point is $(1, 0.5)$, matching the output above.)
If $(1,m^+/m)\to(1,0)$, then $m^+\ll m$ and hence $m^-\gg 0$. Combined with $TN=0$ above, this means there are many negatives in the sample, all of which were mispredicted; and since $FN=0$, there are few positives in the sample, all of which were predicted correctly. Thus, if the last point of the P-R curve approaches $(1,0)$, the sample distribution is extremely imbalanced (many negatives, few positives) and the classifier mispredicts every negative while getting every positive right, so a classifier with such a P-R curve is terrible.
Next take the maximum threshold, $h=h_1$. Then only the first watermelon is predicted good and the rest are predicted bad. Two cases:
- If the first watermelon is truly a good melon, then $TP=1$ and $FP=0$, so $P=1$ and $R=1/m^+$, and the first point of the P-R curve is $(1/m^+,1)$;
- If the first watermelon is truly a bad melon, then $FP=1$ and $TP=0$, so $P=R=0$, and the first point of the P-R curve is $(0,0)$.
In most cases our datasets are fairly large, i.e., $m^+\gg 0$. So when the highest-scoring sample is a positive, the first point of the P-R curve is very close to $(0,1)$ but not equal to it; when the highest-scoring sample is a negative, the first point is $(0,0)$.
More intuitively, suppose each $h_i$ corresponds to one melon. As we lower $h$ from $h_1$ to $h_m$, the P-R curve is traced from the first point to the last. When $h$ drops from $h_{i-1}$ to $h_i$: if the melon at $h_i$ is truly a positive, then $TP\uparrow$, $FP$ is unchanged, and $FN\downarrow$, so $P\uparrow$ and $R\uparrow$, and the P-R curve gains a segment going up and to the right. If the melon at $h_i$ is truly a negative, then $FP\uparrow$ while $TP$ and $FN$ stay the same, so $P\downarrow$ and $R$ is unchanged, and the curve gains a vertically downward segment.
Summing up: we start from $(0,0)$ or $(1/m^+,1)$ and lower the threshold along the ordered list. Each time we pass a positive, we draw a segment diagonally up and to the right; each time we pass a negative, we draw a segment vertically down. We continue until we reach $(1, m^+/m)$, at which point the P-R curve is complete.
From this drawing process we can see that the P-R curve is jagged and trends downward overall.
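To make this construction concrete, here is a minimal sketch (assuming all scores are distinct) that walks the sorted labels and emits the $(R, P)$ points exactly as described:

```python
# true labels sorted by descending score, from the running example
y_sorted = [1, 1, 0, 0, 1, 0]
m_pos = sum(y_sorted)

tp, fp = 0, 0
points = []
for label in y_sorted:          # lower the threshold one sample at a time
    if label == 1:
        tp += 1                 # positive passed: segment goes up and to the right
    else:
        fp += 1                 # negative passed: segment goes straight down
    points.append((tp / m_pos, tp / (tp + fp)))   # (R, P)
print(points)
# matches the six (R, P) pairs computed earlier
```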
Of course, `sklearn.metrics` provides a function for computing P-R curves: passing the true-label list `y_true` and the score list `y_score` to `precision_recall_curve()` yields the precisions, recalls, and thresholds, as follows:
```python
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision)
# [0.6        0.5        0.66666667 1.         1.         1.        ]
print(recall)
# [1.         0.66666667 0.66666667 0.66666667 0.33333333 0.        ]
print(thresholds)
# [0.45 0.53 0.57 0.76 0.88]
```
Then use `PrecisionRecallDisplay()` to draw it:
```python
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay
import matplotlib.pyplot as plt

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

precision, recall, _ = precision_recall_curve(y_true, y_score)
disp = PrecisionRecallDisplay(precision, recall)
disp.plot()
plt.show()
```
(Figure: the P-R curve drawn by PrecisionRecallDisplay, in step style.)
Readers may wonder: why does this curve differ from the one we drew before, and why does `thresholds` contain only five values?
Let's set `PrecisionRecallDisplay()` aside and plot directly from the output of `precision_recall_curve()`:
```python
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)
plt.show()
```
(Figure: the same points plotted with straight-line connections.)
Compared with the previous figure, only the way the points are connected has changed. `PrecisionRecallDisplay()` drops the diagonal connections and, for the sake of appearance, draws in a "horizontal then vertical" style. Here is part of the source of the `plot()` method of `PrecisionRecallDisplay`:
```python
def plot(self, ax=None, *, name=None, **kwargs):
    ...
    line_kwargs = {"drawstyle": "steps-post"}
    ...
    (self.line_,) = ax.plot(self.recall, self.precision, **line_kwargs)
    ...
    return self
```
"step-post"
This parameter indicates P-R The curve will be Ladder form Drawing , For details, see file .
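If you want the same step style without the display class, matplotlib's `drawstyle` argument is enough; a small sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]
precision, recall, _ = precision_recall_curve(y_true, y_score)

# "steps-post" holds each y-value until the next x, giving the staircase look
plt.plot(recall, precision, drawstyle="steps-post")
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()
```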
Of the three P-R figures above, we already know the second and third differ only in drawing style, so let's compare the third with the first. Relative to the first figure, the third removes the last point and adds the point $(0,1)$ before the first point. What is the purpose of this?
Let's look at the source of `precision_recall_curve()`:
```python
def precision_recall_curve(y_true, probas_pred, pos_label=None, sample_weight=None):
    fps, tps, thresholds = _binary_clf_curve(y_true, probas_pred,
                                             pos_label=pos_label,
                                             sample_weight=sample_weight)
    precision = tps / (tps + fps)
    precision[np.isnan(precision)] = 0
    recall = tps / tps[-1]

    # stop when full recall attained
    # and reverse the outputs so recall is decreasing
    last_ind = tps.searchsorted(tps[-1])
    sl = slice(last_ind, None, -1)
    return np.r_[precision[sl], 1], np.r_[recall[sl], 0], thresholds[sl]
```
The `return` line shows that the point $(0,1)$ is forcibly appended. Then why is the last point of the P-R curve removed?
Notice the comment `# stop when full recall attained`: the computation stops once $R=1$. In our first figure, the last two points both have abscissa $1$, so the final point is never computed, and the corresponding minimum threshold is not added to `thresholds`.
In fact one can prove: if the lowest-scoring sample is a negative, the last two points both have abscissa $1$; if the lowest-scoring sample is a positive, the second-to-last point has abscissa $1-1/m^+$.
As for why $(0,1)$ is forcibly added to the P-R curve: `sklearn` wants the P-R curve to start from the $y$-axis.
For convenience in what follows, we draw the P-R curve as a monotone, smooth curve (note that in real tasks the P-R curve is usually non-monotonic and not smooth, fluctuating up and down in many places; refer to the figures above), as shown here:
(Figure: idealized smooth P-R curves of three classifiers A, B, and C.)
The P-R plot gives an intuitive picture of a classifier's recall and precision over the whole sample. When comparing classifiers, if one classifier's P-R curve is completely enclosed by another's, we can assert that the latter outperforms the former; in the figure above, for example, B is better than C.
If two classifiers' P-R curves cross, as A and B do above, a more reasonable criterion is to compare the area under the P-R curve, which to some extent represents the proportion of cases where precision and recall are both high. But this area is not easy to estimate, so we need performance measures that assess precision and recall jointly.
The Break-Even Point (BEP) is one such measure: it is the value at which $P=R$. For the example at the start of this section, the balance point is $0.67$.
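A small sketch locating the balance point numerically from the output of `precision_recall_curve()` (here we simply take the point where $|P-R|$ is smallest):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]
precision, recall, _ = precision_recall_curve(y_true, y_score)

i = np.argmin(np.abs(precision - recall))  # index where P and R are closest
print(precision[i], recall[i])
# 0.6666... 0.6666... -- the BEP of 0.67 mentioned above
```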
2.4 $F1$ and $F_{\beta}$
The BEP is somewhat oversimplified, however; more commonly used is the $F1$ measure, defined via the harmonic mean of precision and recall:
$$\frac{1}{F1}=\frac12\left(\frac1P+\frac1R\right)$$
which simplifies to
$$F1=\frac{2\cdot P\cdot R}{P+R}$$
In some applications we weight precision and recall differently, so we introduce the general form of the $F1$ measure, $F_{\beta}$, which lets us express a preference between precision and recall. It is defined as the weighted harmonic mean of precision and recall:
$$\frac{1}{F_{\beta}}=\frac{1}{1+\beta^2}\left(\frac1P+\frac{\beta^2}{R}\right),\quad \beta>0$$
which simplifies to
$$F_{\beta}=\frac{(1+\beta^2)\cdot P\cdot R}{\beta^2\cdot P+R},\quad \beta>0$$

- $\beta=1$: $F_{\beta}$ reduces to $F1$;
- $\beta>1$: recall has the greater influence;
- $\beta<1$: precision has the greater influence.
$F1$ and $F_{\beta}$ are performance measures for classification tasks; the corresponding implementations are as follows:
""" F1 """
from sklearn.metrics import f1_score
f1 = f1_score(y_true, y_pred)
""" Fbeta """
from sklearn.metrics import fbeta_score
fbeta = fbeta_score(y_true, y_pred, beta=0.5) # With beta=0.5 For example
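As a quick check of the closed-form expressions against sklearn, using the running example:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 1]

P = precision_score(y_true, y_pred)   # 0.5
R = recall_score(y_true, y_pred)      # 0.666...

print(f1_score(y_true, y_pred), 2 * P * R / (P + R))   # both 0.571...

beta = 0.5
print(fbeta_score(y_true, y_pred, beta=beta),
      (1 + beta**2) * P * R / (beta**2 * P + R))        # both match
```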
3. ROC and AUC
3.1 ROC (Receiver Operating Characteristic)
As mentioned earlier, sorting the scores of $m$ samples from high to low yields an ordered list. In different application tasks we can set different thresholds (cut points) according to the task's requirements: if we care more about precision, we cut near the top of the list; if we care more about recall, we cut further down.
The quality of the ranking itself thus reflects the learner's expected generalization performance across tasks, and the ROC curve is a powerful tool for studying generalization from this perspective.
ROC stands for Receiver Operating Characteristic. It originated in radar signal analysis for enemy-aircraft detection in World War II, and was later introduced into machine learning.
The ROC curve is very similar to the P-R curve. In the P-R curve the ordinate is precision and the abscissa is recall; in the ROC curve the ordinate is the True Positive Rate (TPR) and the abscissa is the False Positive Rate (FPR), defined as
$$TPR=\frac{TP}{TP+FN}=R,\quad FPR=\frac{FP}{TN+FP}$$
Connecting the $(FPR, TPR)$ points with line segments gives the ROC curve.
`sklearn.metrics` provides `roc_curve()` to compute the ROC curve; it is used as follows:
```python
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
```
(Figure: the ROC curve of the running example.)
We can also use `RocCurveDisplay()` to draw it quickly:
```python
from sklearn.metrics import roc_curve, RocCurveDisplay
import matplotlib.pyplot as plt

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

fpr, tpr, _ = roc_curve(y_true, y_score)
disp = RocCurveDisplay(fpr=fpr, tpr=tpr)
disp.plot()
plt.show()
```
The output is consistent with the figure above.
As the figure shows, the ROC curve is also jagged, with every segment either horizontal or vertical. Moreover, the ROC curve trends upward, and its first and last points are always $(0,0)$ and $(1,1)$. The better the learner, the closer its ROC curve gets to the top-left corner of the plot.
Let the point for the current threshold be $(x,y)$, and lower the threshold step by step. On passing a positive, the next point is $(x, y+1/m^+)$; on passing a negative, the next point is $(x+1/m^-, y)$.
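A minimal sketch of this walk (again assuming distinct scores), which reproduces the $(FPR, TPR)$ points of the running example:

```python
y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

m_pos = sum(y_true)
m_neg = len(y_true) - m_pos

# labels sorted by descending score
labels = [t for _, t in sorted(zip(y_score, y_true), reverse=True)]

x, y = 0.0, 0.0
points = [(x, y)]
for label in labels:
    if label == 1:
        y += 1 / m_pos   # positive passed: move up
    else:
        x += 1 / m_neg   # negative passed: move right
    points.append((x, y))
print(points)  # starts at (0, 0) and ends at (1, 1)
```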
3.2 AUC (Area Under ROC Curve)
When comparing learners, the situation is similar to the P-R plot: if one learner's ROC curve is completely enclosed by another's, we can assert that the latter performs better. If the two ROC curves cross, we compare the area under the ROC curve, namely the AUC (Area Under ROC Curve), as shown below.
(Figure: the AUC shown as the area under the ROC curve.)
Suppose the ROC curve is formed by connecting the points $(x_1,y_1),(x_2,y_2),\cdots,(x_m,y_m)$ in order, where $(x_1,y_1)=(0,0)$ and $(x_m,y_m)=(1,1)$. Then the AUC is
$$\mathrm{AUC}=\sum_{i=1}^{m-1}(x_{i+1}-x_i)\cdot \frac{y_{i+1}+y_i}{2}$$
`auc()` in `sklearn.metrics` computes exactly this trapezoidal formula; the corresponding code:
```python
from sklearn.metrics import roc_curve, auc

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr))
# 0.7777777777777778
```
But this approach requires computing the coordinates `fpr` and `tpr` first; a quicker way is `roc_auc_score()`:
```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

print(roc_auc_score(y_true, y_score))
# 0.7777777777777778
```
To display the AUC on the ROC plot, pass it to the `roc_auc` parameter of `RocCurveDisplay()`:
```python
from sklearn.metrics import roc_curve, RocCurveDisplay, auc
import matplotlib.pyplot as plt

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
disp = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc)
disp.plot()
plt.show()
```
(Figure: the ROC curve annotated with its AUC.)
Further discussion of AUC: ignoring the case where different samples receive the same score, let $D^+$ and $D^-$ be the sets of positive and negative samples, with $|D^+|=m^+$ and $|D^-|=m^-$. Then the AUC can be written as
$$\mathrm{AUC}=\frac{1}{m^+m^-}\sum_{\boldsymbol{x}^+\in D^+}\sum_{\boldsymbol{x}^-\in D^-}\mathbb{I}[f(\boldsymbol{x}^+)>f(\boldsymbol{x}^-)]$$
This expression shows that the AUC is the probability that a positive sample scores higher than a negative sample, i.e., it reflects the ranking quality of the predictions.
Define the ranking loss as
$$\ell_{rank}=\frac{1}{m^+m^-}\sum_{\boldsymbol{x}^+\in D^+}\sum_{\boldsymbol{x}^-\in D^-}\mathbb{I}[f(\boldsymbol{x}^+)<f(\boldsymbol{x}^-)]$$
It is easy to see that $\mathrm{AUC}+\ell_{rank}=1$, i.e., $\ell_{rank}$ is the area above the ROC curve.
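As a final check, here is a sketch verifying the pairwise-ranking interpretation and the relation $\mathrm{AUC}+\ell_{rank}=1$ on the running example (which has no tied scores):

```python
from itertools import product
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.45, 0.53, 0.24, 0.88, 0.57, 0.76]

pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]

# fraction of (positive, negative) pairs ranked correctly / incorrectly
auc = sum(p > n for p, n in product(pos, neg)) / (len(pos) * len(neg))
l_rank = sum(p < n for p, n in product(pos, neg)) / (len(pos) * len(neg))

print(auc, auc + l_rank)               # 0.777... 1.0
print(roc_auc_score(y_true, y_score))  # 0.7777777777777778
```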