A complete guide to evaluation metrics for machine learning classification tasks (including ROC and AUC)

2022-07-27 18:36:00 【zkkkkkkkkkkkkk】

Contents

1.1、 What is a confusion matrix?

1.2、 What does a confusion matrix look like?

1.3、 The common binary confusion matrix

2.1、 Indicators of the confusion matrix

2.2、 Second-level indicators of the confusion matrix

2.2.1、 Accuracy:

2.2.2、 Precision:

2.2.3、 Recall (sensitivity):

2.2.4、 Specificity:

2.3、 Third-level indicators of the confusion matrix

2.3.1、F1-Score

3.1、 ROC curve and AUC

4.1、 Plotting the ROC curve in Python

5.1、 Other

1.1、 What is a confusion matrix?

A confusion matrix (Confusion Matrix), also called an error matrix, is used to compute the evaluation metrics of a classification problem. Metrics such as accuracy, precision, and recall can all be calculated from it.

1.2、 What does a confusion matrix look like?

A confusion matrix summarizes the results of a classifier. For a k-class classification problem, it is a k x k table that records how many samples of each actual class were assigned to each predicted class.
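As a minimal sketch of the k x k table described above, the snippet below counts (predicted, actual) pairs for a made-up 3-class problem. The function name `confusion_table` and the labels are hypothetical, chosen only for illustration; rows are predicted classes and columns are actual classes, matching the layout used in this article.

```python
from collections import Counter


def confusion_table(y_true, y_pred, labels):
    """Return a k x k table: rows are predicted labels, columns are actual labels."""
    counts = Counter(zip(y_pred, y_true))
    return [[counts[(p, t)] for t in labels] for p in labels]


# Hypothetical labels for a 3-class problem (illustration only)
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]
labels = ["bird", "cat", "dog"]

table = confusion_table(y_true, y_pred, labels)
for label, row in zip(labels, table):
    print(f"predicted {label:>4}: {row}")
```

The diagonal holds the correctly classified samples. Note that scikit-learn's `confusion_matrix` uses the opposite orientation (rows are actual classes, columns are predicted classes).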

1.3、 The common binary confusion matrix

	Actual 1	Actual 0
Predicted 1	TP	FP
Predicted 0	FN	TN

where:

The actual label is 1 and the predicted label is 1. This is called a true positive (True Positive), abbreviated TP.

The actual label is 1 and the predicted label is 0. This is called a false negative (False Negative), abbreviated FN.

The actual label is 0 and the predicted label is 1. This is called a false positive (False Positive), abbreviated FP.

The actual label is 0 and the predicted label is 0. This is called a true negative (True Negative), abbreviated TN.

Note: FN corresponds to a Type II error in statistics (Type II Error), which we can think of as letting the bad guys go.

FP corresponds to a Type I error in statistics (Type I Error), which we can think of as wrongly condemning an innocent person.
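The four cells defined above can be counted directly from the actual and predicted labels. This is a minimal sketch; the helper name `binary_counts` and the sample labels are assumptions made for illustration.

```python
def binary_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for labels encoded as 1 (positive) and 0 (negative)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # true positive
        elif actual == 0 and predicted == 1:
            fp += 1  # false positive (Type I error)
        elif actual == 1 and predicted == 0:
            fn += 1  # false negative (Type II error)
        else:
            tn += 1  # true negative
    return tp, fp, fn, tn


# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = binary_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```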

2.1、 Indicators of the confusion matrix

For a binary classifier, we want the predictions to be as accurate as possible. In terms of the confusion matrix, this means TP and TN should be as large as possible, and FP and FN as small as possible. With this criterion in mind, we usually start by looking at the counts in the TP and TN cells.

However, every cell of the confusion matrix, whether TP, TN, FP, or FN, is only a raw sample count, so the matrix alone does not fully describe the quality of a classifier. Different scenarios also care about different kinds of error. This is why second-level indicators are derived from it.
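A small made-up example shows why the raw counts can mislead. On a heavily imbalanced test set, a classifier that always predicts 0 has a very large TN count, yet it never finds a single positive sample. The data below is hypothetical.

```python
# Hypothetical imbalanced test set: 990 actual 0s and 10 actual 1s
y_true = [0] * 990 + [1] * 10
# A useless classifier that predicts 0 for every sample
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

fraction_correct = (tp + tn) / len(y_true)  # share of all samples predicted correctly
positives_found = tp / (tp + fn)            # share of actual 1s that were predicted as 1

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"correct: {fraction_correct:.2f}, positives found: {positives_found:.2f}")
```

TP + TN is 990 out of 1000, which looks excellent, while the classifier misses all 10 positives. Ratios that focus on one row or column of the matrix expose this.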

2.2、 Second-level indicators of the confusion matrix

	Actual 1	Actual 0
Predicted 1	TP	FP
Predicted 0	FN	TN

2.2.1、 Accuracy:

The proportion of correctly predicted samples among all samples: Accuracy = (TP + TN) / (TP + TN + FP + FN).

2.2.2、 Precision:

Among all samples predicted as 1, the proportion that are actually 1: Precision = TP / (TP + FP).

2.2.3、 Recall (sensitivity):

Among all samples that are actually 1, the proportion that are predicted correctly (as 1): Recall = TP / (TP + FN).

2.2.4、 Specificity:

Among all samples that are actually 0, the proportion that are predicted correctly (as 0): Specificity = TN / (TN + FP).
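The four second-level indicators can be computed from the four cell counts as in the sketch below. The counts are made-up values for illustration, and the sketch assumes every denominator is non-zero.

```python
# Hypothetical cell counts of a binary confusion matrix
tp, fp, fn, tn = 3, 1, 1, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions among all samples
precision = tp / (tp + fp)                  # actual 1s among samples predicted as 1
recall = tp / (tp + fn)                     # predicted 1s among samples that are actually 1
specificity = tn / (tn + fp)                # predicted 0s among samples that are actually 0

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```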

2.3、 Third-level indicators of the confusion matrix

2.3.1、F1-Score

The second-level indicators give rise to a third-level indicator, the F1-score. It combines precision (Precision) and recall (Recall) as their harmonic mean. The formula is: F1 = 2 × Precision × Recall / (Precision + Recall).

Note: the F1-score is a value between 0 and 1. The closer it is to 1, the better the classification result.
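As a minimal sketch, the F1-score can be computed as the harmonic mean of precision and recall. The function name `f1_score_from_counts` and the counts are hypothetical; returning 0.0 when there are no true positives is a convention assumed here to avoid division by zero.

```python
def f1_score_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0  # assumed convention: avoids dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 3/4 = 0.75, recall = 3/6 = 0.5
f1 = f1_score_from_counts(tp=3, fp=1, fn=3)
print(f"F1 = {f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot reach a high F1-score by maximizing only precision or only recall.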

3.1、 ROC curve and AUC

The figure below shows the ROC curve of a logistic regression model I trained.

The ROC curve is drawn from the TPR and FPR values obtained as the classification threshold is moved across the predicted scores of the samples, where TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The horizontal axis is FPR and the vertical axis is TPR. The area under the curve is the AUC. AUC usually lies between 0.5 and 1, where 0.5 corresponds to random guessing. The larger the AUC the better, and the closer the ROC curve is to the upper-left corner the better.
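To make the construction concrete, the sketch below builds the ROC points by hand, using each distinct predicted score as a threshold, and computes the AUC with the trapezoidal rule. The function names, labels, and scores are assumptions made for illustration, and the data must contain at least one positive and one negative sample.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold over the scores."""
    pos = sum(y_true)           # number of actual 1s
    neg = len(y_true) - pos     # number of actual 0s
    points = [(0.0, 0.0)]       # threshold above every score: nothing predicted as 1
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(f"AUC = {auc(points):.4f}")
```

The AUC also equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score. Here that is 8 of the 9 pairs.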

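4.1、 Plotting the ROC curve in Python

Python's sklearn already provides an interface for plotting the ROC curve. We only need to pass in a fitted model and the test data.

from matplotlib import pyplot as plt
# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay.from_estimator replaces it
from sklearn.metrics import RocCurveDisplay

# lr: a fitted classifier; test_x: test sample set; test_y: test label set
RocCurveDisplay.from_estimator(lr, test_x, test_y)
plt.title("ROC curve")
plt.show()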
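The snippet above assumes that `lr`, `test_x`, and `test_y` already exist. The following self-contained sketch shows one way they might be produced and how the AUC value can be obtained numerically. The synthetic data from `make_classification` is a stand-in for a real data set, and all parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustration only, not the author's data)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a logistic regression model and score the test set
lr = LogisticRegression(max_iter=1000).fit(train_x, train_y)
scores = lr.predict_proba(test_x)[:, 1]  # predicted probability of class 1

# FPR/TPR at each threshold, and the area under the resulting curve
fpr, tpr, thresholds = roc_curve(test_y, scores)
auc_value = roc_auc_score(test_y, scores)
print(f"AUC = {auc_value:.3f}")
```

The `fpr` and `tpr` arrays can be passed to `plt.plot(fpr, tpr)` to draw the curve manually.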