How to choose evaluation metrics for a classification model
2020-11-06 01:14:00 【Artificial intelligence meets pioneer】
Author: MUSKAN097 | Compiled by: VK | Source: Analytics Vidhya
Introduction
You've successfully built your classification model. What should you do now? How do you evaluate the model's performance, that is, how well it predicts outcomes? To answer these questions, let's walk through a simple case study to understand the metrics used to evaluate classification models.
Let's take a deeper look at the concepts through a case study
In this era of globalization, people frequently travel from one place to another. Airports can be risky places, because passengers wait in queues, check in, visit food vendors, and use facilities such as restrooms. Tracking passengers who carry the virus at the airport helps prevent its spread.
Suppose we have a machine learning model that classifies passengers as COVID positive or negative. When making a classification prediction, there are four possible types of outcome:
True positive (TP): you predict that an observation belongs to a class, and it actually does belong to that class. Here, a passenger predicted to be COVID positive who actually is positive.
True negative (TN): you predict that an observation does not belong to a class, and it indeed does not. Here, a passenger predicted to be COVID negative who actually is negative.
False positive (FP): you predict that an observation belongs to a class when it actually does not. Here, a passenger predicted to be COVID positive who is actually negative.
False negative (FN): you predict that an observation does not belong to a class when it actually does. Here, a passenger predicted to be COVID negative who is actually positive.
Confusion matrix
To better visualize the model's performance, these four outcomes are arranged in a confusion matrix.
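As a minimal sketch of how these four counts are usually computed in practice (the label arrays below are invented for illustration, with 1 = COVID positive and 0 = negative; they are not data from this article), scikit-learn's `confusion_matrix` lays them out directly:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# For binary labels, rows are actual classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=4, FP=1, FN=1
```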
Accuracy
Of the four outcomes, we want our model to maximize the true positives and true negatives. Accuracy is a metric that gives the fraction of predictions our model got right. Formally, accuracy has the following definition:
Accuracy = number of correct predictions / total number of predictions
Now, suppose that on average 50,000 passengers travel through the airport, of whom 10 are COVID positive.
An easy way to get high accuracy is to classify every passenger as COVID negative. Our confusion matrix then looks like this:
The accuracy in this case is:
Accuracy = 49990 / 50000 = 0.9998, or 99.98%
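A quick sketch of that arithmetic, using the counts from the label-everyone-negative strategy above (no predicted positives at all):

```python
# Counts for the "label everyone negative" strategy on 50,000 passengers
tp, tn, fp, fn = 0, 49_990, 0, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.9998 -> 99.98%
```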
Amazing, right? But does this really serve the purpose of correctly identifying the COVID positive passengers?
In this particular example we set out to label passengers as COVID positive or negative in the hope of identifying the right ones, yet we can simply label everyone COVID negative and still get 99.98% accuracy.
That is higher accuracy than we are likely to see from any real model, but it does not serve the purpose. The purpose here is to identify the COVID positive passengers. In this situation accuracy is a terrible metric, because it is easy to achieve extremely high accuracy while completely missing what we actually care about.
So in this case accuracy is not a good way to evaluate the model. Let's look at a very popular metric called recall.
Recall (sensitivity, or true positive rate)
Recall gives the fraction of actual positives that you correctly identified as positive.
Now, this is an important metric: of all the positive passengers, what fraction did you correctly identify? Going back to our earlier strategy of labeling every passenger negative, the recall is zero.
Recall = 0/10 = 0
So in this case recall looks like a good metric: it shows that the terrible strategy of labeling every passenger COVID negative yields zero recall. We want to maximize recall.
At the other extreme, consider labeling every passenger as COVID positive: everyone entering the airport is labeled positive by the model. Putting a positive label on every passenger is not good either, because the actual cost of investigating every passenger before they board is enormous.
The confusion matrix is as follows:
The recall will be:
Recall = 10/(10+0) = 1
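A small sketch of the recall arithmetic for both degenerate strategies, using the counts from the example above:

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the fraction of actual positives that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Label everyone negative: none of the 10 positives are caught
print(recall(tp=0, fn=10))   # 0.0

# Label everyone positive: all 10 positives are caught
print(recall(tp=10, fn=0))   # 1.0
```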
Here is the problem: we concluded that accuracy is a bad metric because labeling everyone negative inflates it, and we hoped recall would be a good metric in this case, only to realize that labeling everyone positive inflates recall too.
So recall on its own is not a good metric either.
Another metric is called precision.
Precision
Precision gives the fraction of all predicted positives that are actually positive.
Considering our second flawed strategy of labeling every passenger positive, the precision will be:
Precision = 10 / (10 + 49990) = 0.0002
Although this flawed strategy has a good recall of 1, it has a terrible precision of 0.0002.
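And the corresponding precision arithmetic for the label-everyone-positive strategy (same assumed counts as above):

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the fraction of predicted positives that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Label everyone positive: 10 true positives, 49,990 false positives
print(precision(tp=10, fp=49_990))  # 0.0002
```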
This shows that recall alone is not a good metric; we need to consider precision as well.
Consider one more scenario (this will be the last one, I promise :P): label only the top passenger as COVID positive, that is, the passenger most likely to have COVID. Suppose there is only one such passenger. The confusion matrix in this case is:
Precision = 1 / (1 + 0) = 1
In this case the precision is excellent, but let's check the recall:
Recall = 1 / (1 + 9) = 0.1
So here the precision is excellent, but the recall is poor.
Scenario | Accuracy | Recall | Precision |
---|---|---|---|
Classify all passengers as negative | High | Low | Low |
Classify all passengers as positive | Low | High | Low |
Label only the top passenger as COVID positive | High | Low | High |
In some cases we are fairly sure that we want to maximize either recall or precision at the cost of the other. In this passenger-screening problem we really want to catch the COVID positive passengers, because failing to identify them is very costly: letting COVID positive people through increases transmission. So here we care more about recall.
Unfortunately, you can't have both: improving precision reduces recall, and vice versa. This is called the precision/recall tradeoff.
Precision/recall tradeoff
Some classification models output a probability between 0 and 1. When deciding whether to classify a passenger as COVID positive or negative, we want to avoid missing actual positive cases. In particular, if a passenger really is positive but our model fails to flag them, that is very bad, because the virus could spread when such passengers are allowed to board. So even if there is only slight suspicion of COVID, we should label the passenger positive.
So our strategy is: if the output probability is greater than 0.3, we label the passenger as COVID positive.
This leads to higher recall and lower precision.
Now consider the opposite: we want to classify a passenger as positive only when we are quite sure. We set the probability threshold to 0.9, classifying a passenger as positive when the probability is greater than or equal to 0.9 and negative otherwise. This leads to higher precision and lower recall.
So generally speaking, for most classifiers there is a trade-off between recall and precision as you vary the probability threshold, as sketched below.
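A minimal sketch of this threshold sweep with scikit-learn. The dataset, model, and thresholds below are purely illustrative stand-ins for the screening problem, not anything from the original article:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy imbalanced data standing in for the passenger-screening problem
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# Sweep the decision threshold and watch precision and recall trade off
for threshold in (0.3, 0.5, 0.9):
    y_pred = (proba >= threshold).astype(int)
    p = precision_score(y_test, y_pred, zero_division=0)
    r = recall_score(y_test, y_pred, zero_division=0)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# Lower thresholds generally raise recall and lower precision; higher thresholds do the opposite.
```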
If you need to compare models with different precision and recall values, it is often convenient to combine precision and recall into a single metric. Right! We need a metric that takes both recall and precision into account when measuring performance.
F1 score
The F1 score is defined as the harmonic mean of the model's precision and recall: F1 = 2 * (precision * recall) / (precision + recall).
You may wonder why we use the harmonic mean rather than a simple average. We use the harmonic mean because, unlike a simple average, it is not pulled up by a single large value; a low value dominates the result.
For example, a model with precision 1 and recall 0 has a simple average of 0.5 but an F1 score of 0. If either value is low, the other hardly matters to the F1 score. The F1 score therefore favors classifiers whose precision and recall are similar.
So if you want a balance between precision and recall, the F1 score is a better metric.
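A short sketch of the harmonic-mean behavior described above (the precision/recall pairs are invented for illustration):

```python
def f1(precision: float, recall: float) -> float:
    """F1 = 2 * precision * recall / (precision + recall), the harmonic mean."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(1.0, 0.0))   # 0.0  -- one extreme value drags the score to zero
print(f1(0.5, 0.5))   # 0.5
print(f1(0.9, 0.1))   # 0.18 -- same simple average (0.5), much lower F1
```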
ROC/AUC curve
The ROC curve is another common evaluation tool. It shows the model's sensitivity and specificity at every possible decision threshold between 0 and 1. For classification problems with probabilistic output, the probability is converted to a class label by a threshold, so changing the threshold changes the counts in the confusion matrix. The key question then is: how do we find the right threshold?
For every possible threshold, the ROC curve plots the false positive rate against the true positive rate.
False positive rate (FPR): the proportion of actual negatives incorrectly classified as positive.
True positive rate (TPR): the proportion of actual positives correctly predicted as positive.
Now consider a low threshold, say 0.1: with the probabilities sorted in ascending order, everything below 0.1 is classified as negative and everything above 0.1 is classified as positive. The threshold can be chosen freely.
Alternatively, you could set the threshold high, such as 0.9.
The following is the ROC curve for the same model under different thresholds.
As the figure shows, the true positive rate initially increases rapidly, but beyond a certain threshold its growth tapers off. Every further increase in TPR comes at a price: an increase in FPR. In the initial stage, TPR grows faster than FPR.
So we can choose a threshold where TPR is high and FPR is low.
Now let's see what different values of TPR and FPR tell us about the model.
Different models produce different ROC curves. So how do we compare models? As the figure above shows, a curve that sits higher represents a better model. One way to compare classifiers is to measure the area under the ROC curve (AUC).
AUC(Model 1) > AUC(Model 2) > AUC(Model 3)
So Model 1 is the best.
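A minimal sketch of computing the ROC curve and AUC with scikit-learn; the toy data and model below are placeholders, not the article's:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy imbalanced data as a stand-in for the screening example
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, proba)  # one (FPR, TPR) point per candidate threshold
auc = roc_auc_score(y_test, proba)
print(f"AUC = {auc:.3f}")  # a higher AUC indicates a better-ranking model
```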
Summary
We have looked at the different metrics used to evaluate classification models. Which metric to use depends largely on the nature of the problem. So go back to your model, ask yourself what the main purpose of your solution is, choose the right metric, and evaluate your model.
Link to the original article: https://www.analyticsvidhya.com/blog/2020/10/how-to-choose-evaluation-metrics-for-classification-model/
Welcome to visit the AI blog site: http://panchuang.net/
sklearn machine learning Chinese official documentation: http://sklearn123.com/
Welcome to the Panchuang blog resource hub: http://docs.panchuang.net/
Copyright notice
This article was created by [Artificial intelligence meets pioneer]. Please include a link to the original when reposting. Thank you.