Confusion matrix
2022-07-05 08:59:00 【Wanderer001】
Reference: Confusion Matrix, Tencent Cloud+ Community
Introduction
The confusion matrix is the basis for drawing the ROC curve. It is also the most basic, most intuitive, and computationally simplest way to measure the accuracy of a classification model.
The one-sentence version:
A confusion matrix counts how many observations a classification model assigned to the wrong class and how many to the right class, and then shows the results in a table. That table is the confusion matrix.
Where it fits in data analysis and mining
The confusion matrix is a metric for judging a model's results and is part of model evaluation. It is often used to judge the quality of a classifier, and it applies to classification models of different types, such as classification trees, logistic regression, and linear discriminant analysis.
Three evaluation tools are commonly used for classification models:
- Confusion matrix (also known as error matrix)
- ROC curve
- AUC (area under the ROC curve)
This article covers the first of these, the confusion matrix.
The figure below shows where this method sits in the overall data analysis and mining system.
Definition of confusion matrix
The confusion matrix is far less sophisticated than its name suggests. A matrix can be thought of as a table, and the confusion matrix is really just a table.
Take binary classification, the simplest kind of classification model, as an example. For this kind of problem, the model ultimately has to decide whether a sample's result is 0 or 1, or in other words, positive or negative.
From the collected samples we know the ground truth directly: which samples are positive and which are negative. At the same time, running the classification model on the sample data tells us which samples the model considers positive and which negative.
This gives us four basic counts, which I call primary metrics (the lowest level):
- The actual value is positive and the model predicts positive: True Positive (TP)
- The actual value is positive and the model predicts negative: False Negative (FN). In statistics this is a Type II error.
- The actual value is negative and the model predicts positive: False Positive (FP). In statistics this is a Type I error.
- The actual value is negative and the model predicts negative: True Negative (TN)
Presenting these four counts together in a table gives the following matrix, which we call the confusion matrix:

                     Predicted positive    Predicted negative
Actual positive      TP                    FN
Actual negative      FP                    TN
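As a minimal sketch of how the four counts can be tallied, the Python snippet below compares actual labels against predicted labels. The function name `confusion_counts` and the sample labels are made up for illustration, and the code assumes labels are only ever 1 (positive) or 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # actual positive, predicted positive
        elif actual == 1 and predicted == 0:
            fn += 1  # actual positive, predicted negative
        elif actual == 0 and predicted == 1:
            fp += 1  # actual negative, predicted positive
        else:
            tn += 1  # actual negative, predicted negative (labels assumed 0/1)
    return tp, fn, fp, tn


# Hypothetical labels, only for illustration
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```

The four counts always add up to the number of samples, which is a quick sanity check on any confusion matrix.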
Metrics of the confusion matrix
Naturally, we want a predictive classification model to be as accurate as possible. In terms of the confusion matrix, that means we want TP and TN to be large and FP and FN to be small. So once we have a model's confusion matrix, we look at how many observations fall in the positions corresponding to the second and fourth quadrants (TP and TN): the more, the better. Conversely, the fewer observations in the positions corresponding to the first and third quadrants (FP and FN), the better.
Secondary metrics
However, the confusion matrix holds raw counts, and with large amounts of data, counts alone make it hard to judge how good a model is. The confusion matrix therefore extends the basic counts with the following 4 metrics, which I call secondary metrics (obtained from the primary metrics by addition, subtraction, multiplication, and division):
- Accuracy = (TP + TN) / (TP + TN + FP + FN): measured over the whole model
- Precision = TP / (TP + FP)
- Sensitivity = TP / (TP + FN): the same as Recall
- Specificity = TN / (TN + FP)
I summarize the definition, calculation, and interpretation of these four metrics in a table:
These four secondary metrics turn the counts in the confusion matrix into ratios between 0 and 1, which makes standardized comparison easy.
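The four secondary metrics can be sketched in a few lines of Python using their standard definitions. The function name `secondary_metrics` and the counts passed to it are hypothetical, and the sketch does not guard against empty denominators (for example, a model that never predicts positive).

```python
def secondary_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall (sensitivity) and specificity."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # correct predictions over all samples
        "precision": tp / (tp + fp),     # of predicted positives, how many are right
        "recall": tp / (tp + fn),        # of actual positives, how many are found
        "specificity": tn / (tn + fp),   # of actual negatives, how many are found
    }


# Hypothetical counts, only for illustration
metrics = secondary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each value is a ratio between 0 and 1, so models evaluated on datasets of different sizes can be compared directly.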
Building on these four metrics, one further, tertiary metric can be derived.
Tertiary metric
This metric is called the F1 Score. Its formula is:
F1 Score = 2 * P * R / (P + R)
where P stands for Precision and R stands for Recall.
The F1 Score combines the results of Precision and Recall. It ranges from 0 to 1, where 1 means the model's output is at its best and 0 means it is at its worst.
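A small Python sketch of the F1 Score follows. The function name `f1_score` is chosen here for illustration, and returning 0.0 when Precision and Recall are both 0 is an assumed convention to avoid division by zero, not something the article specifies.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are 0 (assumed convention).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(1.0, 1.0))  # 1.0, the best case
print(f1_score(0.5, 0.0))  # 0.0, recall of zero drags the score to zero
print(round(f1_score(0.9, 0.1), 3))  # far below the arithmetic mean of 0.5
```

Because it is a harmonic mean, the F1 Score stays low unless Precision and Recall are both high.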
An example of a confusion matrix
When the classification problem is binary, the confusion matrix can be computed as described above. The confusion matrix also applies when there are more than two classes.
Take the following confusion matrix as an example. The model's task is to predict which animal each sample is, and this is our result:
From the confusion matrix we can draw the following conclusions:
Accuracy
Out of 66 animals in total, we got 10 + 15 + 20 = 45 samples right, so Accuracy = 45/66 = 68.2%.
Taking the cat as an example, we can collapse the matrix above into a binary problem:
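The collapse into a cat vs. not-cat problem can be sketched in Python. The three-class matrix below is not copied from the article's figure; it is reconstructed from the numbers quoted in the text (rows are actual classes, columns are predicted classes), and the split of the dog/pig confusions into 6 and 4 is an assumption. The helper name `one_vs_rest` is also made up for illustration.

```python
# Rows = actual class, columns = predicted class, in the order cat, dog, pig.
# Reconstructed from the numbers in the text; treat the values as an assumption.
LABELS = ["cat", "dog", "pig"]
MATRIX = [
    [10, 3, 5],   # actual cats: 10 predicted cat, 3 dog, 5 pig
    [1, 15, 6],   # actual dogs
    [2, 4, 20],   # actual pigs
]


def one_vs_rest(matrix, k):
    """Collapse a multi-class confusion matrix into TP, FN, FP, TN for class k."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                       # actual k, predicted something else
    fp = sum(matrix[i][k] for i in range(n)) - tp  # actual something else, predicted k
    tn = total - tp - fn - fp                      # everything not involving class k
    return tp, fn, fp, tn


tp, fn, fp, tn = one_vs_rest(MATRIX, LABELS.index("cat"))
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=10 FN=8 FP=3 TN=45
```

Note that TN lumps together every sample that is neither actually a cat nor predicted to be one, regardless of whether the model told dogs and pigs apart correctly.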
Precision
Taking the cat as an example, the model tells us that 13 of the 66 animals are cats, but only 10 of those 13 predictions are right. Of the 13 animals the model thinks are cats, 1 is a dog and 2 are pigs. Therefore, Precision(cat) = 10/13 = 76.9%.
Recall
Taking the cat as an example, of the 18 real cats, the model recognizes only 10 as cats; it takes 3 of the others for dogs and 5 for pigs. Those 5 are most likely orange cats, which is understandable. Therefore, Recall(cat) = 10/18 = 55.6%.
Specificity
Taking the cat as an example, of the 48 animals that are not cats, the model judges 45 not to be cats. Therefore, Specificity(cat) = 45/48 = 93.8%.
Among those 45 animals the model still misclassifies 6 dogs and 4 pigs, but from the cat's point of view, the model's judgment is not wrong.
(This follows the explanation of Confusion Matrix on Wikipedia: https://en.wikipedia.org/wiki/Confusion_matrix)
F1 Score
Applying the formula, for cats, F1 Score = (2 * 0.769 * 0.556) / (0.769 + 0.556) = 64.54%.
In the same way, we can compute the secondary and tertiary metrics for pigs and dogs.
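As a rough sketch, the same calculation can be repeated for every class in a loop. The matrix is again a reconstruction from the numbers quoted in the text (rows actual, columns predicted), so the dog and pig values are assumptions, and `per_class_metrics` is a hypothetical helper name. The sketch does not handle classes with zero true positives.

```python
# Rows = actual class, columns = predicted class, in the order cat, dog, pig.
# Reconstructed from the numbers in the text; treat the values as an assumption.
LABELS = ["cat", "dog", "pig"]
MATRIX = [
    [10, 3, 5],
    [1, 15, 6],
    [2, 4, 20],
]


def per_class_metrics(matrix, k):
    """Return precision, recall, specificity and F1 for class k (one vs. rest)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(matrix[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1


# Overall accuracy: diagonal entries over all samples
total = sum(sum(row) for row in MATRIX)
accuracy = sum(MATRIX[i][i] for i in range(len(MATRIX))) / total
print(f"accuracy={accuracy:.1%}")

for k, name in enumerate(LABELS):
    p, r, s, f1 = per_class_metrics(MATRIX, k)
    print(f"{name}: precision={p:.1%} recall={r:.1%} specificity={s:.1%} F1={f1:.1%}")
```

Computed from unrounded values, the cat's F1 Score is 20/31, about 64.5%; the 64.54% above differs slightly because it uses Precision and Recall already rounded to three decimals.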