Confusion matrix
2022-07-05 08:59:00 【Wanderer001】
Reference: Confusion Matrix (混淆矩阵) – Tencent Cloud Community
Introduction
The confusion matrix is the basis for plotting the ROC curve, and it is also the most basic, most intuitive, and computationally simplest way to measure the accuracy of a classification model.
The one-sentence version:
A confusion matrix counts the observations a classification model classifies correctly and incorrectly, and lays those counts out in a table. That table is the confusion matrix.
Position in the data analysis and mining pipeline
The confusion matrix is an indicator for judging model results and belongs to the model-evaluation stage. It is commonly used to judge the quality of a classifier (Classifier), and it applies to many kinds of models, such as classification trees (Classification Tree), logistic regression (Logistic Regression), and linear discriminant analysis (Linear Discriminant Analysis).
Among the evaluation indicators for classification models, three are common:
- The confusion matrix (also called the error matrix, Confusion Matrix)
- The ROC curve
- The AUC (area under the curve)
This article focuses on the first of these: the confusion matrix, also known as the error matrix.
The position of this method in the overall data analysis and mining pipeline is shown in the figure below.
Definition of the confusion matrix
The confusion matrix (Confusion Matrix) is far less intimidating than its name suggests. A matrix can be understood as a table, and the confusion matrix really is just a table.
Take the simplest case in classification, binary classification. Here the model must ultimately judge whether each sample's result is 0 or 1, in other words positive or negative.
When we collect samples, we know the ground truth directly: which observations are actually positive and which are actually negative. At the same time, running the sample data through the classification model tells us which observations the model considers positive and which it considers negative.
From these we obtain four basic counts, which I call the primary indicators (the lowest level):
- The true value is positive and the model predicts positive (True Positive = TP)
- The true value is positive and the model predicts negative (False Negative = FN): this is the statistical Type II error (Type II Error)
- The true value is negative and the model predicts positive (False Positive = FP): this is the statistical Type I error (Type I Error)
- The true value is negative and the model predicts negative (True Negative = TN)
Presenting these four counts together in a table gives the matrix we call the confusion matrix (Confusion Matrix):
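The four primary indicators can be tallied directly from paired true and predicted labels. A minimal sketch, using illustrative labels rather than data from this article:

```python
# Tally the four primary indicators for a binary classifier.
# 1 = positive, 0 = negative; these labels are made-up examples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # real positive, predicted positive
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # real positive, predicted negative
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # real negative, predicted positive
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # real negative, predicted negative

print(TP, FN, FP, TN)  # 3 1 1 3
```

Arranged as a 2×2 table with true values on one axis and predictions on the other, these four counts are exactly the confusion matrix.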
Indicators of the confusion matrix
A predictive classification model should, of course, be as accurate as possible. In terms of the confusion matrix, that means we want the TP and TN counts to be large and the FP and FN counts to be small. So when we obtain a model's confusion matrix, we look first at the diagonal cells (TP and TN): the larger the values there, the better. Conversely, the fewer observations that land in the off-diagonal cells (FP and FN), the better.
Secondary indicators
The confusion matrix, however, only counts numbers. Faced with a large amount of data, raw counts alone make it hard to judge how good a model is. The confusion matrix therefore extends its basic counts into the following four indicators, which I call the secondary indicators (obtained from the primary counts by addition, subtraction, multiplication, and division):
- Accuracy (Accuracy) — for the model as a whole
- Precision (Precision)
- Sensitivity (Sensitivity): the same as recall (Recall)
- Specificity (Specificity)
The definition, calculation, and interpretation of these four indicators are summarized in the table below.
Through these four secondary indicators, the raw counts in the confusion matrix are turned into ratios between 0 and 1, which are easy to compare on a standardized scale.
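The four secondary indicators follow directly from the primary counts. A short sketch, reusing illustrative TP/FN/FP/TN values:

```python
# Derive the four secondary indicators from the primary counts.
# The counts here are illustrative, not from the article's example.
TP, FN, FP, TN = 3, 1, 1, 3

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # correct predictions over all samples
precision   = TP / (TP + FP)                   # of all predicted positives, how many are real
recall      = TP / (TP + FN)                   # sensitivity: of all real positives, how many were found
specificity = TN / (TN + FP)                   # of all real negatives, how many were found

print(accuracy, precision, recall, specificity)  # 0.75 0.75 0.75 0.75
```

Each ratio lies between 0 and 1, which is what makes them standardized measures.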
Tertiary indicator
Expanding on these four indicators yields one further, tertiary indicator, called the F1 Score. Its formula is:
F1 = 2PR / (P + R)
where P stands for Precision and R stands for Recall.
The F1-Score combines the outputs of Precision and Recall into a single result. Its value ranges from 0 to 1, where 1 means the model's output is the best possible and 0 means it is the worst.
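The formula above is the harmonic mean of precision and recall, which can be sketched in a couple of lines:

```python
# F1 is the harmonic mean of precision (P) and recall (R).
def f1_score(p, r):
    return 2 * p * r / (p + r)

# Perfect precision and recall give a perfect F1.
print(f1_score(1.0, 1.0))                 # 1.0
# Using the cat's precision and recall from the worked example below:
print(round(f1_score(0.769, 0.556), 4))   # 0.6454
```

Because it is a harmonic mean, F1 is dragged down sharply by whichever of P or R is smaller, so a model cannot score well by optimizing only one of the two.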
A worked example
When the classification problem is binary, the confusion matrix can be computed as above. When there are more than two classes, the confusion matrix still applies.
Take the confusion matrix below as an example. Our model's task is to predict which animal each sample is, and these are the results:
From this confusion matrix we can draw the following conclusions.
Accuracy
Out of 66 animals in total, we predicted 10 + 15 + 20 = 45 samples correctly, so Accuracy = 45/66 = 68.2%.
Taking the cat class as an example, we can collapse the matrix above into a binary problem:
Precision
Taking the cat as an example: the model tells us that 13 of the 66 animals are cats, but of those 13 only 10 predictions are correct. Among the 13 animals the model calls cats there is actually 1 dog and there are 2 pigs. Therefore Precision(cat) = 10/13 = 76.9%.
Recall
Taking the cat as an example: there are 18 real cats in total, but our model recognizes only 10 of them as cats; of the rest, 3 are labeled dogs and 5 are labeled pigs. (Those 5 were most likely orange cats, which is understandable.) Therefore Recall(cat) = 10/18 = 55.6%.
Specificity
Taking the cat as an example: of the 48 animals that are not cats, the model judges 45 to not be cats. Therefore Specificity(cat) = 45/48 = 93.8%.
Within those 45 animals the model still misclassifies 6 dogs and 4 pigs between each other, but from the cat's point of view the model's judgment there contains no error.
(This follows the explanation of the Confusion Matrix on Wikipedia: https://en.wikipedia.org/wiki/Confusion_matrix)
F1-Score
Using the formula, we can calculate that for the cat class, F1-Score = (2 * 0.769 * 0.556) / (0.769 + 0.556) = 64.54%.
In the same way, we can calculate the secondary and tertiary indicator values for the pig and dog classes.
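The whole worked example can be reproduced from the matrix itself. A sketch, with the matrix reconstructed from the counts quoted above (the exact dog/pig confusion split of 6 and 4 is inferred from the text, so treat those two cells as an assumption):

```python
# Rows = true class, columns = predicted class; order: cat, dog, pig.
# Cell values are reconstructed from the counts in the worked example.
cm = [
    [10, 3, 5],   # 18 real cats: 10 correct, 3 called dog, 5 called pig
    [1, 15, 6],   # real dogs (the 6 dogs-called-pigs split is inferred)
    [2, 4, 20],   # real pigs (the 4 pigs-called-dogs split is inferred)
]
total = sum(sum(row) for row in cm)                  # 66 animals

accuracy = sum(cm[i][i] for i in range(3)) / total   # diagonal over total

def per_class(i):
    """Precision, recall, specificity for class index i (one-vs-rest)."""
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                  # real class i, predicted otherwise
    fp = sum(row[i] for row in cm) - tp   # predicted class i, really otherwise
    tn = total - tp - fn - fp
    return tp / (tp + fp), tp / (tp + fn), tn / (tn + fp)

prec, rec, spec = per_class(0)            # class 0 = cat
print(round(accuracy, 3))                 # 0.682
print(round(prec, 3), round(rec, 3), round(spec, 3))  # 0.769 0.556 0.938
```

Calling `per_class(1)` or `per_class(2)` gives the corresponding values for dogs and pigs.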