Confusion matrix
2022-07-05 08:59:00 【Wanderer001】
Reference: Confusion Matrix (混淆矩阵), Cloud+ Community, Tencent Cloud
Introduction
The confusion matrix is the basis for drawing the ROC curve, and it is also the most fundamental, most intuitive, and most easily computed way to measure the accuracy of a classification model.
The one-sentence version:
A confusion matrix tallies how many observations a classification model classifies correctly and how many it misclassifies, and then presents the counts in a table. That table is the confusion matrix.
Position in the data analysis and mining system
The confusion matrix is an indicator for judging model results, and it belongs to the model-evaluation stage. It is also commonly used to judge the quality of a classifier, and it applies to many kinds of classification models, such as classification trees, logistic regression, and linear discriminant analysis.
Three methods are commonly used to evaluate classification models:
- Confusion matrix (also known as the error matrix)
- ROC curve
- AUC (the area under the ROC curve)
This article focuses on the first method: the confusion matrix, also known as the error matrix.
The position of this method within the overall data analysis and mining system is shown in the figure below.
Definition of the confusion matrix
Despite its name, the confusion matrix is not a complicated idea. A matrix can be understood as a table, and the confusion matrix is in fact just a table.
Take binary classification, the simplest case, as an example. For this kind of problem, the model must ultimately judge whether each sample's result is 0 or 1, that is, positive or negative.
When we collect samples, we know the true situation directly: which observations are actually positive and which are actually negative. At the same time, by running the classification model on the sample data, we also know which observations the model considers positive and which it considers negative.
From this we can derive four basic counts, which I call the primary indicators (the lowest level):

- True Positive (TP): the true value is positive and the model also predicts positive.
- False Negative (FN): the true value is positive but the model predicts negative. This is the statistical Type II error.
- False Positive (FP): the true value is negative but the model predicts positive. This is the statistical Type I error.
- True Negative (TN): the true value is negative and the model also predicts negative.

Arranging these four counts together in a table gives the matrix we call the confusion matrix:

|                 | Predicted positive | Predicted negative |
| --------------- | ------------------ | ------------------ |
| Actual positive | TP                 | FN                 |
| Actual negative | FP                 | TN                 |
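The four primary counts can be tallied directly from a pair of label lists. A minimal sketch (the label lists here are made-up illustrative data):

```python
def primary_counts(y_true, y_pred, positive=1):
    """Count (TP, FN, FP, TN) for a binary classification result."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn

# Illustrative data: 4 actual positives, 4 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(primary_counts(y_true, y_pred))  # (3, 1, 1, 3)
```

In practice, `sklearn.metrics.confusion_matrix` produces the same table in one call; the hand-rolled version above just makes the definitions explicit.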
Indicators of the confusion matrix
A predictive classification model should, of course, be as accurate as possible. In terms of the confusion matrix, that means we want the TP and TN counts to be large and the FP and FN counts to be small. So when we obtain a model's confusion matrix, we look at the diagonal cells (TP and TN): the more observations fall there, the better. Conversely, the fewer observations fall in the off-diagonal cells (FP and FN), the better.
Secondary indicators
However, the confusion matrix only counts observations, and when faced with a lot of data, raw counts alone make it hard to judge how good a model is. The confusion matrix therefore extends its basic statistics into the following four indicators, which I call the secondary indicators (obtained from the primary counts by addition, subtraction, multiplication, and division):
- Accuracy: evaluates the model as a whole
- Precision
- Sensitivity: the same as recall
- Specificity
The definitions, formulas, and interpretations of these four indicators are summarized below:

- Accuracy = (TP + TN) / (TP + FN + FP + TN): the proportion of all observations that the model classifies correctly.
- Precision = TP / (TP + FP): of the observations the model labels positive, the proportion that really are positive.
- Sensitivity (Recall) = TP / (TP + FN): of the truly positive observations, the proportion the model finds.
- Specificity = TN / (TN + FP): of the truly negative observations, the proportion the model identifies.

These four secondary indicators convert the raw counts in the confusion matrix into ratios between 0 and 1, which makes standardized comparison easy.
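A minimal sketch of the four secondary indicators, derived from the primary counts by simple arithmetic (the counts passed in are the illustrative values 3, 1, 1, 3):

```python
def secondary_metrics(tp, fn, fp, tn):
    """Compute the four secondary indicators from the primary counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # correct / all
    precision = tp / (tp + fp)                  # of predicted positives, how many are right
    sensitivity = tp / (tp + fn)                # also called recall
    specificity = tn / (tn + fp)                # of actual negatives, how many are found
    return accuracy, precision, sensitivity, specificity

print(secondary_metrics(tp=3, fn=1, fp=1, tn=3))  # (0.75, 0.75, 0.75, 0.75)
```

Note that precision and sensitivity are undefined when their denominators are zero (no predicted positives, or no actual positives); production code should guard against that case.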
Building on these four secondary indicators produces one further indicator of the next order.
Third-level indicator
This indicator is called the F1 Score. Its formula is:

F1 Score = 2PR / (P + R)

where P stands for Precision and R stands for Recall.
The F1 Score combines Precision and Recall into a single result. Its value ranges from 0 to 1, where 1 represents the best possible model output and 0 the worst.
Examples of confusion matrices
When the classification problem is binary, the confusion matrix can be computed as described above. The confusion matrix applies equally well when there are more than two classes.
Take the following confusion matrix as an example. Our model's task is to predict which animal each sample is, and these are its results (rows are the true class, columns the model's prediction, reconstructed from the figures discussed below):

|            | Predicted cat | Predicted dog | Predicted pig |
| ---------- | ------------- | ------------- | ------------- |
| Actual cat | 10            | 3             | 5             |
| Actual dog | 1             | 15            | 6             |
| Actual pig | 2             | 4             | 20            |
From this confusion matrix we can draw the following conclusions:
Accuracy
Of the 66 animals in total, the model classified 10 + 15 + 20 = 45 correctly, so Accuracy = 45/66 ≈ 68.2%.
Taking the cat class as an example, we can collapse the matrix above into a binary cat-versus-not-cat problem:
Precision
For the cat class, the model tells us that 13 of the 66 animals are cats, but only 10 of those 13 predictions are right: among the 13 animals the model labeled as cats, 1 is actually a dog and 2 are pigs. Therefore Precision(cat) = 10/13 ≈ 76.9%.
Recall
There are 18 real cats in total, but the model recognizes only 10 of them as cats; it labels 3 of the rest as dogs and 5 as pigs. Those 5 were most likely orange cats, which is understandable. Therefore Recall(cat) = 10/18 ≈ 55.6%.
Specificity
Of the 48 animals that are not cats, the model judges 45 to be not-cats. Therefore Specificity(cat) = 45/48 ≈ 93.8%.
Although among those 45 animals the model still misclassifies 6 dogs and 4 pigs between the other two classes, from the cat's point of view its judgment of them is entirely correct.
(This follows the explanation in the Confusion Matrix entry on Wikipedia: https://en.wikipedia.org/wiki/Confusion_matrix)
F1-Score
Applying the formula, the F1-Score for the cat class is (2 × 0.769 × 0.556) / (0.769 + 0.556) ≈ 64.54%.
In the same way, we can calculate the secondary and third-level indicator values for the pig and dog classes.
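The whole multi-class example can be checked programmatically. A sketch that reproduces the animal matrix above (rows are the true class, columns the predicted class; the counts are reconstructed from the figures in the text):

```python
labels = ["cat", "dog", "pig"]
matrix = {
    "cat": {"cat": 10, "dog": 3, "pig": 5},
    "dog": {"cat": 1, "dog": 15, "pig": 6},
    "pig": {"cat": 2, "dog": 4, "pig": 20},
}

total = sum(sum(row.values()) for row in matrix.values())    # 66 animals
correct = sum(matrix[c][c] for c in labels)                  # 45 on the diagonal
print(f"accuracy = {correct}/{total} = {correct / total:.1%}")

def per_class(cls):
    """Collapse the matrix into cls-vs-not-cls and compute the indicators."""
    tp = matrix[cls][cls]
    fp = sum(matrix[t][cls] for t in labels if t != cls)  # predicted cls, actually not
    fn = sum(matrix[cls][p] for p in labels if p != cls)  # actually cls, predicted not
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1

p, r, s, f1 = per_class("cat")
print(f"cat: precision={p:.1%} recall={r:.1%} specificity={s:.1%} f1={f1:.1%}")
```

Running this reproduces the article's figures for the cat class (76.9%, 55.6%, 93.8%, 64.5%), and calling `per_class("dog")` or `per_class("pig")` gives the remaining classes mentioned in the last sentence.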