Machine Learning (Zhou Zhihua) Chapter 2: Model Selection and Evaluation, Notes and Learning Experience
2022-07-24 05:51:00 【Ml -- xiaoxiaobai】
Chapter 2: Model Selection and Evaluation
Some terminology
Error rate (error rate)
The proportion of misclassified samples among all samples.
Accuracy (accuracy)
The proportion of correctly classified samples among all samples;
equal to 1 minus the error rate.
Confusion matrix (confusion matrix)
The rows are the ground truth (first row: actual positives, second row: actual negatives); the columns are the model's predictions (first column: predicted positive, second column: predicted negative).
True positive (true positive, TP)
An actual positive that the model also predicts as positive; element (1,1) of the confusion matrix.
False negative (false negative, FN)
An actual positive that the model wrongly predicts as negative; element (1,2) of the confusion matrix.
False positive (false positive, FP)
An actual negative that the model wrongly predicts as positive; element (2,1) of the confusion matrix.
True negative (true negative, TN)
An actual negative that the model also predicts as negative; element (2,2) of the confusion matrix.
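The four counts above can be computed directly from label lists. A minimal pure-Python sketch; the function name `confusion_counts` and the 0/1 label encoding are my own choices, not from the book:

```python
# Count TP, FN, FP, TN from true labels and predictions (1 = positive, 0 = negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

# Hypothetical labels for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)  # (3, 1, 1, 3)
```

Note that TP + FN + FP + TN always equals the total number of samples.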
Precision (precision, P)
Taking information retrieval as an example: "how much of the retrieved information is actually of interest to the user", i.e. among the instances predicted positive, what fraction are actually positive.
$$P = \frac{TP}{TP + FP}$$
Macro precision (macro-P)
When there are several confusion matrices (e.g. from multiple binary tasks or repeated train/test splits), their results should be combined: compute the precision of each matrix and average them to obtain the macro precision.
$$\text{macro-}P = \frac{1}{n}\sum_{i=1}^{n} P_i$$
Micro precision (micro-P)
When there are several confusion matrices, first average their elements to obtain $\overline{TP}$, $\overline{FP}$, etc., then compute the precision from these averaged counts; this gives the micro precision.
$$\text{micro-}P = \frac{\overline{TP}}{\overline{TP} + \overline{FP}}$$
Recall (recall, R)
Taking information retrieval as an example: "how much of the information the user is interested in was actually retrieved", i.e. among the actual positives, what fraction is detected by the model.
$$R = \frac{TP}{TP + FN}$$
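Given the four counts, precision and recall follow directly from the two formulas above; a small sketch with hypothetical counts:

```python
# Precision P = TP/(TP+FP); recall R = TP/(TP+FN).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Hypothetical counts for illustration.
tp, fp, fn = 3, 1, 1
P = precision(tp, fp)  # 0.75
R = recall(tp, fn)     # 0.75
```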
Macro recall (macro-R)
When there are several confusion matrices, compute the recall of each and average them to obtain the macro recall.
$$\text{macro-}R = \frac{1}{n}\sum_{i=1}^{n} R_i$$
Micro recall (micro-R)
When there are several confusion matrices, average their elements first and then compute the recall from the averaged counts; this gives the micro recall.
$$\text{micro-}R = \frac{\overline{TP}}{\overline{TP} + \overline{FN}}$$
$F_\beta$
A performance measure (performance measure) that combines precision and recall.
$$F_\beta = \frac{(1 + \beta^2) \times P \times R}{(\beta^2 \times P) + R}$$
Here $\beta > 0$ expresses the relative importance of recall with respect to precision: $\beta < 1$ weights precision more heavily, while $\beta > 1$ weights recall more heavily.
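The effect of $\beta$ can be checked numerically; in this sketch (with hypothetical values of P and R), $\beta = 0.5$ favours the higher precision while $\beta = 2$ is pulled down toward the lower recall:

```python
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
def f_beta(p, r, beta):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Hypothetical rates: high precision, low recall.
p, r = 0.9, 0.5
f_half = f_beta(p, r, 0.5)  # weights precision more -> closer to 0.9
f_two = f_beta(p, r, 2.0)   # weights recall more -> closer to 0.5
```

With beta = 1 the formula reduces to the ordinary harmonic mean 2PR/(P+R).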
$F_1$
When $\beta = 1$, precision and recall are equally important and $F_\beta$ reduces to $F_1$.
$$F_1 = \frac{2 \times P \times R}{P + R} = \frac{2 \times TP}{\text{total number of samples} + TP - TN}$$
In fact $F_1$ is the harmonic mean of precision and recall:
$$\frac{1}{F_1} = \frac{1}{2}\left(\frac{1}{P} + \frac{1}{R}\right)$$
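Both expressions for $F_1$ can be checked against each other on concrete counts; a small sketch with hypothetical TP/FN/FP/TN values:

```python
# Check the identity F1 = 2PR/(P+R) = 2TP/(m + TP - TN), where m is the sample count.
tp, fn, fp, tn = 3, 1, 1, 3       # hypothetical confusion-matrix counts
m = tp + fn + fp + tn             # total number of samples
P = tp / (tp + fp)
R = tp / (tp + fn)
f1_harmonic = 2 * P * R / (P + R)
f1_counts = 2 * tp / (m + tp - tn)
```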
Macro $F_1$ (macro-$F_1$)
When there are several confusion matrices, first compute the macro precision and macro recall as above, then combine them to obtain the macro $F_1$.
$$\text{macro-}F_1 = \frac{2 \times \text{macro-}P \times \text{macro-}R}{\text{macro-}P + \text{macro-}R}$$
Micro $F_1$ (micro-$F_1$)
When there are several confusion matrices, first compute the micro precision and micro recall from the averaged counts, then combine them to obtain the micro $F_1$.
$$\text{micro-}F_1 = \frac{2 \times \text{micro-}P \times \text{micro-}R}{\text{micro-}P + \text{micro-}R}$$
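The difference between macro and micro averaging is easy to see in code: macro averages the per-matrix rates, micro averages the counts first. A sketch with two hypothetical confusion matrices, each given as (TP, FN, FP, TN):

```python
# Hypothetical confusion matrices, as (TP, FN, FP, TN) tuples.
matrices = [
    (8, 2, 1, 9),
    (5, 5, 5, 5),
]

def prec(tp, fp): return tp / (tp + fp)
def rec(tp, fn): return tp / (tp + fn)

# Macro: average the per-matrix precision/recall.
macro_p = sum(prec(tp, fp) for tp, fn, fp, tn in matrices) / len(matrices)
macro_r = sum(rec(tp, fn) for tp, fn, fp, tn in matrices) / len(matrices)

# Micro: average the counts, then compute precision/recall once.
avg_tp = sum(mat[0] for mat in matrices) / len(matrices)
avg_fn = sum(mat[1] for mat in matrices) / len(matrices)
avg_fp = sum(mat[2] for mat in matrices) / len(matrices)
micro_p = avg_tp / (avg_tp + avg_fp)
micro_r = avg_tp / (avg_tp + avg_fn)

macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
```

The two averages generally differ: macro gives each matrix equal weight, micro gives each counted sample equal weight.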
ROC (Receiver Operating Characteristic) curve
Used to assess model performance. The abscissa is the false positive rate (False Positive Rate, FPR), i.e. the proportion of actual negatives that are wrongly predicted as positive:
$$FPR = \frac{FP}{TN + FP}$$
The ordinate is the true positive rate (True Positive Rate, TPR), i.e. the recall:
$$TPR = \frac{TP}{TP + FN}$$
AUC (Area Under the ROC Curve)
The area under the ROC curve; its maximum value is 1 (the ideal case).
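AUC can equivalently be computed as the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one (ties counting one half); a brute-force sketch on hypothetical scores:

```python
# AUC as P(score of random positive > score of random negative), ties count 1/2.
# O(n^2) brute force; equivalent to the area under the ROC curve.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))

# Hypothetical scores and labels for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]
```

Here one positive (score 0.6) is ranked below one negative (score 0.7), so the AUC is 8/9 rather than 1.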
Training error (training error) / empirical error (empirical error)
The error of the model/learner on the training set;
it is called "empirical" because it serves as an estimate of the true error.
Generalization error (generalization error)
The error of the model/learner on new, unseen samples; in practice it is estimated by the error on a test set.
Overfitting (overfitting)
The model attains a small error on the training set (a small empirical error) but a large generalization error: it has learned peculiarities of the training set itself as if they were general rules.
As long as we believe that $P \neq NP$, overfitting is unavoidable; it can only be alleviated, never eliminated.
Underfitting (underfitting)
The opposite of overfitting: the model has not even captured the general structure of the training data and cannot classify it well.
Bias-variance decomposition (bias-variance decomposition)
A decomposition of the expected generalization error of the model/learner. The result is:
$$E(f; D) = \text{bias}^2(\boldsymbol{x}) + \operatorname{var}(\boldsymbol{x}) + \varepsilon^2$$
The first term is the squared bias, $\text{bias}^2(\boldsymbol{x}) = (\bar{f}(\boldsymbol{x}) - y)^2$, which measures how far the model's expected prediction deviates from the true value and characterizes the fitting ability of the model itself. The second term is the variance, $\operatorname{var}(\boldsymbol{x}) = \mathbb{E}_D\big[(f(\boldsymbol{x}; D) - \bar{f}(\boldsymbol{x}))^2\big]$, which measures how much the learned model changes across different training sets of the same size (its instability) and characterizes the effect of perturbations in the data. The third term is the noise, $\varepsilon^2 = \mathbb{E}_D\big[(y_D - y)^2\big]$, since the observed labels inevitably deviate from the ideal values; it is a lower bound on the expected generalization error achievable by any model and characterizes the intrinsic difficulty of the learning problem.
Therefore the generalization error is jointly determined by the learning ability of the model, the sufficiency of the data, and the difficulty of the learning task itself.
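The decomposition can be verified exactly on a toy example by enumerating a small set of equally likely learned predictors and noise values; all numbers below are illustrative, chosen so the expectations are exact finite averages:

```python
# Exact toy check of E[(f_D - y_D)^2] = bias^2 + var + noise at one test point x.
from itertools import product

y = 3            # true value at the fixed test point
preds = [2, 6]   # predictions f(x; D) over equally likely training sets D
noises = [-1, 1] # zero-mean label noise; observed label y_D = y + eps

f_bar = sum(preds) / len(preds)                           # expected prediction
bias_sq = (f_bar - y) ** 2                                # squared bias
var = sum((f - f_bar) ** 2 for f in preds) / len(preds)   # variance over D
noise = sum(e ** 2 for e in noises) / len(noises)         # irreducible noise

# Expected squared error against the noisy label, enumerated exactly.
expected_err = sum((f - (y + e)) ** 2
                   for f, e in product(preds, noises)) / (len(preds) * len(noises))
```

Here bias² = 1, var = 4, noise = 1, and the expected error is exactly their sum, 6.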
Bias-variance dilemma (bias-variance dilemma)
In general, bias and variance are in conflict: reducing one tends to increase the other. This is usually summarized in an intuitive figure of the two curves against training level (figure not reproduced here).
Model selection (model selection)
Model selection means deciding which algorithm/hypothesis class to adopt and which parameters/hyperparameters to set.
Methods for splitting the training/test sets
Hold-out method (hold-out)
Directly split the dataset into a training set and a test set; the two sets are mutually exclusive.
A single hold-out split is often unreliable and may introduce unexpected bias, so one can randomly shuffle and split several times and average the results; the standard deviation of the evaluations is obtained as a by-product.
Usually about 2/3 to 4/5 of the data is used as the training set (see the bias/variance tradeoff).
Stratified sampling (stratified sampling)
Sampling that preserves the class proportions of the original dataset.
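A stratified hold-out split can be sketched by splitting each class separately so that both sets keep the original class proportions; the function name and parameters below are my own, not a library API:

```python
# Stratified hold-out: shuffle and split each class independently, then merge.
import random

def stratified_holdout(labels, test_ratio=0.25, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = int(len(idx) * test_ratio)   # per-class test count
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)

labels = [0] * 8 + [1] * 4               # class ratio 2:1
train_idx, test_idx = stratified_holdout(labels)
```

Both the 9-sample training set and the 3-sample test set keep the 2:1 class ratio.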
Bias/variance tradeoff (bias/variance tradeoff)
If the test set is too small, the variance (variance) of the test/evaluation results is large;
if the training set is too small, the trained model itself may be biased (bias).
k-fold cross-validation (k-fold cross validation)
Split the dataset into k equal-sized, mutually exclusive parts; in turn take each part as the test set and the remaining k-1 parts as the training set, and average the resulting k test results to obtain the final evaluation. Since partitioning the dataset is itself random, the whole procedure can be repeated with several random partitions and the results averaged again, e.g. 10 times 10-fold cross-validation.
The most commonly used value of k is 10; 5 and 20 are also common.
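The index bookkeeping of k-fold cross-validation can be sketched as follows (no model is trained; the function name and round-robin fold assignment are my own choices):

```python
# Produce k (train, test) index splits: each fold is the test set exactly once.
def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin fold assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = kfold_indices(10, 5)  # 5 splits, each with 8 train and 2 test indices
```

Every sample appears in exactly one test fold, so the k test results together cover the whole dataset.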
Leave-one-out cross-validation (Leave-One-Out, LOO)
When k in k-fold cross-validation equals the total number of samples, each part contains exactly one sample; this is called leave-one-out cross-validation.
Bootstrapping (bootstrapping)
The key is bootstrap sampling (bootstrap sampling): draw, with replacement, a dataset of the same size as the original one. If the original dataset has m samples, the probability that a given sample is never drawn is $\left(1 - \frac{1}{m}\right)^m$; when the dataset is large enough:
$$\lim_{m \to \infty}\left(1 - \frac{1}{m}\right)^m = \frac{1}{e} \approx 0.368$$
So roughly 1/3 of the samples are never drawn (hence the name "out-of-bag estimate", out-of-bag estimate); the dataset obtained by bootstrap sampling can be used as the training set, and the samples that were never drawn as the test set.
However, bootstrap sampling changes the distribution of the training data relative to the original dataset, which naturally introduces estimation bias; this is its drawback, so it is mainly useful when the dataset is small.
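The roughly 0.368 out-of-bag fraction is easy to confirm empirically by simulating one round of bootstrap sampling (helper name and seed are my own choices):

```python
# Simulate bootstrap sampling: draw m indices with replacement from {0, ..., m-1}
# and measure the fraction of original samples that were never drawn.
import random

def bootstrap_oob_fraction(m, seed=0):
    rng = random.Random(seed)
    drawn = {rng.randrange(m) for _ in range(m)}
    return 1 - len(drawn) / m

frac = bootstrap_oob_fraction(100000)  # close to 1/e ~ 0.368
```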
Parameter tuning (parameter tuning)
That's right: this is what is usually just called "tuning". Parameters generally fall into two kinds. One kind is the model's own parameters, which are adjusted continually through learning and are often numerous. The other kind is the "hyperparameters", which I take to be the parameters of the model/algorithm itself, chosen before training; there are usually not too many of them.