Advantages and disadvantages of evaluation methods

2022-07-06 10:25:00 【How about a song without trace】

1. Overfitting: when the learner fits the training samples too well, it may treat peculiarities of the training samples as general properties of all potential samples. This reduces its generalization ability (the ability of the learned model to perform well on unseen samples).

2. Underfitting: the learner's capacity is too low, so it has not even learned the general properties of the training samples.

Evaluation methods ：

Hold-out method: split the data set once into a training set and a test set. If the training set contains the vast majority of the samples, the trained model is likely to be close to the model that would be trained on the whole data set, but because the test set is small, the evaluation result may not be accurate enough. Typical splits are 2:1 or 4:1, where the larger part is used for training and the smaller part for testing.
Cross-validation: split the data set into k subsets of equal size using stratified sampling, use each subset in turn as the test set while training on the others, and take the mean of the k results. Drawback: on large data sets it is expensive and takes more time, because the model is trained k times.
Bootstrap method: repeatedly draw a sample from the full data set and put it back (sampling with replacement) until the training set is as large as the original data set. About 36.8% (0.368) of the initial samples never appear in the training set, and these are used for testing. Testing on the samples that were not drawn is known as the out-of-bag estimate. Advantages: the bootstrap is useful when the data set is small and hard to split effectively into training and test sets, and it can generate many different training sets from the initial data set. Disadvantage: it changes the distribution of the data set, which introduces estimation bias.
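
The 0.368 figure is the probability that a given sample is never picked in m draws with replacement: (1 - 1/m)^m, which approaches 1/e ≈ 0.368 as m grows. The following is a minimal sketch using only the Python standard library; the helper name `bootstrap_split` and the choice m = 10000 are assumptions made for illustration.

```python
import math
import random

def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (train, out_of_bag)."""
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    # Indices that were never drawn form the out-of-bag test set.
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return train, out_of_bag

rng = random.Random(0)   # fixed seed so the run is repeatable
m = 10_000               # illustrative data set size (assumption)
train_idx, oob_idx = bootstrap_split(m, rng)
oob_fraction = len(oob_idx) / m

print(f"out-of-bag fraction: {oob_fraction:.3f}")
print(f"(1 - 1/m)^m = {(1 - 1 / m) ** m:.4f}, 1/e = {1 / math.e:.4f}")
```

The training list has the same length as the original data set but contains duplicates, which is exactly why its distribution differs from the original one.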

When the initial data set is large enough, however, the hold-out method and cross-validation are more commonly used.
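
As a rough illustration of these two more common methods, the sketch below splits sample indices with the hold-out method and with k-fold cross-validation. The function names `holdout_split` and `k_fold_indices` are hypothetical, and the sketch shuffles randomly rather than stratifying by class, which is a simplification.

```python
import random

def holdout_split(n_samples, train_parts, test_parts, rng):
    """Shuffle indices and cut once, e.g. train_parts:test_parts = 2:1."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = n_samples * train_parts // (train_parts + test_parts)
    return idx[:cut], idx[cut:]

def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

rng = random.Random(0)
train, test = holdout_split(30, 2, 1, rng)      # 2:1 split of 30 samples
splits = list(k_fold_indices(30, 5, rng))       # 5-fold cross-validation

print(len(train), len(test))
print([len(test_fold) for _, test_fold in splits])
```

In practice the model would be trained on each `train` list and scored on the matching test list, and the k scores from cross-validation would be averaged.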

Parameter tuning and the final model:

General rule for parameter tuning: choose a range and a step size for each parameter, and evaluate only the resulting candidate values. This is a compromise between computational cost and the quality of the performance estimate.
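
A small sketch of this rule follows. Each parameter gets a range and a step, and every combination of candidate values is evaluated. The function `validation_error` is a made-up stand-in for "train with these parameters and measure the error on a validation set"; the parameter names and ranges are likewise assumptions.

```python
from itertools import product

def candidate_values(low, high, step):
    """Evenly spaced candidates from low to high, both ends included."""
    n_steps = round((high - low) / step)
    # Rounding removes floating-point noise such as 0.15000000000000002.
    return [round(low + i * step, 10) for i in range(n_steps + 1)]

def validation_error(a, b):
    # Hypothetical error surface, used only to make the sketch runnable.
    return (a - 0.1) ** 2 + (b - 3) ** 2

grid_a = candidate_values(0.0, 0.2, 0.05)   # 5 candidates
grid_b = candidate_values(1, 5, 1)          # 5 candidates

# Evaluate every combination and keep the one with the lowest error.
best = min(product(grid_a, grid_b), key=lambda p: validation_error(*p))
print(len(grid_a) * len(grid_b), "combinations, best:", best)
```

The number of combinations is the product of the grid sizes, so a finer step or an extra parameter quickly multiplies the training cost.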

Performance measures: these quantify the generalization ability of the model. How good a model is depends not only on the algorithm and the data, but also on the requirements of the task.

The most commonly used performance measure for regression tasks is the mean squared error.
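
For reference, the mean squared error averages the squared differences between predictions and true values. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Squared errors are 1, 0 and 4, so the mean is 5/3.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(mse)
```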

Recall (TP/(TP+FN)) and precision (TP/(TP+FP)), where TP = true positives, FP = false positives, TN = true negatives, FN = false negatives.

F1 is the harmonic mean of precision and recall: F1 = 2*TP/(total number of samples + TP - TN)
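
The sketch below computes these quantities for a small made-up example and checks that the harmonic-mean form of F1 agrees with the count-based form 2*TP/(total number of samples + TP - TN). Labels are assumed to be 1 for positive and 0 for negative, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

# This example has nonzero denominators; real code should guard against zero.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_harmonic = 2 * precision * recall / (precision + recall)
f1_counts = 2 * tp / (len(y_true) + tp - tn)

print(tp, fp, tn, fn)
print(precision, recall, f1_harmonic, f1_counts)
```

The two forms agree because the total number of samples equals TP + FP + TN + FN, so the denominator reduces to 2*TP + FP + FN.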

ROC: receiver operating characteristic. The vertical axis is the true positive rate, TPR = TP/(TP+FN); the horizontal axis is the false positive rate, FPR = FP/(TN+FP).
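
An ROC curve is traced by sorting samples by predicted score and lowering the decision threshold step by step, recording one (FPR, TPR) point each time. In the usual convention FPR is plotted horizontally and TPR vertically. A minimal sketch with made-up scores and a hypothetical function name:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, from the strictest threshold to the loosest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

y_true = [1, 1, 0, 1, 0, 0]                 # 1 = positive, 0 = negative
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # predicted scores (made up)
points = roc_points(y_true, scores)

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a better ranking pushes the intermediate points toward the top-left corner.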

Normalization: map values with different ranges of variation onto the same fixed range. The most common target range is [0,1].
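
One common way to map values onto [0,1] is to subtract the minimum and divide by the range. The sketch below assumes that approach; the function name is hypothetical, and mapping a constant column to 0 is an arbitrary choice made here to avoid division by zero.

```python
def min_max_scale(values):
    """Linearly map values onto [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 15, 30])
print(scaled)
```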

Bias: the difference between the expected output and the true label. It describes the fitting ability of the learning algorithm itself.

The generalization error can be decomposed into the sum of (squared) bias, variance, and noise. Variance measures how much the learning performance changes when the training set is replaced by another one of the same size, so it characterizes the impact of data perturbation. Noise is the lower bound on the expected generalization error that any learning algorithm can achieve on the current task, so it characterizes the difficulty of the learning problem itself.
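
Written out for squared-error loss, the decomposition takes the following standard form. The notation is an assumption introduced here, not taken from the original post: f(x; D) is the prediction of the model trained on training set D, \bar{f}(x) is its expectation over training sets of the same size, y is the true label, y_D is the label observed in the data, and the noise is assumed to have zero mean.

```latex
\begin{aligned}
\bar{f}(x) &= \mathbb{E}_D\left[ f(x; D) \right] \\
\mathrm{bias}^2(x) &= \left( \bar{f}(x) - y \right)^2 \\
\mathrm{var}(x) &= \mathbb{E}_D\left[ \left( f(x; D) - \bar{f}(x) \right)^2 \right] \\
\varepsilon^2 &= \mathbb{E}_D\left[ \left( y_D - y \right)^2 \right] \\
E(f; D) &= \mathrm{bias}^2(x) + \mathrm{var}(x) + \varepsilon^2
\end{aligned}
```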

Copyright notice
This article was written by [How about a song without trace]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/187/202207060910587696.html