Evaluating Machine Learning Models - Excerpt
2022-07-31 06:17:00 【Young_win】
Overview
Typically, models are not evaluated on the same data on which they were trained. As training progresses, performance on the training data keeps improving, but performance on never-before-seen data stops improving or starts to decline.
The goal of machine learning is to obtain a model that generalizes, that is, a model that performs well on never-before-seen data, so it is very important to measure a model's generalization ability reliably. The following content mainly introduces how to measure it.
In addition, the main obstacle to improving generalization is overfitting, which will be introduced later.
train/valid/test
When evaluating the model, divide the data into 3 sets: train/valid/test.
train: train the model on this dataset;
valid: evaluate the model on this dataset;
test: once the best parameters are found, run one final test on this dataset.
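The three-way split above can be sketched as follows. This is a minimal illustration using only the standard library; the function name `split_train_valid_test` and the 60/20/20 proportions are assumptions chosen for the example, not something the excerpt prescribes.

```python
import random

def split_train_valid_test(samples, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into train/valid/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

train, valid, test = split_train_valid_test(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

The test list is cut off first and should not be touched again until the final evaluation.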
A. Why do you need test data?
(1.) When developing a model, you constantly adjust its configuration, such as the number of layers and the size of each layer (the hyperparameters). This tuning uses the model's performance on the valid data as a feedback signal, and the process is essentially a form of learning: searching a parameter space for a good model configuration.
(2.) Therefore, tuning the configuration based on performance on valid will quickly overfit the model to valid, even though the model is never trained on valid directly. The key to this phenomenon is information leakage: every time a hyperparameter is adjusted based on the model's performance on valid, some information about the valid data leaks into the model.
(3.) The final model will perform well on valid, because that is exactly what you optimized for; but what we care about is performance on new data, not on valid. Therefore, you need to evaluate the model on a completely different, never-before-seen dataset: the test dataset.
(4.) Your model must not have access to any information about the test data, even indirectly. If the model is tuned based on its performance on the test data, the measure of generalization ability is no longer accurate.
B. When little data is available, how to divide train/valid/test?
(1.) Simple hold-out validation
Set aside part of the data as valid, train on the rest, and evaluate on valid. Once the hyperparameters are tuned, train the final model from scratch on all non-test data.
Disadvantage of this evaluation method: if little data is available, the valid and test sets contain too few samples to be statistically representative.
How to detect this problem experimentally: if different random shuffles before splitting the data yield very different final model performance, the problem is present.
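The shuffle check described above can be sketched as follows. Everything here is a toy stand-in: the "model" is just the mean of the training values, the score is mean absolute error, and the names `holdout_score`, `fit_mean` and `mean_abs_error` are hypothetical helpers introduced for illustration.

```python
import random
import statistics

def holdout_score(samples, valid_frac, seed, fit, score):
    """Shuffle with the given seed, hold out a valid set, fit on the rest, score on valid."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_frac))
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    return score(fit(train), valid)

def fit_mean(train):
    # Toy "model": predict the mean of the training values.
    return statistics.fmean(train)

def mean_abs_error(model, valid):
    return statistics.fmean([abs(v - model) for v in valid])

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(30)]  # deliberately small dataset

# Same data, same procedure, only the shuffle differs.
scores = [holdout_score(data, 0.2, seed, fit_mean, mean_abs_error) for seed in range(10)]
spread = max(scores) - min(scores)
print(f"min={min(scores):.3f} max={max(scores):.3f} spread={spread:.3f}")
```

A spread that is large relative to the scores themselves suggests the hold-out sets are too small to be representative.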
(2.) K-fold validation
K-fold validation is introduced to solve the problem described above, where model performance varies greatly across different train-test splits.
K-fold validation divides the data into K partitions of the same size. For each partition i, train the model on the remaining K-1 partitions, then evaluate it on partition i. Final score = average of the K scores.
Train and evaluate K models (train on K-1 partitions, evaluate on the held-out partition) to find the optimal hyperparameters;
Use those hyperparameters to train a model M on train+valid;
Evaluate model M on test.
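The K-fold procedure above can be sketched as follows. This is a minimal version under stated assumptions: the helpers `k_fold_partitions`, `k_fold_score`, `fit_mean` and `mean_abs_error` are hypothetical names, the "model" is a toy mean predictor, and partitions are allowed to differ in size by one sample when the data does not divide evenly by K.

```python
import statistics

def k_fold_partitions(samples, k):
    """Split samples into k contiguous partitions of (near-)equal size."""
    samples = list(samples)
    fold_size, remainder = divmod(len(samples), k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(samples[start:end])
        start = end
    return folds

def k_fold_score(samples, k, fit, score):
    """Train on K-1 partitions, evaluate on the held-out one, average the K scores."""
    folds = k_fold_partitions(samples, k)
    scores = []
    for i, valid in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(score(fit(train), valid))
    return statistics.fmean(scores)

def fit_mean(train):
    # Toy "model": predict the mean of the training values.
    return statistics.fmean(train)

def mean_abs_error(model, valid):
    return statistics.fmean([abs(v - model) for v in valid])

result = k_fold_score([1.0, 2.0, 3.0, 4.0], 2, fit_mean, mean_abs_error)
print(result)  # 2.0
```

In practice this score would be computed once per candidate hyperparameter setting, and only the best setting would be retrained on train+valid and evaluated on test.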
(3.) Repeated K-fold validation with shuffling
Method: run K-fold validation multiple times, shuffling the data each time before dividing it into K partitions. The final score is the average of the scores from all K-fold runs.
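A minimal sketch of the repeated variant, assuming the same toy mean-predictor "model" as a stand-in for real training. The function names are hypothetical, and the folds here are built by striding (`samples[i::k]`), which is one of several reasonable ways to get near-equal partitions.

```python
import random
import statistics

def k_fold_score(samples, k, fit, score):
    """Plain K-fold: average the score over K held-out partitions."""
    folds = [samples[i::k] for i in range(k)]  # strided folds of near-equal size
    scores = []
    for i, valid in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(score(fit(train), valid))
    return statistics.fmean(scores)

def repeated_k_fold_score(samples, k, repeats, fit, score, seed=0):
    """Shuffle, run K-fold, repeat; return the average over all runs."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # a fresh shuffle before each K-fold run
        run_scores.append(k_fold_score(shuffled, k, fit, score))
    return statistics.fmean(run_scores)

def fit_mean(train):
    # Toy "model": predict the mean of the training values.
    return statistics.fmean(train)

def mean_abs_error(model, valid):
    return statistics.fmean([abs(v - model) for v in valid])

data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
final_score = repeated_k_fold_score(data, k=4, repeats=5, fit=fit_mean, score=mean_abs_error)
print(f"{final_score:.3f}")
```

Note the cost: this trains and evaluates `repeats * k` models (20 in the call above), which can be expensive for real models.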
Notes on evaluating models
Data representativeness
Usually, both train and test are required to be representative of the data at hand, so the data should be randomly shuffled before splitting. This avoids ridiculous mistakes such as an MNIST split in which train contains only the digits 1-7 and test contains only the digit 9.
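The effect of shuffling before the split can be illustrated with a synthetic list of class-sorted labels. This is only an illustration, not real MNIST data; `tail_split` is a hypothetical helper that takes the last 20% of the list as the test set.

```python
import random

# 100 labels sorted by class: ten 0s, then ten 1s, ..., then ten 9s.
labels = [digit for digit in range(10) for _ in range(10)]

def tail_split(items, test_frac=0.2):
    """Use the last test_frac of the list as test, the rest as train."""
    n_test = int(len(items) * test_frac)
    return items[:-n_test], items[-n_test:]

# Without shuffling, the test set sees only the last classes.
_, test_sorted = tail_split(labels)
print(sorted(set(test_sorted)))  # [8, 9]

# With shuffling first, the test set draws from across the classes.
shuffled = list(labels)
random.Random(0).shuffle(shuffled)
_, test_shuffled = tail_split(shuffled)
print(sorted(set(test_shuffled)))
```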
The arrow of time
If you want to predict the future from the past, do not randomly shuffle the data before splitting, because shuffling creates a temporal leak: your model will effectively be trained on data from the future. In this case, make sure that all test data is later in time than the train data.
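A time-ordered split can be sketched as follows, assuming each record is a `(timestamp, value)` tuple. The record layout and the helper name `temporal_split` are assumptions made for this example.

```python
def temporal_split(records, test_frac=0.2, time_key=lambda r: r[0]):
    """Sort by time and use the most recent test_frac of the records as test."""
    ordered = sorted(records, key=time_key)
    n_test = max(1, int(len(ordered) * test_frac))
    return ordered[:-n_test], ordered[-n_test:]

# Records arrive in arbitrary order; the first field is the day index.
records = [(day, day * 2) for day in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]]
train, test = temporal_split(records)
print([day for day, _ in train])  # [1, 2, 3, 4, 5, 6, 7, 8]
print([day for day, _ in test])   # [9, 10]
```

If several records share a timestamp at the cut point, they could land on both sides of the split, so real data may need the cut placed at a timestamp boundary instead of a fixed fraction.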
Data redundancy
If some samples appear multiple times in the data, shuffling and then splitting into train/valid can place copies of the same sample in both sets, so part of the evaluation is effectively done on training data. Make sure there is no sample overlap between train and valid.
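One simple way to guarantee disjoint sets is to drop exact duplicates before splitting. The sketch below assumes samples are hashable (for example tuples or strings) and handles only exact duplicates, not near-duplicates; `split_without_overlap` is a hypothetical helper name.

```python
import random

def split_without_overlap(samples, valid_frac=0.2, seed=0):
    """Deduplicate, shuffle, then split so train and valid share no sample."""
    unique = list(dict.fromkeys(samples))  # drop exact duplicates, keep first-seen order
    random.Random(seed).shuffle(unique)
    n_valid = max(1, int(len(unique) * valid_frac))
    return unique[n_valid:], unique[:n_valid]

samples = ["a", "b", "a", "c", "d", "b", "e", "f", "a", "g", "h", "i", "j"]
train, valid = split_without_overlap(samples)
print(len(train), len(valid))  # 8 2
print(set(train) & set(valid))  # set()
```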