Evaluating Machine Learning Models - Excerpt
2022-07-31 06:17:00 【Young_win】
Overview
Typically, models are not evaluated on the same data on which they were trained: as training progresses, the model's performance on the training data keeps improving, while its performance on never-before-seen data eventually stops improving or starts to decline.
The goal of machine learning is a model that generalizes, that is, one that performs well on never-before-seen data, so it is essential to measure a model's generalization ability reliably. The following content introduces how to do this.
A further obstacle to improving generalization is overfitting, which will be introduced later.
train/valid/test
When evaluating the model, divide the data into 3 sets: train/valid/test.
train: train the model on this dataset;
valid: evaluate the model on this dataset;
test: once the best hyperparameters are found, the model is evaluated one final time on this dataset. (A minimal split sketch follows this list.)
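As an illustration only (not part of the original post), here is a minimal three-way split sketch in Python with NumPy; the helper name and the 20%/20% ratios are assumptions:

```python
import numpy as np

def train_valid_test_split(X, y, valid_frac=0.2, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle, then carve out train/valid/test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_valid = int(len(X) * valid_frac)
    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]
    return (X[train], y[train]), (X[valid], y[valid]), (X[test], y[test])

# Usage: train on the first subset, tune hyperparameters against the second,
# and touch the third only once, for the final measurement.
```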
A. Why do you need test data?
(1.) When developing a model, you constantly adjust its configuration, such as the number of layers and the size of each layer (the hyperparameters). This adjustment process uses the model's performance on the valid data as a feedback signal, and it is essentially a form of learning: searching a parameter space for a good model configuration.
(2.) Consequently, tuning the model configuration based on its performance on valid quickly overfits the model to valid, even though the model is never trained on it directly. The key to this phenomenon is information leakage: every time a hyperparameter is adjusted based on the model's performance on valid, some information about the valid data leaks into the model.
(3.) Even if the final model performs well on valid, that is precisely what was optimized for; what we actually care about is the model's performance on new data, not its performance on valid. Therefore, you need to evaluate the model on a completely different, never-before-seen dataset: the test dataset.
(4.) Your model must not read any information about the test data, even indirectly. If the model has been tuned based on its performance on the test data, the resulting measure of generalization ability is inaccurate.
B. When little data is available, how should train/valid/test be divided?
(1.) Simple hold-out validation
Once the hyperparameters are tuned, train the final model from scratch on all non-test data.
Disadvantage of this evaluation method: if little data is available, the valid and test sets contain too few samples to be statistically representative of the data.
You can detect this problem experimentally: if different random shufflings of the data before splitting yield final models with very different performance, the problem is present.
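For concreteness, here is a hedged sketch of the hold-out workflow described above, using scikit-learn and a synthetic dataset (the data, the LogisticRegression model, and the split sizes are my assumptions, not the post's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption; the post does not specify a dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Hold out a test set first, then carve a validation set out of the rest
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Tune hyperparameters against the validation score (C is just an example knob)
model = LogisticRegression(C=1.0).fit(X_train, y_train)
print("valid accuracy:", model.score(X_valid, y_valid))

# Once tuning is done, retrain from scratch on all non-test data
# and measure generalization once, on the test set
final_model = LogisticRegression(C=1.0).fit(X_rest, y_rest)
print("test accuracy:", final_model.score(X_test, y_test))
```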
(2.) K-fold validation
To address the problem above, where model performance varies widely across different train/test splits, K-fold validation is introduced.
K-fold validation divides the data into K partitions of equal size; for each partition i, the model is trained on the remaining K-1 partitions and then evaluated on partition i. Final score = average of the K scores.
Train and evaluate K models (train + valid over the folds) to find the optimal hyperparameters;
Use these hyperparameters to train a single model M on train + valid;
Evaluate model M on test! (A sketch follows this list.)
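A minimal sketch of this procedure, assuming scikit-learn is available; the synthetic data, the LogisticRegression model, and the candidate values of C are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption; the post does not fix a dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)

def kfold_score(C, k=4):
    """Average validation score of one hyperparameter setting over K folds."""
    scores = []
    for train_idx, valid_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = LogisticRegression(C=C).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[valid_idx], y[valid_idx]))
    return np.mean(scores)

# Pick the hyperparameter with the best average fold score,
# then retrain one final model M on all of train + valid
best_C = max([0.01, 0.1, 1.0, 10.0], key=kfold_score)
final_model = LogisticRegression(C=best_C).fit(X, y)
# final_model would then be evaluated once on a held-out test set
```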
(3.) Iterated K-fold validation with shuffling
Method: apply K-fold validation multiple times, shuffling the data each time before splitting it into K partitions. The final score is the average of the scores obtained from each run of K-fold validation.
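As one possible realization (an assumption on my part, not code from the post), scikit-learn's RepeatedKFold performs exactly this reshuffle-and-resplit loop:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)

# Each repeat reshuffles the data before carving out K new folds;
# the final score averages over all K * n_repeats evaluations.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = [
    LogisticRegression().fit(X[tr], y[tr]).score(X[va], y[va])
    for tr, va in cv.split(X)
]
print("iterated K-fold score:", np.mean(scores))
```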
Notes on evaluating models
Data representativeness
Usually, both train and test should be representative of the data at hand, so the data should be randomly shuffled before the split; this avoids absurd mistakes such as an MNIST split where train contains only the digits 1-7 and test contains only the digit 9.
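A small sketch of shuffling before splitting; the stratify option shown here is an extra safeguard I am adding, not something the post explicitly calls for:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Suppose the labels arrive sorted by class (the failure mode described above)
y = np.repeat(np.arange(10), 100)                      # digits 0..9, 100 samples each, sorted
X = np.random.default_rng(0).normal(size=(1000, 64))   # synthetic stand-in features

# train_test_split shuffles before splitting; stratify keeps every digit
# represented in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)
print(np.unique(y_train), np.unique(y_test))           # both contain all 10 digits
```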
Time Arrow
If you are trying to predict the future from the past, you should not randomly shuffle the data before splitting it, because shuffling creates a temporal leak: your model would effectively be trained on data from the future. In this case, make sure that all of the test data is later in time than the train data.
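A minimal chronological-split sketch (the synthetic timestamps and the 80/20 cut are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
timestamps = rng.integers(0, 10_000, size=n)   # synthetic event times
X = rng.normal(size=(n, 5))
y = rng.normal(size=n)

# Sort by time and cut: everything in the test split happens strictly
# after everything in the training split -- no shuffling, no time leak.
order = np.argsort(timestamps)
X, y = X[order], y[order]
split = int(0.8 * n)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```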
Data redundancy
If some samples appear more than once in the data, then shuffling and splitting into train/valid will leave redundant samples in both sets, so you end up evaluating the model on part of its own training data. Make sure that train and valid share no samples.
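One way to enforce this (my suggestion, not something prescribed by the post) is to tag duplicated samples with a shared group id and split by group, for example with scikit-learn's GroupShuffleSplit:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] > 0).astype(int)
# group id: samples that are duplicates (or near-duplicates) share a group
groups = rng.integers(0, 300, size=1000)

# GroupShuffleSplit keeps every group entirely on one side of the split,
# so a duplicated sample can never appear in both train and valid.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, valid_idx = next(splitter.split(X, y, groups=groups))
assert set(groups[train_idx]).isdisjoint(groups[valid_idx])
```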