Evaluating Machine Learning Models - Excerpt
2022-07-31 06:17:00 【Young_win】
Overview
Typically, a model is not evaluated on the same data it was trained on. As training progresses, performance on the training data keeps improving, while performance on never-before-seen data eventually stops improving or starts to decline.
The goal of machine learning is a model that generalizes, that is, one that performs well on never-before-seen data, so it is very important to measure a model's generalization ability reliably. The following content mainly introduces how to measure it.
The main obstacle to improving generalization is overfitting, which will be introduced later.
train/valid/test
When evaluating the model, divide the data into 3 sets: train/valid/test.
train: train the model on this dataset;
valid: evaluate the model on this dataset;
test: once the best parameters have been found, run one final evaluation on this dataset.
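The three-way division above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_valid_test_split` and the 60/20/20 proportions are assumptions made for the example, not something the excerpt prescribes.

```python
import random

def train_valid_test_split(samples, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into train/valid/test lists."""
    if valid_frac < 0 or test_frac < 0 or valid_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_frac)
    n_valid = round(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

train, valid, test = train_valid_test_split(range(100))
print(len(train), len(valid), len(test))
```

Fixing the seed keeps the test set identical across experiments, so it is never mixed back into the data used for tuning.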
A. Why do you need test data?
(1.) While developing a model, you constantly adjust its configuration, such as the number of layers and the size of each layer (the hyperparameters). This tuning uses the model's performance on the valid data as a feedback signal, so it is essentially a form of learning: a search for a good model configuration in some parameter space.
(2.) As a result, tuning the configuration based on performance on valid quickly overfits the model to valid, even though the model is never trained on valid directly. The key to this phenomenon is information leakage: every time a hyperparameter is adjusted based on performance on valid, some information about valid leaks into the model.
(3.) The final model will perform well on valid, because that is exactly what you optimized for. What we care about, however, is performance on new data, not on valid. Therefore, you need to evaluate the model on a completely different, never-before-seen dataset: the test dataset.
(4.) The model must not have access to any information about the test data, even indirectly. If anything about the model is tuned based on its test performance, the measure of generalization ability is no longer accurate.
B. When little data is available, how should train/valid/test be divided?
(1.) Simple hold-out validation
Set aside part of the non-test data as valid, train on the rest, and tune the hyperparameters against valid. Once the hyperparameters are tuned, train the final model from scratch on all non-test data.
Disadvantage of this evaluation method: if little data is available, valid and test may contain too few samples to be statistically representative of the data.
How to detect this problem experimentally: if different random shuffles of the data before splitting yield very different final model performance, you have this problem.
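The shuffle-sensitivity check described above can be sketched like this. Everything here is illustrative: `mean_baseline` is a toy stand-in for a real model (it predicts the training mean and reports mean absolute error), and the helper names and the five seeds are assumptions made for the example.

```python
import random
import statistics

def mean_baseline(train, valid):
    """Toy stand-in for a real model: predict the train mean, return MAE on valid."""
    prediction = statistics.fmean(train)
    return statistics.fmean([abs(y - prediction) for y in valid])

def holdout_score(data, train_and_score, valid_frac=0.2, seed=0):
    """Shuffle with the given seed, hold out valid_frac of the data, return the score."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, round(len(shuffled) * valid_frac))
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    return train_and_score(train, valid)

def split_sensitivity(data, train_and_score, seeds=range(5)):
    """Return the hold-out scores for several shuffles and their standard deviation."""
    scores = [holdout_score(data, train_and_score, seed=s) for s in seeds]
    return scores, statistics.pstdev(scores)

rng = random.Random(42)
small_data = [rng.gauss(0.0, 1.0) for _ in range(20)]  # deliberately tiny dataset
scores, spread = split_sensitivity(small_data, mean_baseline)
print("score per shuffle:", [round(s, 3) for s in scores])
print("spread across shuffles:", round(spread, 3))
```

A spread that is large relative to the differences between candidate models suggests that a single hold-out split is too noisy to trust.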
(2.) K-fold validation
K-fold validation addresses the problem above, where model performance varies greatly across different "train-test" splits.
Split the data into K partitions of equal size. For each partition i, train the model on the remaining K-1 partitions and evaluate it on partition i. Final score = average of the K scores.
Train and evaluate K models in this way (train on K-1 partitions, evaluate on the held-out valid partition) to find the best hyperparameters;
Use those hyperparameters to train a model M on train+valid;
Evaluate model M on test.
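The three steps above might look like the following sketch. The "model" is a deliberately trivial one that predicts alpha * mean(train), with alpha playing the role of the hyperparameter; the data values, the candidate list, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import statistics

def k_fold_splits(data, k):
    """Yield (train, valid) lists for k contiguous folds of near-equal size."""
    n = len(data)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        valid = data[start:start + size]
        train = data[:start] + data[start + size:]
        yield train, valid
        start += size

def fit(train, alpha):
    """Toy model: the prediction is alpha * mean(train); alpha is the hyperparameter."""
    return alpha * statistics.fmean(train)

def mae(prediction, samples):
    """Mean absolute error of a constant prediction (lower is better)."""
    return statistics.fmean([abs(y - prediction) for y in samples])

def k_fold_score(data, alpha, k=4):
    """Average validation error over the k folds."""
    scores = [mae(fit(train, alpha), valid) for train, valid in k_fold_splits(data, k)]
    return statistics.fmean(scores)

# Hypothetical data: the last 4 values play the role of the untouched test set.
data = [4.0, 6.0, 5.0, 5.5, 4.5, 5.0, 6.5, 3.5, 5.0, 4.0, 6.0, 5.0]
non_test, test = data[:8], data[8:]

# Step 1: K-fold validation on the non-test data selects the hyperparameter.
candidates = [0.5, 1.0, 1.5]
best_alpha = min(candidates, key=lambda a: k_fold_score(non_test, a))
# Step 2: retrain on all non-test data (train+valid) with the chosen value.
final_model = fit(non_test, best_alpha)
# Step 3: evaluate once on test.
test_error = mae(final_model, test)
print("best alpha:", best_alpha, "test MAE:", test_error)
```

Note that the test values are never passed to `k_fold_score`, so the hyperparameter choice carries no information about them.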
(3.) Repeated K-fold validation with shuffling
Method: apply K-fold validation multiple times, shuffling the data each time before dividing it into K partitions. The final score is the average of the scores from all K-fold runs.
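A minimal sketch of the repeated procedure, under the same kind of assumptions as before: `mean_baseline` is a toy stand-in for training and scoring a real model, and `repeats=3` and `k=4` are arbitrary illustrative values.

```python
import random
import statistics

def k_fold_splits(data, k):
    """Yield (train, valid) lists for k contiguous folds of near-equal size."""
    n = len(data)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        yield data[:start] + data[start + size:], data[start:start + size]
        start += size

def mean_baseline(train, valid):
    """Toy stand-in for a real model: predict the train mean, return MAE on valid."""
    prediction = statistics.fmean(train)
    return statistics.fmean([abs(y - prediction) for y in valid])

def repeated_k_fold_score(data, train_and_score, k=4, repeats=3, seed=0):
    """Average the K-fold score over several independently shuffled runs."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)  # a fresh shuffle before each K-fold run
        fold_scores = [train_and_score(tr, va) for tr, va in k_fold_splits(shuffled, k)]
        run_scores.append(statistics.fmean(fold_scores))
    return statistics.fmean(run_scores)

rng = random.Random(7)
data = [rng.gauss(5.0, 1.0) for _ in range(24)]
print("repeated K-fold MAE:", round(repeated_k_fold_score(data, mean_baseline), 3))
```

Each call trains repeats * k models, so the cost grows quickly with either value.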
Notes on evaluating models
Data representation
Usually, both train and test should be representative of the data at hand, so the data should be randomly shuffled before it is split. This avoids absurd mistakes such as, on the MNIST dataset, a train set that contains only the digits 1-7 and a test set that contains only the digit 9.
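The effect of skipping the shuffle can be seen in a small sketch. The label list below is hypothetical (ten copies of each digit, stored sorted by class), and `split_labels` and `missing_classes` are names invented for this example.

```python
import random

# Hypothetical label column sorted by class, as raw datasets are sometimes stored.
labels = [digit for digit in range(10) for _ in range(10)]

def split_labels(labels, test_frac=0.2, shuffle=True, seed=0):
    """Return (train, test); optionally shuffle before cutting."""
    items = list(labels)
    if shuffle:
        random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    return items[n_test:], items[:n_test]

def missing_classes(train, test):
    """Classes that occur in test but never in train."""
    return sorted(set(test) - set(train))

train, test = split_labels(labels, shuffle=False)
print("unshuffled, classes unseen in training:", missing_classes(train, test))
train, test = split_labels(labels, shuffle=True)
print("shuffled, classes unseen in training:", missing_classes(train, test))
```

A plain shuffle makes a lopsided split unlikely but does not strictly rule it out; when class balance matters, a stratified split is the usual remedy.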
Arrow of time
If you are trying to predict the future from the past, do not randomly shuffle the data before splitting it, because shuffling creates a temporal leak: the model would effectively be trained on data from the future. In this case, make sure that all test data is later in time than the train data.
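A time-ordered split can be sketched as follows. The record layout (a `(date, value)` tuple) and the function name `chronological_split` are assumptions made for the illustration.

```python
from datetime import date

def chronological_split(records, test_frac=0.2, key=lambda r: r[0]):
    """Sort by timestamp and place the most recent test_frac of records in test."""
    ordered = sorted(records, key=key)
    n_test = max(1, round(len(ordered) * test_frac))
    return ordered[:-n_test], ordered[-n_test:]

# Hypothetical records, deliberately out of order: (timestamp, value).
records = [(date(2022, 1, d), float(d)) for d in (5, 1, 9, 3, 7, 2, 10, 4, 8, 6)]
train, test = chronological_split(records)

latest_train = max(t for t, _ in train)
earliest_test = min(t for t, _ in test)
print("latest train date:", latest_train, "earliest test date:", earliest_test)
```

If several records share the timestamp at the cut point, they may end up on both sides; in that case the cut is better placed at a timestamp boundary rather than at a fixed count.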
Data redundancy
If some samples appear multiple times in the data, shuffling and then splitting into train/valid puts copies of the same sample in both sets, so the model ends up being evaluated on part of its train data. Make sure that train and valid share no samples.
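One simple way to guarantee disjoint sets is to drop exact duplicates before splitting, sketched below. The function name `deduplicated_split` and the sample tuples are hypothetical, and the approach assumes samples are hashable and that duplicates are exact copies; near-duplicates would need a different treatment.

```python
import random

def deduplicated_split(samples, valid_frac=0.2, seed=0):
    """Drop exact duplicates, then shuffle and split, so no sample lands in both sets."""
    unique = list(dict.fromkeys(samples))  # keeps the first occurrence of each sample
    random.Random(seed).shuffle(unique)
    n_valid = max(1, round(len(unique) * valid_frac))
    return unique[n_valid:], unique[:n_valid]

# Hypothetical samples with repeated entries.
samples = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b"), (4, "d"), (5, "e")]
train, valid = deduplicated_split(samples)
print("train:", train)
print("valid:", valid)
print("overlap:", set(train) & set(valid))
```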