15. Model evaluation and selection

2022-07-29 00:28:00 【WuJiaYFN】

Main topics

Ways to debug a machine learning algorithm
Machine learning diagnostics
Steps to evaluate a hypothesis
Model selection and the cross-validation set

1. Ways to debug a machine learning algorithm

Get more training data
Try to reduce the number of features
Try more features
Try adding polynomial features
Try decreasing the regularization parameter λ
Try increasing the regularization parameter λ

2. Machine learning diagnostics

2.1 Motivation

Trying the six methods above one by one can take many rounds of debugging and still not produce good results, which is why machine learning diagnostics are introduced.
A machine learning diagnostic helps us determine which of the six methods above are likely to help our algorithm.

2.2 Definition of a machine learning diagnostic

A diagnostic is a test that you run to gain insight into what is and is not working in an algorithm, and to learn which kinds of changes are worth trying in order to improve it.

3. Steps to evaluate a hypothesis

Steps to evaluate whether an algorithm is overfitting:

Step 1: Split the data into a training set and a test set

Use 70% of the data as the training set and the remaining 30% as the test set.
If the data is already in random order, you can take the first 70% as the training set; if it is ordered in some way, select the 70% at random instead.
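
The 70/30 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name split_train_test, the seed, and the toy data are assumptions made for the example, not part of the original notes.

```python
import random

def split_train_test(examples, train_fraction=0.7, shuffle=True, seed=0):
    """Return (train, test) lists holding 70% / 30% of the examples by default."""
    items = list(examples)
    if shuffle:
        # Shuffle a copy so that ordered data (e.g. sorted by label) is mixed first.
        random.Random(seed).shuffle(items)
    n_train = round(train_fraction * len(items))
    return items[:n_train], items[n_train:]

# Toy data (hypothetical): ten (x, y) pairs that happen to be sorted by x.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data)
print(len(train), len(test))  # 7 3
```

Passing shuffle=False corresponds to the case where the data is already in random order and the first 70% can be taken directly.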

Step 2: Compute the test error

After the model has learned its parameters from the training set, run it on the test set. Depending on the type of problem, there are two ways to compute the error:
For a linear regression model (a regression problem), compute the cost function on the test set data.
For a logistic regression model (a classification problem), you can also compute the cost function on the test set data, but there is a more interpretable way to define the test error, called the misclassification rate.
The misclassification rate, also called the 0/1 misclassification rate, is the fraction of test examples that are misclassified. To compute it, evaluate err(hθ(x),y) for every test example and then average the results.
- err(hθ(x),y) is 1 if the classification predicted by hθ(x) is wrong, and 0 if the prediction is correct.
- The overall test error is the average of all err values over the test set.
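
As a rough sketch, the test errors above might be computed like this. The code assumes the usual squared-error cost with a 1/(2m) factor for linear regression, the usual cross-entropy cost for logistic regression, and a 0.5 decision threshold; the original notes do not spell these out, and the function names and sample numbers are hypothetical.

```python
import math

def squared_error_cost(predictions, targets):
    """Linear regression test error: (1 / 2m) * sum((h(x) - y)^2)."""
    m = len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / (2 * m)

def logistic_cost(probabilities, labels):
    """Logistic regression test error: average cross-entropy over the test set."""
    m = len(labels)
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(probabilities, labels)
    )
    return -total / m

def misclassification_rate(probabilities, labels, threshold=0.5):
    """0/1 test error: average of err(h(x), y) over the test set."""
    # err is 1 when the thresholded prediction disagrees with the label, else 0.
    errors = [int((p >= threshold) != (y == 1)) for p, y in zip(probabilities, labels)]
    return sum(errors) / len(errors)

# Hypothetical model outputs on small test sets.
print(squared_error_cost([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(logistic_cost([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))
print(misclassification_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))  # 0.5
```

In the last call, two of the four examples fall on the wrong side of the threshold, so the misclassification rate is 0.5.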

4. Model selection and the cross-validation set

4.1 Problem introduction

Suppose we need to choose among 10 polynomial models of different degrees.
A symptom of overfitting: the training error, measured on the same data that θ was fit to, is usually not a good estimate of the actual generalization error.

4.2 The drawback of selecting a model with only a training set and a test set

For the 10 polynomial models above, fit the parameters θ of each one on the training set, then compute each model's error on the test set. Suppose J_test(θ^5) turns out to be the smallest; we would then choose the fifth model.
The drawback of this choice: the selected model is the one whose polynomial degree and parameter values happen to fit the test set best. Evaluating the hypothesis on that same test set is therefore unfair, because the choice of degree has effectively been fit to the test set, and the reported error is likely to be an optimistic estimate of the generalization error.

4.3 The correct model selection method: a cross-validation set

Cross-validation set: split the dataset into three parts in a 6:2:2 ratio, using 60% of the data as the training set, 20% as the cross-validation set (cv, or simply the validation set), and 20% as the test set.
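
A minimal sketch of the 6:2:2 split, again using only the standard library. The function name split_train_cv_test, the seed, and the 50-item toy dataset are assumptions for illustration.

```python
import random

def split_train_cv_test(examples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the examples, then cut them into train / cv / test parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_cv = round(fractions[1] * len(items))
    train = items[:n_train]
    cv = items[n_train:n_train + n_cv]
    # Whatever remains after the first two parts becomes the test set.
    test = items[n_train + n_cv:]
    return train, cv, test

# Toy data (hypothetical): 50 example ids.
data = list(range(50))
train, cv, test = split_train_cv_test(data)
print(len(train), len(cv), len(test))  # 30 10 10
```

The three parts never overlap, so the test set stays untouched while models are trained and compared.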

4.4 The model selection procedure

Step 1: Train the 10 models on the training set.
Step 2: Compute the cross-validation error (the value of the cost function) of each of the 10 models.
- The error on each set uses the same cost function, evaluated on that set's examples: J_train(θ), J_cv(θ), and J_test(θ) respectively.
Step 3: Select the model with the lowest cross-validation error.
Step 4: Compute the generalization error (the value of the cost function) of the model selected in step 3 on the test set.
Because the test set played no part in choosing the model, its error on the test set is a fairer estimate of the generalization error.
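
The procedure above can be sketched with NumPy as follows. The synthetic noisy-cubic data, the choice of degrees 1 through 10, the squared-error cost with a 1/(2m) factor, and the helper name cost are all assumptions made for this illustration; np.polyfit stands in for whatever training method is actually used.

```python
import numpy as np

def cost(coeffs, x, y):
    """Squared-error cost (1 / 2m) * sum((h(x) - y)^2) for a fitted polynomial."""
    residuals = np.polyval(coeffs, x) - y
    return float(np.sum(residuals ** 2) / (2 * len(y)))

# Synthetic data (an assumption for illustration): a noisy cubic.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 - 2.0 * x + 3.0 * x ** 3 + rng.normal(scale=0.1, size=x.size)

# 60% / 20% / 20% split; x was drawn at random, so slicing in order is fine.
x_train, y_train = x[:120], y[:120]
x_cv, y_cv = x[120:160], y[120:160]
x_test, y_test = x[160:], y[160:]

# Step 1: fit one model per polynomial degree d = 1..10 on the training set.
models = {d: np.polyfit(x_train, y_train, deg=d) for d in range(1, 11)}

# Step 2: compute each model's cross-validation error.
cv_errors = {d: cost(c, x_cv, y_cv) for d, c in models.items()}

# Step 3: pick the degree with the lowest cross-validation error.
best_degree = min(cv_errors, key=cv_errors.get)

# Step 4: report the error of that one model on the untouched test set.
test_error = cost(models[best_degree], x_test, y_test)
print(best_degree, cv_errors[best_degree], test_error)
```

The test set is used exactly once, after the degree has been chosen, which is what keeps its error a fair estimate.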

Copyright notice
This article was written by [WuJiaYFN]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207282232308678.html