15. Model evaluation and selection
2022-07-29 00:28:00 【WuJiaYFN】
Main topics
- Methods for debugging a machine learning algorithm
- Machine learning diagnostics
- Steps to evaluate a hypothesis
- Model selection and the cross-validation set
1. Methods for debugging a machine learning algorithm
- Get more training data
- Try a smaller set of features
- Try additional features
- Try adding polynomial features
- Try decreasing the regularization parameter λ
- Try increasing the regularization parameter λ

2. Machine learning diagnostics
2.1 Why diagnostics are needed
- Trying the six methods above one by one can take many rounds of debugging and still not produce good results, which is why machine learning diagnostics are introduced.
- Machine learning diagnostics help us work out which of the six methods are actually useful for our algorithm.
2.2 Definition of machine learning diagnostics
- A diagnostic is a test that you run to gain insight into whether a learning algorithm is working, and to learn which changes are worth trying in order to improve it.
3. Steps to evaluate a hypothesis
Steps to check whether an algorithm is overfitting:
Step 1: Split the data into a training set and a test set
Use 70% of the data as the training set and the remaining 30% as the test set.
If the data is already in random order, you can take the first 70% as the training set; if it is not, select the 70% at random.
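The split in step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `train_test_split_70_30` and the `seed` parameter are hypothetical, and the data is always shuffled so the sketch works whether or not it was already in random order.

```python
import numpy as np

def train_test_split_70_30(X, y, seed=0):
    """Shuffle the examples, then use 70% for training and the rest for testing."""
    X = np.asarray(X)
    y = np.asarray(y)
    m = X.shape[0]                          # total number of examples
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)               # random ordering of the example indices
    n_train = int(round(0.7 * m))           # size of the training set
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy data: 10 examples with 2 features each (made up for illustration)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, y_train, X_test, y_test = train_test_split_70_30(X, y)
print(X_train.shape, X_test.shape)
```

Shuffling with a fixed seed keeps the split reproducible between runs, which makes test errors comparable while debugging.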

Step 2: Compute the test error
After the model has learned its parameters from the training set, run it on the test set. There are two ways to compute the error, depending on the type of problem.
For a linear regression model (a regression problem), compute the cost function on the test set data.
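For reference, the test cost for linear regression can be written as below. This assumes the usual unregularized squared-error cost; the symbol m_test (the number of test examples) and the subscript "test" on x and y are notation introduced here.

```latex
J_{test}(\theta) = \frac{1}{2 m_{test}} \sum_{i=1}^{m_{test}}
    \left( h_\theta\!\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^2
```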

For a logistic regression model (a classification problem), besides computing the cost function on the test set data, there is a more intuitive way to define the test error, called the misclassification rate.
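For the cost-function option, a sketch of the logistic regression test cost, assuming the standard unregularized cross-entropy cost with labels y in {0, 1}. As before, m_test (the number of test examples) is notation introduced here.

```latex
J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}}
    \left[ y_{test}^{(i)} \log h_\theta\!\left(x_{test}^{(i)}\right)
    + \left(1 - y_{test}^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x_{test}^{(i)}\right)\right) \right]
```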

The misclassification rate, also called the 0/1 misclassification rate, is the fraction of test examples that are classified incorrectly. To compute it, evaluate err(hθ(x),y) for every test example and take the average of the results.

- err(hθ(x),y) is 1 if the prediction hθ(x) is wrong, and 0 if the prediction is correct.
- The overall test error is the average of err over all test examples.
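The misclassification rate can be sketched as follows. This is an illustration under one assumption not stated in the text: the model outputs a probability hθ(x), and it is turned into a class prediction with a threshold of 0.5. The function name `misclassification_rate` and the sample values are hypothetical.

```python
import numpy as np

def misclassification_rate(probs, y, threshold=0.5):
    """Average of err(h(x), y) over the test set, where err is 1 for a wrong prediction and 0 otherwise."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    predictions = (probs >= threshold).astype(int)   # predicted class, 0 or 1
    err = (predictions != y).astype(int)             # 1 where the prediction is wrong
    return float(err.mean())

# Made-up model outputs and true labels for four test examples
probs = [0.9, 0.2, 0.6, 0.4]
y_true = [1, 0, 0, 1]
rate = misclassification_rate(probs, y_true)
print(rate)  # 0.5: the last two examples are misclassified
```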
4. Model selection and the cross-validation set
4.1 Introducing the problem
Suppose we need to choose among 10 polynomial models of different degrees.

One consequence of overfitting: the training error obtained by fitting θ on the training set alone is usually not a good estimate of the actual generalization error.
4.2 The drawback of selecting a model with only a training set and a test set
For the 10 polynomial models above, train each one on the training set to get its θ, then compute each model's error on the test set. Suppose J_test(θ^5) turns out to be the smallest, so we choose the fifth model.

The drawback of this choice: the polynomial degree is itself a parameter, and it has now been chosen to fit the test set as well as possible. Evaluating the hypothesis on that same test set is therefore unfair. The model has effectively been overfit to the test set, so the reported error is likely to be an optimistic estimate of the generalization error.
4.3 The correct way to select a model: a cross-validation set
Cross-validation set: split the dataset 6:2:2 into three parts. Use 60% of the data as the training set, 20% as the cross-validation set (cv, or simply validation set), and 20% as the test set.
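With three subsets, the same cost can be evaluated on each of them. A sketch, assuming the unregularized squared-error cost for regression; the index s and the counts m_train, m_cv, m_test (the number of examples in each subset) are notation introduced here.

```latex
J_{s}(\theta) = \frac{1}{2 m_{s}} \sum_{i=1}^{m_{s}}
    \left( h_\theta\!\left(x_{s}^{(i)}\right) - y_{s}^{(i)} \right)^2,
\qquad s \in \{train,\ cv,\ test\}
```

J_train is minimized to fit θ, J_cv is used to compare models, and J_test is only computed at the end to estimate the generalization error.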

4.4 Model selection procedure
Step 1: Train the 10 models on the training set.
Step 2: Compute the cross-validation error (the value of the cost function) of each of the 10 models.
Step 3: Select the model with the lowest cross-validation error.
Step 4: Use the model selected in step 3 to compute the generalization error on the test set (the value of the cost function).
The error on each set is computed with the same cost function, evaluated on that set's examples.
Because the test set played no part in selecting the model, the test error of a model selected this way is a more trustworthy estimate of the generalization error.
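The four steps can be sketched end to end with NumPy. This is a minimal illustration under several assumptions that are not in the original: the 10 models are taken to be polynomials of degree 1 through 10 in a single input x, they are fit with `np.polyfit` (ordinary least squares, no regularization), and the data is synthetic. The names `cost` and `select_degree` are hypothetical.

```python
import numpy as np

def cost(theta, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2) for a polynomial hypothesis."""
    residual = np.polyval(theta, x) - y
    return float(np.sum(residual ** 2) / (2 * len(y)))

def select_degree(x, y, max_degree=10, seed=0):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(x)
    # 60/20/20 split into training, cross-validation and test indices
    perm = np.random.default_rng(seed).permutation(m)
    n_train, n_cv = int(round(0.6 * m)), int(round(0.2 * m))
    tr = perm[:n_train]
    cv = perm[n_train:n_train + n_cv]
    te = perm[n_train + n_cv:]

    # Step 1: train one model per degree on the training set
    thetas = {d: np.polyfit(x[tr], y[tr], d) for d in range(1, max_degree + 1)}
    # Step 2: compute the cross-validation error of every model
    cv_errors = {d: cost(thetas[d], x[cv], y[cv]) for d in thetas}
    # Step 3: select the model with the lowest cross-validation error
    best = min(cv_errors, key=cv_errors.get)
    # Step 4: estimate the generalization error of that model on the test set
    test_error = cost(thetas[best], x[te], y[te])
    return best, cv_errors, test_error

# Synthetic data (made up): a noisy quadratic
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 100)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(scale=0.1, size=x.size)

best, cv_errors, test_error = select_degree(x, y)
print("selected degree:", best)
print("test error of selected model:", test_error)
```

The test indices are touched only in step 4, after the degree has been fixed, which is what keeps the final error estimate fair.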