15. Model evaluation and selection
2022-07-29 00:28:00 【WuJiaYFN】
Main content
- Methods for debugging a machine learning algorithm
- Machine learning diagnostics
- Steps to evaluate a hypothesis function
- Model selection and cross validation set
1. Methods for debugging a machine learning algorithm
- Get more training data
- Try a smaller set of features
- Try getting additional features
- Try adding polynomial features
- Try decreasing the regularization parameter λ
- Try increasing the regularization parameter λ

2. Machine learning diagnostics
2.1 Motivation
- If we try the six methods above one by one, we may spend a lot of time debugging and still not get good results, which is why machine learning diagnostics are introduced
- Machine learning diagnostics help us find out which of the six methods above are likely to be useful for our algorithm
2.2 Definition of machine learning diagnostics
- A diagnostic is a test that you run to gain insight into what is or is not working in an algorithm, and to learn which kinds of changes are worth trying in order to improve it
3. Steps to evaluate a hypothesis function
Steps to check whether an algorithm has overfit:
First step: split the data into a training set and a test set
Use 70% of the data as the training set and the remaining 30% as the test set
If the data is already in random order, the first 70% can be used as the training set; if it is not, select the 70% at random
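The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `train_test_split`, the `seed` parameter, and the toy data are assumptions made for the example, not part of the original notes.

```python
import random


def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples, then split it into (training set, test set)."""
    shuffled = list(examples)
    # Shuffling first makes the split safe even if the data is ordered.
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: ten (x, y) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 7 3
```

Shuffling unconditionally is a simple way to cover both cases in the text: if the data was already in random order, the extra shuffle does no harm.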

Second step: calculate the test error
After the model has learned its parameters from the training set, run it on the test set. Depending on the type of problem, there are two ways to calculate the error
For a linear regression model (a regression problem), use the test set data to compute the cost function
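The formula itself is not reproduced in this text. Assuming the usual squared-error cost for linear regression, and writing m_test for the number of test examples (a notation assumed here), the test error would be:

```latex
J_{test}(\theta) = \frac{1}{2 m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta\!\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^2
```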

For a logistic regression model (a classification problem), besides computing the cost function on the test set, there is a more intuitive way to define the test error, called the misclassification rate

The misclassification rate, also called the 0/1 misclassification rate, is the fraction of test examples that are classified incorrectly. To compute it, evaluate err(hθ(x), y) for each test example and take the average of the results

- err(hθ(x), y) is defined as follows: if the class predicted from hθ(x) is wrong, err is 1; if the prediction is correct, err is 0.
- The overall test error is the average of all err values over the test set.
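As a rough sketch of the 0/1 misclassification rate defined above: the function name, the 0.5 decision threshold, and the sample values below are all assumptions for illustration.

```python
def misclassification_rate(predicted_probs, labels, threshold=0.5):
    """Average 0/1 error: err is 1 for a wrong prediction and 0 for a correct one."""
    errors = [
        # Predict class 1 when h_theta(x) >= threshold (an assumed convention).
        int((p >= threshold) != bool(y))
        for p, y in zip(predicted_probs, labels)
    ]
    return sum(errors) / len(errors)


# Hypothetical outputs h_theta(x) on four test examples, with their true labels.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]
print(misclassification_rate(probs, labels))  # 0.5
```

Here the first two examples are classified correctly and the last two are not, so the average of the err values is 2/4.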
4. Model selection and cross validation set
4.1 Problem introduction
Suppose we need to choose among 10 polynomial models of different degrees.
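The list of candidate models is not shown in this text. Assuming that model d is the polynomial of degree d (which would match the notation θ^(5) used later for the fifth model), the candidates would look like this:

```latex
\begin{aligned}
1.\quad & h_\theta(x) = \theta_0 + \theta_1 x \\
2.\quad & h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 \\
3.\quad & h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_3 x^3 \\
& \;\vdots \\
10.\quad & h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_{10} x^{10}
\end{aligned}
```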

One consequence of overfitting: the training error obtained by fitting θ on the training set alone is usually not a good estimate of the actual generalization error
4.2 The drawback of selecting a model with only a training set and a test set
For the 10 polynomial models above, fit the parameters θ of each polynomial on the training set, then compute each model's error on the test set. Suppose J_test(θ^(5)) turns out to be the smallest, so we choose the fifth model

The drawback of this choice: the selected model is the combination of polynomial degree and parameter values that best fits the test set. Evaluating the hypothesis function on that same test set is therefore unfair, because the degree has in effect been fitted to the test set, and the reported error is likely to be an overly optimistic estimate of the generalization error.
4.3 Correct model selection method: cross validation set
Cross validation set: split the dataset into three parts in a 6:2:2 ratio, using 60% of the data as the training set, 20% as the cross validation set (cv, or simply validation set), and 20% as the test set
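A minimal sketch of the 6:2:2 split, assuming the examples fit in a Python list. The function name `three_way_split` and its parameters are hypothetical.

```python
import random


def three_way_split(examples, train_fraction=0.6, cv_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, cv, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = round(train_fraction * n)
    cv_end = train_end + round(cv_fraction * n)
    # Whatever remains after the training and cv sets becomes the test set.
    return shuffled[:train_end], shuffled[train_end:cv_end], shuffled[cv_end:]


train_set, cv_set, test_set = three_way_split(range(10))
print(len(train_set), len(cv_set), len(test_set))  # 6 2 2
```

Letting the test set take the remainder keeps the three parts disjoint and exhaustive even when the dataset size is not a multiple of 10.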

4.4 Specific model selection process
1. Use the training set to train the 10 models
2. Use each of the 10 models to compute the cross validation error (the value of the cost function on the cross validation set). The error on each set (training, cross validation, test) is the cost function evaluated on that set's examples
3. Select the model with the lowest cross validation error
4. Use the model selected in step 3 to compute the generalization error on the test set (the value of the cost function)
Because the test set played no part in choosing the model, the error the selected model obtains on it is a better estimate of the generalization error
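The process above can be sketched in plain Python. Everything here is an assumption made for illustration: the models are fitted by least squares through the normal equations (the notes do not say how θ is learned), the data is a hand-made, noise-free quadratic, and only degrees 1 to 4 are tried because the normal equations become ill-conditioned at high degrees.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c[0] + c[1]*x + ... + c[d]*x^d."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [[sum(x ** (i + j) for x in xs) for j in range(n)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (rows[i][n] - tail) / rows[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def cost(coeffs, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2) on the given set."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def select_degree(train, cv, test, max_degree):
    # Step 1: train one model per degree on the training set.
    models = {d: fit_polynomial(train[0], train[1], d) for d in range(1, max_degree + 1)}
    # Step 2: compute each model's error on the cross validation set.
    cv_errors = {d: cost(models[d], cv[0], cv[1]) for d in models}
    # Step 3: pick the degree with the lowest cross validation error.
    best = min(cv_errors, key=cv_errors.get)
    # Step 4: estimate the generalization error on the untouched test set.
    return best, cv_errors, cost(models[best], test[0], test[1])


def target(x):
    # Hypothetical ground truth used to generate the toy data.
    return 1 + 2 * x - 3 * x ** 2


train_x = [-1 + 0.1 * i for i in range(21)]
cv_x = [-0.95, -0.45, 0.05, 0.55]
test_x = [-0.75, -0.25, 0.25, 0.75]
train = (train_x, [target(x) for x in train_x])
cv = (cv_x, [target(x) for x in cv_x])
test = (test_x, [target(x) for x in test_x])

best_degree, cv_errors, test_error = select_degree(train, cv, test, max_degree=4)
print(best_degree, test_error)
```

On this toy data the degree 1 model underfits, while every degree from 2 upward fits almost exactly, so the choice among them is decided by rounding noise. With real, noisy data the cross validation error would typically separate the candidates more clearly.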