Li Hongyi Machine Learning (2017 Edition), P5: Error
Related information
- Open-source notes: https://linklearner.com/datawhale-homepage/index.html#/learn/detail/13
- Open-source notes (GitHub): https://github.com/datawhalechina/leeml-notes
- Open-source notes (Gitee): https://gitee.com/datawhalechina/leeml-notes
- Video: https://www.bilibili.com/video/BV1Ht411g7Ef
- Official course page: http://speech.ee.ntu.edu.tw/~tlkagk/courses.html
1、 Source of error
Prediction error comes from two sources: bias and variance.
2、 Error estimation
2.1、 Bias of the estimate
Suppose the mean of $x$ is $\mu$ and its variance is $\sigma^2$.
- First draw $N$ sample points: $(x^1, y^1), (x^2, y^2), \dots, (x^N, y^N)$
- Compute the sample mean $m$: $m = \frac{1}{N}\sum_n x^n \neq \mu$
- Compute $m$ for many groups of samples and take the expectation of $m$ (it is an unbiased estimate):

$$E[m] = E\left[\frac{1}{N}\sum_n x^n\right] = \frac{1}{N}\sum_n E\left[x^n\right] = \mu$$
2.2、 Variance of the estimate
How spread out the distribution of $m$ is around $\mu$ (its variance) depends on $N$: the smaller $N$ is, the more spread out $m$ is.

$$Var[m] = \frac{\sigma^2}{N}$$
Estimating the variance $\sigma^2$ itself, e.g. with $s^2 = \frac{1}{N}\sum_n (x^n - m)^2$, gives only an approximate (biased) estimate, since $E[s^2] = \frac{N-1}{N}\sigma^2 \neq \sigma^2$.
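These facts are easy to check numerically. Below is a minimal sketch (my own, assuming NumPy; not part of the original lecture) that draws many groups of $N$ samples and compares the empirical averages with $\mu$, $\sigma^2/N$, and $\frac{N-1}{N}\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, groups = 5.0, 2.0, 10, 100_000

# Draw `groups` independent groups of N samples each.
x = rng.normal(mu, sigma, size=(groups, N))

m = x.mean(axis=1)                          # sample mean of each group
s2 = ((x - m[:, None]) ** 2).mean(axis=1)   # naive sample variance (divides by N)

print("E[m]   ≈", m.mean(), "   (true mu =", mu, ")")
print("Var[m] ≈", m.var(), "   (sigma^2/N =", sigma**2 / N, ")")
print("E[s^2] ≈", s2.mean(), " ((N-1)/N * sigma^2 =", (N - 1) / N * sigma**2, ")")
```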
3、 Influencing factors
3.1、 Different data sets
With the same model, the $f^*$ found on different training sets is different: the training data has a large effect on the trained model.
3.2、 Different models
3.2.1、 Consider the variance of different models
The variance of the degree-1 (linear) model is relatively small: the functions it finds on different training sets are concentrated together, with little spread. The variance of the degree-5 model is relatively large: its fitted functions are spread out widely.
In other words, a simpler model has a relatively small variance, while a complex model has a large variance and its fits spread out more.
This is because a simple model is less affected by which training set it happens to see.
3.2.2、 Consider the bias of different models
The bias of the degree-1 (linear) model is relatively large, while the bias of the more complex degree-5 model is relatively small.
Intuitive explanation: the function set (the space of candidate functions) of a simple model is relatively small, so it may not contain the target (the bull's-eye) at all; in that case it can never hit it. The function set of a complex model is larger and may contain the bull's-eye; we just cannot pin down exactly where it is from limited data, but given enough data we can get close to the true $\hat{f}$.
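To make the last two subsections concrete, here is a small sketch (my own toy example with synthetic data, assuming NumPy; not the lecture's experiment) that fits a degree-1 and a degree-5 polynomial on many independently sampled training sets, then measures the spread of the fits (variance) and how far their average lies from the true function (bias):

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(x)              # stand-in for the unknown target function
xs = np.linspace(0, 3, 50)                # grid on which we compare the fitted curves

def fit_many(degree, n_sets=200, n_points=15, noise=0.3):
    """Fit `n_sets` polynomials of the given degree, each on a fresh training set."""
    preds = []
    for _ in range(n_sets):
        x = rng.uniform(0, 3, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        coef = np.polyfit(x, y, degree)
        preds.append(np.polyval(coef, xs))
    return np.array(preds)                # shape: (n_sets, len(xs))

for degree in (1, 5):
    preds = fit_many(degree)
    bias2 = ((preds.mean(axis=0) - true_f(xs)) ** 2).mean()   # squared bias
    var = preds.var(axis=0).mean()                            # variance across training sets
    print(f"degree {degree}: bias^2 ≈ {bias2:.3f}, variance ≈ {var:.3f}")
```

With this setup the degree-1 fit typically shows the larger bias term and the degree-5 fit the larger variance term, matching the discussion above.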
4、 Dealing with underfitting and overfitting
For a simple model, the error comes mainly from large bias; this situation is called underfitting. For a complex model, the error comes mainly from large variance; this situation is called overfitting.
If the model cannot even fit the training set well, its bias is too large: this is underfitting. If the model fits the training set well (small error on the training data) but gets a large error on the test set, it probably has large variance: this is overfitting. Underfitting and overfitting are handled in different ways.
4.1、 Underfitting
In this case the model should be redesigned, because the current function set may not contain $f^*$ at all. You can:
- Add more features as input, for example consider both height and weight, or the HP value, and so on.
- Or consider higher powers of the input, i.e. a more complex model (a toy sketch follows this list).
- Forcing yourself to collect more data to train on does not help, because the function set itself is not good enough; finding more training data will not make it better.
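A quick sketch of the "higher powers" idea (my own toy example, assuming NumPy): a degree-1 fit cannot drive the training error down because its function set is too small, while adding higher powers of the input does.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 40)
y = np.sin(x) + rng.normal(0, 0.1, 40)    # the underlying relation is not linear

for degree in (1, 3):                     # degree 1 underfits; degree 3 adds higher powers of x
    coef = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coef, x) - y) ** 2)
    print(f"degree {degree}: training error ≈ {train_err:.4f}")
```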
4.2、 Overfitting
- The simple, blunt method: collect more data.
- If that is not possible, adjust the data set based on your understanding of the problem: data augmentation.
5、 Model selection
5.1、 Trading off bias and variance
There is a trade-off between bias and variance: the model you want to choose is the one that balances the error from bias against the error from variance and minimizes the total error.
Note: you cannot pick the model simply by its error on the test set after training, because the test set itself is only a sample and has its own bias. If you train several models on the training set, compare their errors on your test set, and declare the best one good, that result only holds for the test set in your hands; the truly complete test set is not available. For example, the error on your existing test set may be 0.5, but once more test data is collected the error will usually be larger than 0.5.
5.2、 Cross validation

Split the training set into two parts: one part is used for training, the other is used as a validation set.
Train the candidate models on the training part and compare them on the validation set. Once the best model has been chosen, retrain it on the entire training set, and only then evaluate it on the test set.
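A minimal sketch of this hold-out procedure (assuming scikit-learn; the two candidate models below are hypothetical stand-ins, a degree-1 and a degree-5 polynomial regression):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(0, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 200)

# Split the available training data into a training part and a validation part.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "degree 1": make_pipeline(PolynomialFeatures(1), LinearRegression()),
    "degree 5": make_pipeline(PolynomialFeatures(5), LinearRegression()),
}

# Compare the candidates on the validation set only.
scores = {name: model.fit(X_train, y_train).score(X_val, y_val)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)

# Retrain the chosen model on ALL the training data before touching the test set.
final_model = candidates[best].fit(X, y)
```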
5.3、 N-fold cross-validation

Split the training set into N parts, say 3. Each candidate model is trained and validated 3 times, each time holding out a different part as the validation set, and the average error over the three runs is compared. If, for example, model 1 has the smallest average error, then model 1 is retrained on the entire training set.
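A matching sketch of 3-fold cross-validation (again assuming scikit-learn): `cross_val_score` rotates which fold serves as the validation set and returns one score per fold, and the model with the best average would then be retrained on the whole training set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(0, 3, (150, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 150)

cv = KFold(n_splits=3, shuffle=True, random_state=0)   # N = 3 folds
for degree in (1, 3, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Average validation error (negative MSE) over the 3 folds.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"degree {degree}: mean CV error ≈ {-scores.mean():.4f}")
```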