Li Hongyi Machine Learning (2017 Edition), P5: Error
Related information
- Open-source notes: https://linklearner.com/datawhale-homepage/index.html#/learn/detail/13
- Open-source notes (GitHub): https://github.com/datawhalechina/leeml-notes
- Open-source notes (Gitee): https://gitee.com/datawhalechina/leeml-notes
- Video: https://www.bilibili.com/video/BV1Ht411g7Ef
- Official course page: http://speech.ee.ntu.edu.tw/~tlkagk/courses.html
1、 Source of error
Prediction error comes from two sources: bias and variance.
2、 Error estimation
2.1、 Bias of the estimate
Suppose the mean of $x$ is $\mu$ and its variance is $\sigma^2$.
- First draw $N$ sample points: $(x^1, y^1), (x^2, y^2), \dots, (x^N, y^N)$
- Compute the sample mean $m$: $m = \frac{1}{N}\sum_n x^n \neq \mu$
- Compute $m$ for many groups of samples and take the expectation of $m$ (it is an unbiased estimate):

$$E[m] = E\left[\frac{1}{N}\sum_n x^n\right] = \frac{1}{N}\sum_n E\left[x^n\right] = \mu$$
2.2、 Variance of the estimate
How spread out the distribution of $m$ is around $\mu$ (its variance) depends on $N$: the smaller $N$ is, the more spread out $m$ is.

$$Var[m] = \frac{\sigma^2}{N}$$
Estimating the variance $\sigma^2$ itself, e.g. with $s^2 = \frac{1}{N}\sum_n (x^n - m)^2$, gives only an approximate (biased) estimate, since $E[s^2] = \frac{N-1}{N}\sigma^2 \neq \sigma^2$.
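These facts are easy to check numerically. Below is a minimal sketch (my own, assuming NumPy; not part of the original lecture) that draws many groups of $N$ samples and compares the empirical averages with $\mu$, $\sigma^2/N$, and $\frac{N-1}{N}\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, groups = 5.0, 2.0, 10, 100_000

# Draw `groups` independent groups of N samples each.
x = rng.normal(mu, sigma, size=(groups, N))

m = x.mean(axis=1)                          # sample mean of each group
s2 = ((x - m[:, None]) ** 2).mean(axis=1)   # naive sample variance (divides by N)

print("E[m]   ≈", m.mean(), "   (true mu =", mu, ")")
print("Var[m] ≈", m.var(), "   (sigma^2/N =", sigma**2 / N, ")")
print("E[s^2] ≈", s2.mean(), " ((N-1)/N * sigma^2 =", (N - 1) / N * sigma**2, ")")
```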
3、 Influencing factors
3.1、 Different data sets
With the same model, the $f^*$ found on different training sets is different: the training data has a large effect on the trained model.
3.2、 Different models
3.2.1、 Consider the variance of different models
The variance of the degree-1 (linear) model is relatively small: the functions it finds on different training sets are concentrated together, with little spread. The variance of the degree-5 model is relatively large: its fitted functions are spread out widely.
In other words, a simpler model has a relatively small variance, while a complex model has a large variance and its fits spread out more.
This is because a simple model is less affected by which training set it happens to see.
3.2.2、 Consider the bias of different models
The bias of the degree-1 (linear) model is relatively large, while the bias of the more complex degree-5 model is relatively small.
Intuitive explanation: the function set (the space of candidate functions) of a simple model is relatively small, so it may not contain the target (the bull's-eye) at all; in that case it can never hit it. The function set of a complex model is larger and may contain the bull's-eye; we just cannot pin down exactly where it is from limited data, but given enough data we can get close to the true $\hat{f}$.
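To make the last two subsections concrete, here is a small sketch (my own toy example with synthetic data, assuming NumPy; not the lecture's experiment) that fits a degree-1 and a degree-5 polynomial on many independently sampled training sets, then measures the spread of the fits (variance) and how far their average lies from the true function (bias):

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(x)              # stand-in for the unknown target function
xs = np.linspace(0, 3, 50)                # grid on which we compare the fitted curves

def fit_many(degree, n_sets=200, n_points=15, noise=0.3):
    """Fit `n_sets` polynomials of the given degree, each on a fresh training set."""
    preds = []
    for _ in range(n_sets):
        x = rng.uniform(0, 3, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        coef = np.polyfit(x, y, degree)
        preds.append(np.polyval(coef, xs))
    return np.array(preds)                # shape: (n_sets, len(xs))

for degree in (1, 5):
    preds = fit_many(degree)
    bias2 = ((preds.mean(axis=0) - true_f(xs)) ** 2).mean()   # squared bias
    var = preds.var(axis=0).mean()                            # variance across training sets
    print(f"degree {degree}: bias^2 ≈ {bias2:.3f}, variance ≈ {var:.3f}")
```

With this setup the degree-1 fit typically shows the larger bias term and the degree-5 fit the larger variance term, matching the discussion above.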
4、 Dealing with underfitting and overfitting
For a simple model, the error comes mainly from large bias; this situation is called underfitting. For a complex model, the error comes mainly from large variance; this situation is called overfitting.
If the model cannot even fit the training set well, its bias is too large: this is underfitting. If the model fits the training set well (small error on the training data) but gets a large error on the test set, it probably has large variance: this is overfitting. Underfitting and overfitting are handled in different ways.
4.1、 Underfitting
In this case the model should be redesigned, because the current function set may not contain $f^*$ at all. You can:
- Add more features as input, for example consider both height and weight, or the HP value, and so on.
- Or consider higher powers of the input, i.e. a more complex model (a toy sketch follows this list).
- Forcing yourself to collect more data to train on does not help, because the function set itself is not good enough; finding more training data will not make it better.
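A quick sketch of the "higher powers" idea (my own toy example, assuming NumPy): a degree-1 fit cannot drive the training error down because its function set is too small, while adding higher powers of the input does.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 40)
y = np.sin(x) + rng.normal(0, 0.1, 40)    # the underlying relation is not linear

for degree in (1, 3):                     # degree 1 underfits; degree 3 adds higher powers of x
    coef = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coef, x) - y) ** 2)
    print(f"degree {degree}: training error ≈ {train_err:.4f}")
```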
4.2、 Overfitting
- The simple, blunt method: collect more data.
- If that is not possible, adjust the data set based on your understanding of the problem: data augmentation.
5、 Model selection
5.1、 Trading off bias and variance
There is a trade-off between bias and variance: the model you want to choose is the one that balances the error from bias against the error from variance and minimizes the total error.
Note: you cannot pick the model simply by its error on the test set after training, because the test set itself is only a sample and has its own bias. If you train several models on the training set, compare their errors on your test set, and declare the best one good, that result only holds for the test set in your hands; the truly complete test set is not available. For example, the error on your existing test set may be 0.5, but once more test data is collected the error will usually be larger than 0.5.
5.2、 Cross validation

Split the training set into two parts: one part is used for training, the other is used as a validation set.
Train the candidate models on the training part and compare them on the validation set. Once the best model has been chosen, retrain it on the entire training set, and only then evaluate it on the test set.
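A minimal sketch of this hold-out procedure (assuming scikit-learn; the two candidate models below are hypothetical stand-ins, a degree-1 and a degree-5 polynomial regression):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(0, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 200)

# Split the available training data into a training part and a validation part.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "degree 1": make_pipeline(PolynomialFeatures(1), LinearRegression()),
    "degree 5": make_pipeline(PolynomialFeatures(5), LinearRegression()),
}

# Compare the candidates on the validation set only.
scores = {name: model.fit(X_train, y_train).score(X_val, y_val)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)

# Retrain the chosen model on ALL the training data before touching the test set.
final_model = candidates[best].fit(X, y)
```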
5.3、 N-fold cross-validation

Split the training set into N parts, say 3. Each candidate model is trained and validated 3 times, each time holding out a different part as the validation set, and the average error over the three runs is compared. If, for example, model 1 has the smallest average error, then model 1 is retrained on the entire training set.
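A matching sketch of 3-fold cross-validation (again assuming scikit-learn): `cross_val_score` rotates which fold serves as the validation set and returns one score per fold, and the model with the best average would then be retrained on the whole training set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(0, 3, (150, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 150)

cv = KFold(n_splits=3, shuffle=True, random_state=0)   # N = 3 folds
for degree in (1, 3, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Average validation error (negative MSE) over the 3 folds.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"degree {degree}: mean CV error ≈ {-scores.mean():.4f}")
```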