Li Hongyi machine learning (2017 Edition)_ P5: error
2022-07-27 01:12:00 【Although Beihai is on credit, Fuyao can take it】

Related information
Open-source notes: https://linklearner.com/datawhale-homepage/index.html#/learn/detail/13
Open-source notes: https://github.com/datawhalechina/leeml-notes
Open-source notes: https://gitee.com/datawhalechina/leeml-notes
Video: https://www.bilibili.com/video/BV1Ht411g7Ef
Official course page: http://speech.ee.ntu.edu.tw/~tlkagk/courses.html
1、 Source of error
Prediction error has two sources: bias and variance.
2、 Error estimation
2.1、Bias of the estimate of the mean of x
Suppose $x$ has mean $\mu$ and variance $\sigma^2$.
- First collect $N$ sample points: $(x^1, y^1), (x^2, y^2), \dots, (x^N, y^N)$
- Compute the average $m$ of the $x$ values: $m = \frac{1}{N}\sum_n x^n \neq \mu$
- Compute $m$ for many groups of samples and take its expectation; $m$ is an unbiased estimate of $\mu$:
$$E[m] = E\left[\frac{1}{N}\sum_n x^n\right] = \frac{1}{N}\sum_n E[x^n] = \mu$$
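As a quick numerical check of $E[m]=\mu$, here is a minimal simulation sketch (assuming Python with NumPy; the normal distribution and the values $\mu=5$, $\sigma=2$, $N=10$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 5.0, 2.0, 10          # assumed values, only for illustration
num_groups = 100_000

# draw many groups of N points and compute the sample mean m of each group
samples = rng.normal(mu, sigma, size=(num_groups, N))
m = samples.mean(axis=1)

# each individual m differs from mu, but the average of m matches mu (unbiased)
print("average of m over many groups:", m.mean())   # close to 5.0
print("true mu                      :", mu)
```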
2.2、Variance of the estimate of the mean of x
How scattered $m$ is around $\mu$ (its variance) depends on $N$: the smaller $N$ is, the more scattered $m$ becomes.
$$Var[m] = \frac{\sigma^2}{N}$$
Similarly, estimating the variance of $x$ from the samples only gives an approximation: the naive estimate $s^2 = \frac{1}{N}\sum_n (x^n - m)^2$ is biased, since $E[s^2] = \frac{N-1}{N}\sigma^2 \neq \sigma^2$.
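Continuing the same kind of simulation, both $Var[m]=\sigma^2/N$ and the bias of the naive variance estimate $s^2$ can be checked numerically (again a sketch with assumed values):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N = 5.0, 2.0, 10          # assumed values, only for illustration
num_groups = 100_000

samples = rng.normal(mu, sigma, size=(num_groups, N))
m = samples.mean(axis=1)

# the spread of m shrinks as N grows: Var[m] should be close to sigma^2 / N
print("empirical Var[m]  :", m.var())            # close to 0.4
print("sigma^2 / N       :", sigma**2 / N)       # 0.4

# the naive variance estimate s^2 (dividing by N) is biased:
# E[s^2] = (N - 1) / N * sigma^2, not sigma^2
s2 = samples.var(axis=1)                          # ddof=0, i.e. divide by N
print("average s^2       :", s2.mean())           # close to 3.6
print("(N-1)/N * sigma^2 :", (N - 1) / N * sigma**2)
```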
3、 Influencing factors
3.1、 Different data sets
With the same model, the $f^*$ found on different training sets is different; the choice of training data has a large influence on the trained model.
3.2、 Different models
3.2.1、 Consider the variance of different models
The variance of the degree-1 model is relatively small; in other words, the functions it learns from different training sets are concentrated and not very scattered. The variance of the degree-5 model is relatively large; its learned functions are spread out widely, with a high degree of scatter.
So a simpler model has relatively small variance, while a complex model has large variance and its learned functions spread out more widely.
This is because a simple model is less affected by differences between training sets.
3.2.2、Consider the bias of different models
The bias of the degree-1 model is relatively large, while the bias of the complex degree-5 model is relatively small.
Intuitive explanation: the function set (hypothesis space) of a simple model is relatively small, so it may not contain the bull's-eye at all; in that case the target can never be hit. The function set of a complex model is larger and may well contain the bull's-eye; it just cannot pinpoint exactly where the bull's-eye is, but given enough data it can get close to the true $\hat{f}$.
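The bias/variance picture above can be reproduced with a small sketch (assuming Python with NumPy; the "bull's-eye" function, noise level, and sample sizes are made up for illustration): fit a degree-1 and a degree-5 polynomial on many different training sets, then compare how scattered the fits are and how far their average is from the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # assumed "bull's-eye" function, only for illustration
    return np.sin(2 * x) + 0.5 * x

x_grid = np.linspace(-2, 2, 50)
n_train, n_sets, noise = 15, 200, 0.3

fits = {1: [], 5: []}                        # degree -> list of fitted curves
for _ in range(n_sets):
    # a different training set each time
    x = rng.uniform(-2, 2, n_train)
    y = true_f(x) + rng.normal(0, noise, n_train)
    for deg in fits:
        coeffs = np.polyfit(x, y, deg)
        fits[deg].append(np.polyval(coeffs, x_grid))

for deg, preds in fits.items():
    preds = np.array(preds)                  # shape (n_sets, len(x_grid))
    avg_pred = preds.mean(axis=0)
    variance = preds.var(axis=0).mean()      # how scattered the fits are
    bias2 = ((avg_pred - true_f(x_grid)) ** 2).mean()
    print(f"degree {deg}: variance ~ {variance:.3f}, bias^2 ~ {bias2:.3f}")

# expected pattern: degree 1 -> small variance, large bias^2;
#                   degree 5 -> larger variance, small bias^2
```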
4、How to handle these errors
For a simple model, the error comes mainly from large bias; this is called underfitting. For a complex model, the error comes mainly from large variance; this is called overfitting.
If the model cannot even fit the training set well, its bias is too large: that is underfitting. If the model fits the training set well (small error on the training set) but gets a large error on the test set, its variance is probably large: that is overfitting. Underfitting and overfitting are handled in different ways.
4.1、Underfitting
In this case the model should be redesigned, because the previous function set may simply not contain $f^*$. Possible remedies (a sketch follows this list):
- Add more input features, for example also consider height and weight, or the HP value, and so on.
- Or add higher-order terms, i.e. use a more complex model.
- Forcing yourself to collect more data for training does not help, because the function set itself is badly designed; finding more training data will not make it better.
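A minimal sketch of the first two remedies (assuming Python with scikit-learn; the feature names cp, height, weight, hp and the synthetic data are hypothetical, echoing the lecture's Pokémon CP example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n = 200

# hypothetical features (not the lecture's data): cp plus extra attributes
cp = rng.uniform(0, 100, n)
height = rng.uniform(0, 2, n)
weight = rng.uniform(0, 50, n)
hp = rng.uniform(0, 100, n)
y = 1.5 * cp + 3.0 * height + 0.2 * weight + 0.5 * hp + rng.normal(0, 5, n)

# remedy 1: add more input features (extra columns), not more rows of data
X_small = cp.reshape(-1, 1)
X_more = np.column_stack([cp, height, weight, hp])

# remedy 2: add higher-order terms, i.e. move to a more complex model
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_more)

for name, X in [("cp only", X_small), ("more features", X_more), ("degree 2", X_poly)]:
    model = LinearRegression().fit(X, y)
    print(name, "training R^2:", round(model.score(X, y), 3))
```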
4.2、Overfitting
- Simple and direct method: collect more data.
- Adjust the data set based on your understanding of the problem: data augmentation, for example flipping or shifting images (a sketch follows this list).
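A minimal sketch of data augmentation (assuming image data stored as NumPy arrays; horizontal flipping is just one example transform and is only valid when it leaves the label unchanged):

```python
import numpy as np

def augment(images, labels, seed=0):
    """Return the original data plus horizontally flipped copies.

    images: array of shape (n, height, width); labels: array of shape (n,).
    """
    rng = np.random.default_rng(seed)
    flipped = images[:, :, ::-1]                  # mirror each image left-right
    aug_images = np.concatenate([images, flipped], axis=0)
    aug_labels = np.concatenate([labels, labels], axis=0)
    order = rng.permutation(len(aug_images))      # shuffle originals and copies
    return aug_images[order], aug_labels[order]

# usage with dummy data
imgs = np.zeros((10, 28, 28))
labs = np.arange(10)
big_imgs, big_labs = augment(imgs, labs)
print(big_imgs.shape)   # (20, 28, 28)
```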
5、 Model selection
5.1、 Model difference
There is a trade-off between bias and variance: the model we want is one that balances the errors caused by bias and variance and minimizes the total error.
Note: you cannot select the model directly by its error on the test set you have after training, because that test set has its own bias. If you train different models on the training set, compare their errors on your test set, and declare the best one, that result only reflects the test set in your hands; it is not a truly complete test set. For example, the error on the available test set may be 0.5, but when more test data is collected the error is usually larger than 0.5.
5.2、 Cross validation

Split the training set into two parts: one part is used for training and the other serves as a validation set.
Train the candidate models on the training part, then compare them on the validation set. Once the best model has been selected, retrain it on the whole training set, and only then evaluate it on the test set. A sketch of this procedure follows.
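A sketch of this train/validation procedure (assuming Python with scikit-learn; the candidate models are placeholder polynomial regressions of different degree, and the data is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)

# split the available training set into a training part and a validation part
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# candidate models: polynomial regressions of degree 1, 3 and 5 (placeholders)
val_err = {}
for deg in (1, 3, 5):
    model = make_pipeline(PolynomialFeatures(deg), LinearRegression())
    model.fit(X_tr, y_tr)
    val_err[deg] = mean_squared_error(y_val, model.predict(X_val))

best_deg = min(val_err, key=val_err.get)
print("validation errors:", val_err, "-> chosen degree:", best_deg)

# retrain the chosen model on the full training set before touching the test set
best_model = make_pipeline(PolynomialFeatures(best_deg), LinearRegression()).fit(X, y)
```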
5.3、N-fold cross validation

Split the training set into N folds, for example 3 folds. Train and validate each candidate model across the folds; if, say, model 1 has the smallest average error over the three runs, then retrain model 1 on the full training set. A sketch follows.
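A sketch of N-fold cross validation with N = 3 (again assuming scikit-learn, synthetic data, and placeholder polynomial models as the candidates):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)

kf = KFold(n_splits=3, shuffle=True, random_state=0)

avg_err = {}
for deg in (1, 3, 5):                            # candidate models
    fold_errs = []
    for train_idx, val_idx in kf.split(X):       # each fold serves as validation once
        model = make_pipeline(PolynomialFeatures(deg), LinearRegression())
        model.fit(X[train_idx], y[train_idx])
        fold_errs.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
    avg_err[deg] = float(np.mean(fold_errs))     # average error over the 3 folds

best_deg = min(avg_err, key=avg_err.get)
print("average fold errors:", avg_err, "-> chosen degree:", best_deg)

# finally, retrain the chosen model on the entire training set
final_model = make_pipeline(PolynomialFeatures(best_deg), LinearRegression()).fit(X, y)
```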