
Over fitting and regularization

2022-07-05 05:33:00 Li Junfeng

Over fitting

Overfitting is a problem we often encounter when training neural networks. Simply put, the model's learning capacity is so strong that it memorizes every detail of the training set, noise included. When it then meets the test set, i.e. data it has never seen before, it makes obvious errors.

Causes

The most essential cause is having too many parameters (the model is too complex).
Other causes include:

  1. The training set and test set have different distributions
  2. The training set is too small

Solutions

For the causes above, several countermeasures can be proposed:

  1. Reduce model complexity; regularization is the most common approach.
  2. Enlarge or augment the training set

Norms (Minkowski distance)

Definition

A norm is a function that assigns a length (or size) to every vector in a vector space.

The zero vector has length 0. The general p-norm is defined as
$$\lVert x \rVert_p = \left(\sum_{i=1}^n \lvert x_i \rvert^p\right)^{\frac{1}{p}}$$
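As a quick illustration, the p-norm above can be computed directly (a minimal pure-Python sketch; the function name `p_norm` is just for illustration):

```python
def p_norm(x, p):
    """Minkowski (p-) norm: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

print(p_norm([3, 4], 2))  # Euclidean length of (3, 4) -> 5.0
```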

Properties of norms

  • Non-negativity: $\lVert x \rVert \ge 0$
  • Homogeneity: $\lVert cx \rVert = \lvert c \rvert \, \lVert x \rVert$
  • Triangle inequality: $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$

Common norms

  • $L_0$ norm: the number of non-zero elements (strictly speaking not a true norm, since it violates homogeneity)
  • $L_1$ norm: the sum of absolute values
  • $L_2$ norm: the Euclidean length
  • $L_\infty$ norm: the largest absolute value among the elements
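The special cases above can be sketched in plain Python (the helper names are illustrative; note that the $L_0$ "norm" is computed by counting non-zeros rather than by the p-norm formula):

```python
def l0(x):  # number of non-zero elements (not a true norm)
    return sum(1 for xi in x if xi != 0)

def l1(x):  # sum of absolute values
    return sum(abs(xi) for xi in x)

def l2(x):  # Euclidean length
    return sum(xi * xi for xi in x) ** 0.5

def linf(x):  # largest absolute value among the elements
    return max(abs(xi) for xi in x)

v = [0, -3, 4]
print(l0(v), l1(v), l2(v), linf(v))  # 2 7 5.0 4
```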

Regularization

Regularization adds a norm of the parameters to the objective function as a penalty term. If a parameter grows larger, the norm grows, i.e. the penalty increases; so under the pressure of the norm term, many parameters shrink.
The smaller a parameter is, the smaller its role in the neural network, i.e. the less it affects the final output. This makes the model simpler and gives it better generalization ability.

Regularization also embodies a survival-of-the-fittest idea: although many parameters are somewhat useful to the model, in the end only the important parameters are preserved (large values, large influence on the result), while most parameters are suppressed (small values, little influence on the result).
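To make the shrinking effect concrete, here is a minimal sketch of $L_2$ regularization on a one-parameter linear model fit by gradient descent. All names, the toy data, and the hyperparameters are illustrative, not from the original article:

```python
# Minimal sketch: fit y = w * x with loss = MSE + lam * w^2.
# The lam * w^2 term is the norm penalty; its gradient (2 * lam * w)
# constantly pulls w toward zero, so the regularized fit ends up smaller.
def fit(xs, ys, lam, lr=0.01, steps=2000):
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of the MSE term plus gradient of the L2 penalty
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # true relation: y = 2x
w_plain = fit(xs, ys, lam=0.0)
w_reg = fit(xs, ys, lam=1.0)
print(w_plain, w_reg)  # the regularized weight is pulled below 2
```

The unregularized fit recovers the true slope, while the penalized fit settles at a smaller weight; this trade-off between fitting the data and keeping parameters small is exactly the mechanism described above.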

Original site

Copyright notice

This article was written by [Li Junfeng]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/186/202207050527288321.html