
Over fitting and regularization

2022-07-05 05:33:00 Li Junfeng

Over fitting

Overfitting is a problem we often encounter when training neural networks. Simply put, the model's learning capacity is so strong that it memorizes every detail of the training set, noise included. When it then meets the test set, i.e. data it has never seen before, it makes obvious errors.

Causes

The most essential cause is having too many parameters (the model is too complex).
Other causes include:

  1. The training set and test set have different distributions
  2. The training set is too small

Solutions

For the causes above, several countermeasures can be proposed:

  1. Reduce model complexity; regularization is the most common approach.
  2. Enlarge or augment the training set

Norms (Minkowski distance)

Definition

A norm is a function that assigns a length (or size) to every vector in a vector space.

The zero vector has length 0. The general p-norm is defined as
$$\lVert x \rVert_p = \left(\sum_{i=1}^n \lvert x_i \rvert^p\right)^{\frac{1}{p}}$$
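As a quick illustration, the p-norm above can be computed directly (a minimal pure-Python sketch; the function name `p_norm` is just for illustration):

```python
def p_norm(x, p):
    """Minkowski (p-) norm: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

print(p_norm([3, 4], 2))  # Euclidean length of (3, 4) -> 5.0
```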

Properties of norms

  • Non-negativity: $\lVert x \rVert \ge 0$
  • Homogeneity: $\lVert cx \rVert = \lvert c \rvert \, \lVert x \rVert$
  • Triangle inequality: $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$

Common norms

  • $L_0$ norm: the number of non-zero elements (strictly speaking not a true norm, since it violates homogeneity)
  • $L_1$ norm: the sum of absolute values
  • $L_2$ norm: the Euclidean length
  • $L_\infty$ norm: the largest absolute value among the elements
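The special cases above can be sketched in plain Python (the helper names are illustrative; note that the $L_0$ "norm" is computed by counting non-zeros rather than by the p-norm formula):

```python
def l0(x):  # number of non-zero elements (not a true norm)
    return sum(1 for xi in x if xi != 0)

def l1(x):  # sum of absolute values
    return sum(abs(xi) for xi in x)

def l2(x):  # Euclidean length
    return sum(xi * xi for xi in x) ** 0.5

def linf(x):  # largest absolute value among the elements
    return max(abs(xi) for xi in x)

v = [0, -3, 4]
print(l0(v), l1(v), l2(v), linf(v))  # 2 7 5.0 4
```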

Regularization

Regularization adds a norm of the parameters to the objective function as a penalty term. If a parameter grows larger, the norm grows, i.e. the penalty increases; so under the pressure of the norm term, many parameters shrink.
The smaller a parameter is, the smaller its role in the neural network, i.e. the less it affects the final output. This makes the model simpler and gives it better generalization ability.

Regularization also embodies a survival-of-the-fittest idea: although many parameters are somewhat useful to the model, in the end only the important parameters are preserved (large values, large influence on the result), while most parameters are suppressed (small values, little influence on the result).
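To make the shrinking effect concrete, here is a minimal sketch of $L_2$ regularization on a one-parameter linear model fit by gradient descent. All names, the toy data, and the hyperparameters are illustrative, not from the original article:

```python
# Minimal sketch: fit y = w * x with loss = MSE + lam * w^2.
# The lam * w^2 term is the norm penalty; its gradient (2 * lam * w)
# constantly pulls w toward zero, so the regularized fit ends up smaller.
def fit(xs, ys, lam, lr=0.01, steps=2000):
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of the MSE term plus gradient of the L2 penalty
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # true relation: y = 2x
w_plain = fit(xs, ys, lam=0.0)
w_reg = fit(xs, ys, lam=1.0)
print(w_plain, w_reg)  # the regularized weight is pulled below 2
```

The unregularized fit recovers the true slope, while the penalized fit settles at a smaller weight; this trade-off between fitting the data and keeping parameters small is exactly the mechanism described above.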

Original site

Copyright notice

This article was written by [Li Junfeng]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/186/202207050527288321.html