Common optimization methods
2022-07-05 05:33:00 【Li Junfeng】
Preface
Training a neural network essentially means finding parameters that minimize the loss function. With so many parameters, however, this minimum is very hard to find. Several optimization methods are commonly used to address this problem.
Stochastic gradient descent (SGD)
This is the most classic algorithm and still a commonly used one. It is far better than random search, but it has some shortcomings.
Shortcomings
- The step size has a large influence on the result: if it is too large, training does not converge; if it is too small, training takes too long and may also fail to converge.
- For some functions the gradient does not point toward the minimum. An example is $z=\frac{x^2}{100}+y^2$, whose graph is a symmetric, long and narrow "valley".
- Because part of the gradient does not point toward the minimum, the path to the optimal solution becomes very tortuous, and in relatively "flat" regions the method may fail to find the optimal solution.
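The valley function from the list above makes a convenient test case. The following is a minimal sketch, not a reference implementation: it applies plain gradient descent (the gradient is exact here, so nothing is sampled) to $z=\frac{x^2}{100}+y^2$. The starting point `(-7.0, 2.0)`, the step size `lr=0.95`, and the function names are illustrative assumptions.

```python
def grad(x, y):
    """Gradient of z = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def gradient_descent(x, y, lr, steps):
    """Plain gradient descent; returns every visited point."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Step against the gradient, scaled by the step size.
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


# Assumed start and step size, chosen only to make the zigzag visible.
path = gradient_descent(-7.0, 2.0, lr=0.95, steps=30)

# At the start the gradient is almost entirely along y,
# although the minimum (0, 0) lies mostly along x.
print("gradient at start:", grad(-7.0, 2.0))
for px, py in path[:4]:
    print(f"x = {px:.3f}, y = {py:.3f}")
print("after 30 steps:", path[-1])
```

With this step size the y coordinate overshoots and changes sign on every step, while x moves only slowly along the flat direction of the valley.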
AdaGrad
The direction of the gradient is an inherent property of the function and is hard to improve on, but the step size can still be optimized.
A common approach is step size decay. The initial step size is relatively large, because at that point the parameters are usually far from the optimal solution and larger steps speed up training. As training goes on, the step size decreases, because the parameters are now closer to the optimal solution and a large step may overshoot it or prevent convergence.
The decay is generally tied to the gradients accumulated so far during training:
$$h \leftarrow h + \frac{\partial L}{\partial W}\cdot\frac{\partial L}{\partial W}$$
$$W \leftarrow W - \eta\frac{1}{\sqrt{h}}\cdot\frac{\partial L}{\partial W}$$
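The two update rules above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: it keeps one accumulator `h` per coordinate, reuses the valley function $z=\frac{x^2}{100}+y^2$ as the loss, and adds a small constant `eps` to the denominator to avoid division by zero. The `eps` term, the start point, and `lr=1.5` are assumptions, not part of the formulas above.

```python
import math


def grad(x, y):
    """Gradient of z = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def adagrad(x, y, lr, steps, eps=1e-7):
    """AdaGrad with one accumulator per coordinate.

    eps is an assumed safeguard against division by zero.
    """
    hx, hy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # h <- h + g * g: accumulate squared gradients.
        hx += gx * gx
        hy += gy * gy
        # W <- W - lr * g / sqrt(h): coordinates with large past
        # gradients get a smaller effective step size.
        x -= lr * gx / (math.sqrt(hx) + eps)
        y -= lr * gy / (math.sqrt(hy) + eps)
    return x, y


# Assumed start point and learning rate, for illustration only.
x_final, y_final = adagrad(-7.0, 2.0, lr=1.5, steps=30)
print(x_final, y_final)
```

Because each coordinate is divided by its own accumulated gradient magnitude, the steep y direction is damped while the flat x direction is not starved of progress.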
Momentum
The word refers to momentum in the physical sense. According to the definition in physics, $F\cdot t = m\cdot v$.
To picture this method more vividly, consider a surface in three-dimensional space with a ball on it that needs to roll to the lowest point.
For ease of calculation, treat the mass of the ball as a unit, 1, and then differentiate this formula with respect to time: $\frac{dF}{dt}\cdot dt=m\cdot\frac{dv}{dt} \Rightarrow dF=m\cdot\frac{dv}{dt}$.
Consider the "forces" acting on the ball: the component of gravity caused by the slope at the current position (the gradient), and friction that resists motion (when there is no gradient, the velocity decays).
The velocity and position of the ball at the next moment can then be written as:
$$v \leftarrow \alpha\cdot v - \frac{\partial F}{\partial W}$$
$$W \leftarrow W + v$$
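The velocity and position updates above can be sketched as follows. This is a hedged illustration, not a definitive implementation: it uses the valley function $z=\frac{x^2}{100}+y^2$ as the surface, and it scales the gradient by a step size `lr`, which is an assumption added here (the formula above corresponds to `lr=1`). The values `alpha=0.9`, `lr=0.1`, and the start point are also assumptions.

```python
def grad(x, y):
    """Gradient of z = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def momentum(x, y, lr, alpha, steps):
    """Gradient descent with momentum.

    alpha plays the role of friction: with zero gradient,
    the velocity shrinks by a factor alpha on every step.
    lr is an assumed step size; lr=1 gives the rule as written.
    """
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # v <- alpha * v - lr * gradient
        vx = alpha * vx - lr * gx
        vy = alpha * vy - lr * gy
        # W <- W + v
        x += vx
        y += vy
    return x, y


def plain_descent(x, y, lr, steps):
    """Same step size without momentum, for comparison."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


x_mom, y_mom = momentum(-7.0, 2.0, lr=0.1, alpha=0.9, steps=30)
x_gd, y_gd = plain_descent(-7.0, 2.0, lr=0.1, steps=30)
print("momentum:", x_mom, y_mom)
print("plain:   ", x_gd, y_gd)
```

In the flat x direction the small gradients always push the same way, so the velocity builds up and the ball covers more distance than plain descent does with the same step size.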
Advantages
This method handles well the problem of the gradient not pointing toward the optimal solution. Even if a particular gradient does not point toward the optimal solution, as long as the ball still has velocity toward it, the ball keeps approaching the optimal solution.