Common optimization methods
2022-07-05 05:33:00 【Li Junfeng】
Preface
In training a neural network, the essential task is to find parameters that make the loss function as small as possible. This minimum is hard to locate directly because there are so many parameters, which is why several optimization methods are in common use.
Stochastic gradient descent (SGD)
This is the most classic algorithm and still a common choice. Compared with random search it is already a big improvement, but it still has some deficiencies.
Shortcomings
- The step size (learning rate) strongly affects the result: too large and training does not converge; too small and training takes too long, or may even fail to converge within a reasonable budget.
- The gradient of some functions does not point toward their minimum. For example, the function $z=\frac{x^2}{100}+y^2$ has a graph that is a long, narrow, symmetric "valley".
- Because the gradient often does not point at the minimum, the path toward the optimum becomes very zigzagged, and in relatively "flat" regions the method may not reach the optimum at all; the sketch after this list shows this behaviour on the valley function above.
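A minimal sketch of plain gradient descent on the valley function above; the starting point, learning rate, and step count are illustrative assumptions, not values from the text.

```python
import numpy as np

def grad(params):
    """Analytic gradient of the toy loss z = x^2/100 + y^2."""
    x, y = params
    return np.array([x / 50.0, 2.0 * y])

def sgd(params, lr=0.9, steps=30):
    """Vanilla gradient descent: step opposite the gradient at a fixed rate."""
    path = [params.copy()]
    for _ in range(steps):
        params = params - lr * grad(params)
        path.append(params.copy())
    return np.array(path)

trajectory = sgd(np.array([-7.0, 2.0]))
print(trajectory[-1])  # y oscillates back and forth while x only creeps toward 0: the zigzag path
```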
AdaGrad
The direction given by the gradient is hard to do much about, but the step size can also be optimized.
A common technique is learning-rate decay: start with a relatively large step size, because at that stage the parameters are usually far from the optimum and larger steps speed up training; as training proceeds, shrink the step size, because closer to the optimum a large step may overshoot it or prevent convergence. A small sketch of such a schedule follows.
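A minimal sketch of one possible decay schedule (step decay); the halving interval and decay factor are illustrative assumptions, not something the text prescribes.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

for epoch in (0, 10, 25):
    print(epoch, step_decay(0.1, epoch))  # 0.1, 0.05, 0.025
```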
In AdaGrad, the decay is driven by the gradients accumulated during training so far:
$$h \leftarrow h + \frac{\partial L}{\partial W} \cdot \frac{\partial L}{\partial W}$$
$$W \leftarrow W - \eta \frac{1}{\sqrt{h}} \cdot \frac{\partial L}{\partial W}$$
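A minimal sketch of the AdaGrad update above for a dictionary of parameters; the small constant added inside the square root is my own addition to avoid division by zero and is not part of the formula in the text.

```python
import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients, one entry per parameter

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params:
            self.h[key] += grads[key] * grads[key]  # h <- h + (dL/dW)·(dL/dW), element-wise
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)  # W <- W - η·(1/√h)·dL/dW
```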
Momentum
The name of this method is the physics term momentum, defined by $F \cdot t = m \cdot v$.
To picture the method, imagine a surface in three-dimensional space with a ball on it that has to roll down to the lowest point.
For simplicity, take the mass of the ball to be 1. Differentiating the relation above with respect to time gives $F = m \cdot \frac{dv}{dt}$.
Consider the "forces" acting on the ball: the component of gravity along the slope at its current position (the gradient), and the friction that opposes motion (which makes the velocity decay when there is no gradient).
The velocity and position of the ball at the next instant then follow directly (here $\alpha$ plays the role of friction and $\eta$ is the learning rate):
$$v \leftarrow \alpha \cdot v - \eta \frac{\partial L}{\partial W}$$
$$w \leftarrow w + v$$
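A minimal sketch of the momentum update above, in the same dictionary-of-parameters style as the AdaGrad sketch; the default values of lr and alpha are typical choices, not taken from the text.

```python
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, alpha=0.9):
        self.lr = lr        # learning rate η
        self.alpha = alpha  # friction / momentum coefficient α
        self.v = None       # velocity, one entry per parameter

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params:
            self.v[key] = self.alpha * self.v[key] - self.lr * grads[key]  # v <- αv - η·dL/dW
            params[key] += self.v[key]                                     # w <- w + v
```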
Advantages
This method greatly alleviates the problem of the gradient not pointing at the optimum: even when an individual gradient does not point toward the optimum, as long as the ball still has velocity toward the optimum, it keeps moving closer to it.