Common optimization methods
2022-07-05 05:33:00 【Li Junfeng】
Preface
Training a neural network essentially means finding parameters that minimize the loss function. This minimum is hard to find because there are so many parameters. Several optimization methods are commonly used to address this problem.
Stochastic gradient descent (SGD)
This is the most classic algorithm and still a common choice. It is far better than random search, but it has some shortcomings.
Shortcomings
- The step size (learning rate) strongly affects the result: if it is too large, training does not converge; if it is too small, training takes too long and may not converge within a practical number of iterations.
- The gradient of some functions does not point toward the minimum. One example is $z=\frac{x^2}{100}+y^2$, whose graph is a symmetric, long and narrow "valley".
- Because the gradient does not always point toward the minimum, the path to the optimal solution can zigzag heavily, and in relatively "flat" regions the method may fail to reach the optimal solution.
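The shortcomings above can be seen on the valley function itself. The following is a minimal sketch, not a full training loop: it applies plain gradient descent to $z=\frac{x^2}{100}+y^2$ using the exact gradient (so there is no mini-batch noise). The starting point `(-7.0, 2.0)` and the step sizes `0.95` and `1.05` are assumptions chosen only for illustration.

```python
def grad(x, y):
    """Gradient of f(x, y) = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def gradient_descent(x, y, lr, steps):
    """Plain gradient descent; returns the list of visited points."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Move against the gradient, scaled by the step size.
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


# Hypothetical starting point and step sizes (illustration only).
zigzag = gradient_descent(-7.0, 2.0, lr=0.95, steps=30)
diverge = gradient_descent(-7.0, 2.0, lr=1.05, steps=30)

print("lr=0.95, final point:", zigzag[-1])
print("lr=1.05, final point:", diverge[-1])
```

With the step size `0.95`, the y coordinate flips sign on every step while x shrinks by less than 2% per step, which is the zigzag path described above. With `1.05`, the magnitude of y grows on every step and the run diverges.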
AdaGrad
The direction of the gradient is hard to improve in such cases, but the step size can still be optimized.
A common technique is step-size decay (learning rate decay). The initial step size is relatively large because the parameters usually start far from the optimal solution, so larger steps speed up training. As training proceeds, the step size is reduced: the parameters are now closer to the optimal solution, and a step that is too large may overshoot it or prevent convergence.
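As a rough sketch of step-size decay, assume an exponential schedule in which the step size is multiplied by a fixed factor after every epoch. The text does not name a specific schedule, so the function name `decayed_step_size` and the parameters `initial_lr` and `decay_rate` are hypothetical.

```python
def decayed_step_size(initial_lr, decay_rate, epoch):
    """Exponential decay: multiply the step size by decay_rate once per epoch."""
    return initial_lr * (decay_rate ** epoch)


# Hypothetical values: start at 0.1 and halve the step size every epoch.
schedule = [decayed_step_size(0.1, 0.5, epoch) for epoch in range(4)]
print(schedule)
```

The schedule starts large and shrinks monotonically, matching the idea of taking big steps early and small steps near the optimal solution.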
In AdaGrad, the decay is determined by the gradients accumulated so far during training:
$$h \leftarrow h + \frac{\partial L}{\partial W}\cdot\frac{\partial L}{\partial W}$$
$$W \leftarrow W - \eta\frac{1}{\sqrt{h}}\cdot\frac{\partial L}{\partial W}$$
Here $L$ is the loss, $W$ the parameters, $\eta$ the base step size, and $h$ the element-wise running sum of squared gradients.
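The AdaGrad update can be sketched on the same valley function $z=\frac{x^2}{100}+y^2$. This is an illustrative sketch under stated assumptions: the small constant `eps` is added to avoid division by zero and is not part of the formula above, and the starting point `(-7.0, 2.0)` and base step size `1.5` are hypothetical values.

```python
import math


def grad(x, y):
    """Gradient of f(x, y) = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def adagrad(x, y, lr, steps, eps=1e-7):
    """AdaGrad with one accumulator per coordinate; returns visited points."""
    hx, hy = 0.0, 0.0  # running sums of squared gradients (h in the formula)
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        hx += gx * gx
        hy += gy * gy
        # eps is an assumption added to guard against division by zero.
        x -= lr * gx / (math.sqrt(hx) + eps)
        y -= lr * gy / (math.sqrt(hy) + eps)
        path.append((x, y))
    return path


# Hypothetical starting point and base step size (illustration only).
adagrad_path = adagrad(-7.0, 2.0, lr=1.5, steps=30)
print("final point:", adagrad_path[-1])
```

In this run each coordinate gets its own effective step size: the steep y direction is damped quickly, so y approaches 0 without changing sign, while the shallow x direction keeps a comparatively large step.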
Momentum
The word momentum comes from physics, where $F\cdot t = m\cdot v$.
To picture this method, consider a surface in three-dimensional space with a ball on it that needs to roll to the lowest point.
To simplify the calculation, take the mass of the ball to be the unit $1$, then differentiate this formula with respect to time, treating $F$ as constant over a short interval: $F = m\cdot\frac{dv}{dt}$. In other words, the force determines how the velocity changes.
Consider the "forces" acting on the ball: the component of gravity caused by the slope at the current position (the gradient), and friction that resists motion (so the velocity decays when there is no gradient).
The velocity and position of the ball at the next moment can then be written as:
$$v \leftarrow \alpha\cdot v - \frac{\partial L}{\partial W}$$
$$W \leftarrow W + v$$
Here $L$ is the loss and $W$ the parameters, as in the AdaGrad update, and $\alpha$ (between 0 and 1) plays the role of friction. In practice the gradient term is usually also multiplied by a step size $\eta$.
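The momentum update can be sketched on the valley function $z=\frac{x^2}{100}+y^2$ and compared with plain gradient descent at the same step size. This is a sketch under stated assumptions: the gradient term is additionally scaled by a step size `lr`, which the update above leaves implicit, and the values `lr=0.1`, `alpha=0.9`, and the starting point `(-7.0, 2.0)` are hypothetical.

```python
def grad(x, y):
    """Gradient of f(x, y) = x**2 / 100 + y**2."""
    return x / 50.0, 2.0 * y


def momentum(x, y, lr, alpha, steps):
    """Momentum update: v <- alpha * v - lr * grad, then W <- W + v."""
    vx, vy = 0.0, 0.0  # the ball starts at rest
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx = alpha * vx - lr * gx  # friction (alpha) plus push from the slope
        vy = alpha * vy - lr * gy
        x, y = x + vx, y + vy
        path.append((x, y))
    return path


def plain_descent(x, y, lr, steps):
    """Plain gradient descent, used only as a baseline for comparison."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


# Hypothetical hyperparameters and starting point (illustration only).
momentum_path = momentum(-7.0, 2.0, lr=0.1, alpha=0.9, steps=30)
plain_path = plain_descent(-7.0, 2.0, lr=0.1, steps=30)

print("momentum final point:", momentum_path[-1])
print("plain final point:   ", plain_path[-1])
```

With these values, the accumulated velocity carries the ball further along the shallow x direction than plain gradient descent manages in the same number of steps. The trade-off is that the ball overshoots and oscillates in the steep y direction before settling.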
Advantage
This method largely alleviates the problem of the gradient not pointing toward the optimal solution. Even if an individual gradient does not point toward the optimal solution, as long as the ball retains a velocity component toward it, the ball keeps approaching the optimal solution.