Common optimization methods
2022-07-05 05:33:00 【Li Junfeng】
Preface
Training a neural network is, in essence, a search for parameters that minimize the loss function. Because the number of parameters is enormous, this minimum is very hard to find directly. Several optimization methods are commonly used to attack this problem.
Stochastic gradient descent (SGD)
This is the most classic algorithm and still a common choice. Compared with random search it is already a big improvement, but it has some shortcomings.
Shortcomings
- The step size (learning rate) strongly affects the result: too large and training diverges; too small and training takes far too long, and may even fail to converge.
- For some functions the gradient does not point toward the minimum. For example, $z=\frac{x^2}{100}+y^2$: its graph is a symmetric, long and narrow "valley".
- Because the gradient often fails to point at the minimum, the search path becomes very zigzag, and in relatively "flat" regions the method may never reach the optimum.
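A minimal sketch of plain gradient descent on the valley-shaped function above (the starting point and the learning rate of 0.9 are illustrative choices, not from the original post):

```python
def sgd_step(params, grads, lr):
    """Plain gradient descent: move each parameter against its gradient."""
    for key in params:
        params[key] -= lr * grads[key]

# z = x^2/100 + y^2: steep along y, almost flat along x,
# so the gradient rarely points at the minimum (0, 0).
params = {"x": 7.0, "y": 2.0}
for _ in range(100):
    grads = {"x": 2 * params["x"] / 100, "y": 2 * params["y"]}
    sgd_step(params, grads, lr=0.9)

# After 100 steps y has essentially converged,
# while x is still creeping along the flat valley floor.
```

The same learning rate that works for y is far too large to shrink and far too small to move x quickly, which is exactly the step-size dilemma listed above.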
AdaGrad
The direction of the gradient is a property of the function and hard to change, but the step size can still be improved.
A common technique is step-size decay: start with a relatively large step, since the parameters are usually far from the optimum and larger moves speed up training; as training proceeds, shrink the step, because near the optimum a large step may overshoot it or fail to converge.
In AdaGrad, the decay is driven by the gradients accumulated so far:
$$h \leftarrow h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}$$
$$W \leftarrow W - \eta \frac{1}{\sqrt{h}} \cdot \frac{\partial L}{\partial W}$$
(⊙ denotes the element-wise product.)
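The update above can be sketched as follows (the test function, lr=1.0, and the small eps term added to avoid division by zero are illustrative assumptions):

```python
import math

def adagrad_step(params, grads, h, lr, eps=1e-7):
    """AdaGrad: h accumulates squared gradients, so the effective step
    lr / sqrt(h) shrinks independently for each parameter."""
    for key in params:
        h[key] += grads[key] * grads[key]
        params[key] -= lr * grads[key] / (math.sqrt(h[key]) + eps)

params = {"x": 7.0, "y": 2.0}
h = {"x": 0.0, "y": 0.0}
for _ in range(500):
    grads = {"x": 2 * params["x"] / 100, "y": 2 * params["y"]}
    adagrad_step(params, grads, h, lr=1.0)

# y sees large gradients early, so its accumulated h grows fast
# and its step decays quickly; x keeps a comparatively large step
# along the flat direction.
```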
Momentum
The word means momentum; in physics, the impulse–momentum relation is $F \cdot t = m \cdot v$.
To picture the method, imagine a surface in three-dimensional space with a small ball on it that has to roll down to the lowest point.
For simplicity, take the ball's mass as 1 and differentiate the relation with respect to time: $F = m \cdot \frac{dv}{dt}$, i.e. the force determines how the velocity changes.
Two "forces" act on the ball: the component of gravity along the slope at the current position (the gradient), and friction that opposes the motion (so the velocity decays even where the gradient is zero).
The ball's velocity and position at the next moment then follow directly:
$$v \leftarrow \alpha \cdot v - \eta \frac{\partial L}{\partial W}$$
$$W \leftarrow W + v$$
Advantages
This method largely fixes the problem of the gradient not pointing at the optimum: even when an individual gradient points elsewhere, as long as the accumulated velocity points toward the optimum, the ball keeps approaching it.
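A sketch of this update on the same valley function (the learning rate of 0.1 and friction coefficient α = 0.9 are illustrative assumptions):

```python
def momentum_step(params, velocity, grads, lr, alpha=0.9):
    """Momentum: velocity accumulates gradients (alpha acts as friction),
    so consistent small gradients along the valley add up over time."""
    for key in params:
        velocity[key] = alpha * velocity[key] - lr * grads[key]
        params[key] += velocity[key]

params = {"x": 7.0, "y": 2.0}
velocity = {"x": 0.0, "y": 0.0}
for _ in range(200):
    grads = {"x": 2 * params["x"] / 100, "y": 2 * params["y"]}
    momentum_step(params, velocity, grads, lr=0.1)
```

Even though each individual gradient along x is tiny, the accumulated velocity carries the ball down the flat direction much faster than plain SGD.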