
04 Automatically Adjusting the Learning Rate - Learning Notes - Li Hongyi's Deep Learning 2021

2022-06-11 23:16:00 iioSnail

Previous: 03 What If the Gradient Is Small (Local Minima and Saddle Points) - Learning Notes - Li Hongyi's Deep Learning 2021

Next: 05 Classification - Learning Notes - Li Hongyi's Deep Learning 2021

Contents of this section and related links

Common strategies for automatically adjusting the Learning Rate

Class notes

When training gets stuck at a bottleneck, it is not necessarily because the gradient is too small. It may be that the learning rate is too high, causing the parameters to oscillate back and forth between the walls of a valley so that the minimum can never be reached.

The corresponding plot of the gradient over training is shown in the lecture (figure omitted): the $x$-axis is the number of updates, and the $y$-axis is the gradient magnitude.


Based on the number of iterations, the current gradient, and other factors, the Learning Rate is adjusted automatically. The update formula for $\theta$ becomes: $\theta_i^{t+1} \leftarrow \theta_i^t - \frac{\eta}{\sigma_i^t} g_i^t$
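As a minimal sketch of this update rule (not code from the lecture), the step below applies $\theta_i^{t+1} \leftarrow \theta_i^t - \frac{\eta}{\sigma_i^t} g_i^t$ elementwise to NumPy arrays; the numbers and the fixed `sigma` values are placeholders, since how $\sigma$ is computed is exactly what the strategies below differ in.

```python
import numpy as np

def adaptive_step(theta, grad, sigma, eta=0.01):
    # theta_i^{t+1} <- theta_i^t - (eta / sigma_i^t) * g_i^t, applied elementwise
    return theta - eta / sigma * grad

# example: three parameters, each with its own sigma (placeholder values)
theta = np.array([0.5, -1.2, 3.0])
grad = np.array([0.1, -0.4, 0.02])
sigma = np.array([0.2, 0.8, 0.05])
theta = adaptive_step(theta, grad, sigma)
```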

All of the Learning Rate adjustments below are achieved by adjusting $\sigma$.

Common adjustment strategies are:

  • Root Mean Square: consider the current gradient and all past gradients equally
  • RMSProp: weight the current gradient more heavily and past gradients less
  • Adam: combines RMSProp and Momentum
  • Learning Rate Decay: as the number of updates increases, we get closer to the target, so the Learning Rate is made smaller and smaller
  • Warm Up: the Learning Rate starts small, increases as the number of iterations grows, and then, after a certain point, decreases again as iterations continue (figure omitted; see the schedule sketch after this list)
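To make Learning Rate Decay and Warm Up concrete, here is a small sketch of two schedules as plain functions of the update step. The specific shapes (exponential decay; linear ramp-up followed by cosine decay) and all constants are illustrative assumptions, not the exact schedules from the lecture.

```python
import math

def decayed_lr(step, base_lr=1e-3, decay_rate=0.96, decay_every=1000):
    # Learning Rate Decay: shrink the learning rate as the number of updates grows
    return base_lr * decay_rate ** (step / decay_every)

def warmup_lr(step, base_lr=1e-3, warmup_steps=1000, total_steps=10000):
    # Warm Up: small at first, ramp up, then decay again later
    if step < warmup_steps:
        return base_lr * step / warmup_steps                      # linear ramp-up
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay
```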

The Root Mean Square formula is: $\sigma_i^t = \sqrt{\frac{1}{t+1} \sum_{k=0}^{t} \left(g_i^k\right)^2}$, i.e. the average is taken over all gradients of parameter $i$ from step $0$ up to the current step $t$.
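A minimal NumPy sketch of this accumulation, assuming the gradient of each step is kept in a list; the small `eps` term is an added assumption to avoid division by zero.

```python
import numpy as np

def rms_sigma(grad_history, eps=1e-8):
    # sigma_i^t = sqrt( (1/(t+1)) * sum_{k=0}^{t} (g_i^k)^2 ), per parameter
    g = np.stack(grad_history)               # shape: (t+1, num_params)
    return np.sqrt(np.mean(g ** 2, axis=0)) + eps

# sketch of use inside a training loop:
#   grad_history.append(grad)
#   sigma = rms_sigma(grad_history)
#   theta = theta - eta / sigma * grad
```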


The RMSProp formula is: $\sigma_i^t = \sqrt{\alpha \left(\sigma_i^{t-1}\right)^2 + (1-\alpha)\left(g_i^t\right)^2}$, where $\alpha$ is a hyperparameter to be tuned, with $0 < \alpha < 1$.
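The same idea written as a running update, again as a NumPy sketch; `alpha = 0.9` is only an illustrative default for the hyperparameter above.

```python
import numpy as np

def rmsprop_sigma(prev_sigma, grad, alpha=0.9):
    # sigma_i^t = sqrt( alpha * (sigma_i^{t-1})^2 + (1 - alpha) * (g_i^t)^2 )
    return np.sqrt(alpha * prev_sigma ** 2 + (1 - alpha) * grad ** 2)

# sketch of use: keep a running sigma and refresh it each step
#   sigma = rmsprop_sigma(sigma, grad)
#   theta = theta - eta / (sigma + 1e-8) * grad
```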


For Adam, it is suggested to simply use PyTorch's default parameters.
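Following that suggestion, using Adam in PyTorch is just constructing the optimizer with its defaults (lr=1e-3, betas=(0.9, 0.999), eps=1e-8); the tiny linear model and random data below are placeholders for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 1)                           # hypothetical placeholder model
optimizer = torch.optim.Adam(model.parameters())   # PyTorch defaults

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```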

Adam's adjustment strategy is shown in the lecture slides (figure omitted).



Copyright notice
This article was written by [iioSnail]. Please include a link to the original article when reposting. Thanks.
https://yzsam.com/2022/03/202203011620128411.html