Warmup: Warming Up the Learning Rate
2022-06-30 23:10:00 【Full stack programmer webmaster】
Hello everyone, nice to meet you again. I'm your friend, Quan Jun.
The learning rate is one of the most important hyperparameters in neural network training, and there are many strategies for scheduling it; warmup is one of them.

(1) What is Warmup?

Warmup is a learning-rate warm-up method mentioned in the ResNet paper. It trains with a smaller learning rate at the beginning of training, for some number of epochs or steps (for example 4 epochs or 10,000 steps), and then switches to the preset learning rate for the rest of training.
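As a minimal sketch (the function name and the concrete values are illustrative, not taken from the paper), this constant form of warmup is just a step function of the training step:

def constant_warmup_lr(step, warmup_steps=10000, warmup_lr=0.01, preset_lr=0.1):
    # Constant warmup: a fixed small learning rate for the first
    # warmup_steps, then a hard switch to the preset learning rate.
    return warmup_lr if step < warmup_steps else preset_lr

print(constant_warmup_lr(500))    # 0.01, still warming up
print(constant_warmup_lr(12000))  # 0.1, preset learning rate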
(2) Why use Warmup?

At the beginning of training, the model weights are randomly initialized, so choosing a large learning rate right away may make training unstable (the loss may oscillate). Warming up the learning rate keeps it small for the first few epochs or steps; under this small warm-up learning rate the model can gradually stabilize, and once it is relatively stable we switch to the preset learning rate. This makes the model converge faster and reach a better final result.
Example: the ResNet paper trains a 110-layer ResNet on CIFAR-10. It first trains with a learning rate of 0.01 until the training error drops below 80% (roughly 400 steps), and then continues training with a learning rate of 0.1.
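A hedged sketch of that schedule (the function and the error-based trigger are paraphrased from the description above; train_error stands in for whatever training-error metric the loop computes):

def resnet_warmup_lr(train_error, warmed_up):
    # Train at 0.01 until the training error drops below 80%; the switch
    # to 0.1 is one-way, so a warmed_up flag is threaded through.
    warmed_up = warmed_up or train_error < 0.80
    return (0.1 if warmed_up else 0.01), warmed_up

# Usage inside a training loop (train_error is a hypothetical placeholder):
# lr, warmed_up = resnet_warmup_lr(train_error, warmed_up)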
(3) Improving Warmup

The warmup in (2) is constant warmup; its drawback is that jumping from a small learning rate to a large one may cause a sudden spike in the training error. To solve this, Facebook proposed gradual warmup in 2018: start from a small initial learning rate and increase it a little at every step until the preset (relatively large) learning rate is reached, then continue training with that preset learning rate.
1. The following code simulates gradual warmup:
"""
Implements gradual warmup, if train_steps < warmup_steps, the
learning rate will be `train_steps/warmup_steps * init_lr`.
Args:
warmup_steps:warmup Step threshold , namely train_steps<warmup_steps, Use warm-up learning rate , Otherwise, use the preset learning rate
train_steps: Number of steps trained
init_lr: Preset learning rate
"""
import numpy as np  # only needed for the optional sin-decay variant below

warmup_steps = 2500
init_lr = 0.1
# Simulate 15000 training steps
max_steps = 15000
for train_steps in range(max_steps):
    if warmup_steps and train_steps < warmup_steps:
        # Linearly ramp the learning rate up during warmup
        warmup_percent_done = train_steps / warmup_steps
        warmup_learning_rate = init_lr * warmup_percent_done  # gradual warmup lr
        learning_rate = warmup_learning_rate
    else:
        # learning_rate = np.sin(learning_rate)  # sin decay after warmup
        learning_rate = learning_rate ** 1.0001  # approximate exponential decay after warmup
    if (train_steps + 1) % 100 == 0:
        print("train_steps:%.3f--warmup_steps:%.3f--learning_rate:%.3f" % (
            train_steps + 1, warmup_steps, learning_rate))

2. The learning-rate curve produced by the code above, with warmup followed by decay (sin or exponential), is shown below:
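A hedged way to reproduce that plot, assuming matplotlib is available (it is not used in the original snippet), is to collect the per-step learning rate from the same simulation and draw it:

import matplotlib.pyplot as plt

warmup_steps, init_lr, max_steps = 2500, 0.1, 15000
lrs = []
learning_rate = init_lr
for train_steps in range(max_steps):
    if train_steps < warmup_steps:
        learning_rate = init_lr * train_steps / warmup_steps  # warmup ramp
    else:
        learning_rate = learning_rate ** 1.0001  # approximate exponential decay
    lrs.append(learning_rate)

plt.plot(range(max_steps), lrs)
plt.xlabel("train_steps")
plt.ylabel("learning_rate")
plt.title("gradual warmup followed by decay")
plt.show()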
(4) Summary

Warming up the learning rate means training first with a small initial learning rate, then increasing it a little at every step until the preset (relatively large) learning rate is reached (at this point the warmup is complete), and then training with the preset learning rate (during the post-warmup phase the learning rate decays). This helps the model converge faster and reach a better final result.
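In practice, gradual warmup is usually attached to the optimizer through a scheduler. A minimal sketch using PyTorch's torch.optim.lr_scheduler.LambdaLR (PyTorch is an assumption here, not part of the original post; the tiny model, step count, and warmup_steps value are illustrative):

import torch

model = torch.nn.Linear(10, 1)  # hypothetical tiny model, just to have parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # preset learning rate

warmup_steps = 2500

def lr_lambda(step):
    # Linear ramp from 0 to 1 over warmup_steps, then hold the preset lr;
    # a real schedule would usually decay after warmup instead of holding.
    return min(1.0, step / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for step in range(5000):
    optimizer.step()     # in real training this follows loss.backward()
    scheduler.step()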
Publisher: Full Stack Programmer. Please credit the source when reprinting: https://javaforall.cn/132203.html