Warmup: warming up the learning rate
2022-06-30 23:10:00 【Full stack programmer webmaster】
The learning rate is one of the most important hyperparameters in neural-network training, and there are many ways to tune it; Warmup is one of them.
(1) What is Warmup?
Warmup is a learning-rate warm-up method mentioned in the ResNet paper. Training starts with a smaller learning rate for a number of epochs or steps (for example, 4 epochs or 10,000 steps), after which the learning rate is switched to the preset value for the rest of training.
(2) Why use Warmup?
At the start of training the model weights are randomly initialized, so a large learning rate right away can make the model unstable (the loss oscillates). With Warmup, the learning rate stays small for the first few epochs or steps. Under this small warm-up rate the model gradually stabilizes, and once it is relatively stable, training continues with the preset learning rate. This helps the model converge faster and reach better results.
Example: the ResNet paper trains a 110-layer ResNet on CIFAR-10. It first uses a learning rate of 0.01 until the training error drops below 80% (roughly 400 steps), and then trains with a learning rate of 0.1.
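The constant warmup described above can be sketched in a few lines of Python. This is a minimal illustration, not the ResNet authors' code: both function names are hypothetical. `constant_warmup_lr` replaces the paper's "training error below 80%" condition with a fixed budget of about 400 steps, while `error_triggered_warmup_lr` follows the error-based switch and assumes the caller tracks the training error.

```python
from __future__ import annotations


def constant_warmup_lr(step: int, warmup_steps: int = 400,
                       warmup_lr: float = 0.01, base_lr: float = 0.1) -> float:
    """Constant warmup with a fixed step budget (hypothetical helper).

    Approximates the ResNet CIFAR-10 setup: 0.01 for roughly the first
    400 steps, then the preset learning rate 0.1.
    """
    if step < warmup_steps:
        return warmup_lr  # small constant rate while weights are still random
    return base_lr  # preset learning rate once warmup is over


def error_triggered_warmup_lr(train_error: float, warmed_up: bool,
                              warmup_lr: float = 0.01, base_lr: float = 0.1,
                              threshold: float = 0.80) -> tuple[float, bool]:
    """Constant warmup that ends once the training error falls below
    `threshold` (hypothetical helper).

    Returns (learning_rate, warmed_up). Once warmup has ended it never
    restarts, even if the training error rises again later.
    """
    if warmed_up or train_error < threshold:
        return base_lr, True
    return warmup_lr, False


if __name__ == "__main__":
    for step in (0, 399, 400, 1000):
        print(step, constant_warmup_lr(step))
```

The abrupt jump from 0.01 to 0.1 at the switch point is exactly the weakness that gradual warmup addresses.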
(3) Improving Warmup
The Warmup described in (2) is constant warmup. Its drawback is that jumping abruptly from a small learning rate to a large one can make the training error spike. To address this, Facebook proposed gradual warmup in 2017 ("Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"): starting from a small initial learning rate, the rate is increased slightly at every step until it reaches the relatively large preset learning rate, which is then used for the rest of training.
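Written as a formula, gradual (linear) warmup over $T_w$ steps can be sketched as follows, where $\eta_{\text{target}}$ is the preset learning rate and $\eta_{\text{start}}$ is an assumed starting rate. The simulation code in this post uses $\eta_{\text{start}} = 0$, which gives $\eta_t = \frac{t}{T_w}\,\eta_{\text{target}}$ during warmup:

$$
\eta_t =
\begin{cases}
\eta_{\text{start}} + \left(\eta_{\text{target}} - \eta_{\text{start}}\right)\dfrac{t}{T_w}, & t < T_w \\[4pt]
\eta_{\text{target}}, & t \ge T_w
\end{cases}
$$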
1. Simulation code for gradual warmup:
"""
Simulates gradual warmup: while train_steps < warmup_steps, the
learning rate is `train_steps / warmup_steps * init_lr`; afterwards
the preset learning rate is used and then decayed.
Args:
    warmup_steps: warmup threshold; while train_steps < warmup_steps the
        warm-up learning rate is used, otherwise the preset learning rate
    train_steps: number of steps trained so far
    init_lr: preset learning rate
"""
import math  # only needed for the optional sin decay below

warmup_steps = 2500
init_lr = 0.1
max_steps = 15000  # simulate 15000 training steps

learning_rate = 0.0
for train_steps in range(max_steps):
    if warmup_steps and train_steps < warmup_steps:
        # gradual warmup: grow the learning rate linearly toward init_lr
        warmup_percent_done = train_steps / warmup_steps
        learning_rate = init_lr * warmup_percent_done
    elif train_steps == warmup_steps:
        # warmup finished: switch to the preset learning rate
        learning_rate = init_lr
    else:
        # After warmup, the learning rate decays.
        # learning_rate = math.sin(learning_rate)  # "sin" decay: sin(x) < x for x > 0
        learning_rate = learning_rate ** 1.0001  # rough stand-in for exponential decay
    if (train_steps + 1) % 100 == 0:
        print("train_steps:%d--warmup_steps:%d--learning_rate:%.6f" % (
            train_steps + 1, warmup_steps, learning_rate))
2. The code above ramps the learning rate up during warmup and then decays it (sin or exp decay); the original post plotted the resulting curve:
[Figure: learning rate vs. training step for the simulation above]
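The decay step in the simulation (`learning_rate ** 1.0001`) only roughly imitates exponential decay and depends on the previous step's value. A closed-form alternative computes the rate directly from the step index. The sketch below is an illustration under stated assumptions, not the post's method: `warmup_exp_decay_lr`, `decay_rate` and `decay_steps` are hypothetical names, and the warmup ramp uses `(step + 1)` so that step 0 does not get a learning rate of zero.

```python
def warmup_exp_decay_lr(step: int, init_lr: float = 0.1, warmup_steps: int = 2500,
                        decay_rate: float = 0.5, decay_steps: int = 2500) -> float:
    """Gradual (linear) warmup followed by exponential decay.

    decay_rate and decay_steps are illustrative assumptions: after warmup,
    the rate is smoothly multiplied by decay_rate every decay_steps steps.
    """
    if step < warmup_steps:
        # Linear ramp; (step + 1) avoids a zero learning rate at step 0.
        return init_lr * (step + 1) / warmup_steps
    # Exponential decay measured from the end of warmup, starting at init_lr.
    return init_lr * decay_rate ** ((step - warmup_steps) / decay_steps)


if __name__ == "__main__":
    # Same 15000-step horizon as the simulation above.
    for step in range(0, 15000, 2500):
        print(step, round(warmup_exp_decay_lr(step), 6))
```

Because the function depends only on `step`, it returns the same value wherever it is called (for example after resuming from a checkpoint), and the exponential term can be swapped for another decay curve without touching the warmup part.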
(4) Summary
Warmup first trains with a small initial learning rate and increases it slightly at every step until it reaches the relatively large preset learning rate (at this point warmup is complete). Training then continues from the preset learning rate, which usually decays over the rest of training. This helps the model converge faster and perform better.
Publisher: Full stack programmer stack length. Please credit the source when reprinting: https://javaforall.cn/132203.html (original site: https://javaforall.cn)