Warmup: Warming Up the Learning Rate
2022-06-30 23:10:00 【Full stack programmer webmaster】
Hello everyone, nice to meet you again. I'm your friend, Quan Jun.
The learning rate is one of the most important hyperparameters in neural network training, and there are many strategies for scheduling it; warmup is one of them.

(1) What is Warmup?

Warmup is a learning-rate warm-up method mentioned in the ResNet paper. It trains with a smaller learning rate at the beginning of training, for some number of epochs or steps (for example 4 epochs or 10,000 steps), and then switches to the preset learning rate for the rest of training.
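As a minimal sketch (the function name and the concrete values are illustrative, not taken from the paper), this constant form of warmup is just a step function of the training step:

def constant_warmup_lr(step, warmup_steps=10000, warmup_lr=0.01, preset_lr=0.1):
    # Constant warmup: a fixed small learning rate for the first
    # warmup_steps, then a hard switch to the preset learning rate.
    return warmup_lr if step < warmup_steps else preset_lr

print(constant_warmup_lr(500))    # 0.01, still warming up
print(constant_warmup_lr(12000))  # 0.1, preset learning rate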
(2) Why use Warmup?

At the beginning of training, the model weights are randomly initialized, so choosing a large learning rate right away may make training unstable (the loss may oscillate). Warming up the learning rate keeps it small for the first few epochs or steps; under this small warm-up learning rate the model can gradually stabilize, and once it is relatively stable we switch to the preset learning rate. This makes the model converge faster and reach a better final result.
Example: the ResNet paper trains a 110-layer ResNet on CIFAR-10. It first trains with a learning rate of 0.01 until the training error drops below 80% (roughly 400 steps), and then continues training with a learning rate of 0.1.
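A hedged sketch of that schedule (the function and the error-based trigger are paraphrased from the description above; train_error stands in for whatever training-error metric the loop computes):

def resnet_warmup_lr(train_error, warmed_up):
    # Train at 0.01 until the training error drops below 80%; the switch
    # to 0.1 is one-way, so a warmed_up flag is threaded through.
    warmed_up = warmed_up or train_error < 0.80
    return (0.1 if warmed_up else 0.01), warmed_up

# Usage inside a training loop (train_error is a hypothetical placeholder):
# lr, warmed_up = resnet_warmup_lr(train_error, warmed_up)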
(3) Improving Warmup

The warmup in (2) is constant warmup; its drawback is that jumping from a small learning rate to a large one may cause a sudden spike in the training error. To solve this, Facebook proposed gradual warmup in 2018: start from a small initial learning rate and increase it a little at every step until the preset (relatively large) learning rate is reached, then continue training with that preset learning rate.
1. The following code simulates gradual warmup:
"""
Implements gradual warmup, if train_steps < warmup_steps, the
learning rate will be `train_steps/warmup_steps * init_lr`.
Args:
warmup_steps:warmup Step threshold , namely train_steps<warmup_steps, Use warm-up learning rate , Otherwise, use the preset learning rate
train_steps: Number of steps trained
init_lr: Preset learning rate
"""
import numpy as np  # only needed for the optional sin-decay variant below

warmup_steps = 2500
init_lr = 0.1
# Simulate 15000 training steps
max_steps = 15000
for train_steps in range(max_steps):
    if warmup_steps and train_steps < warmup_steps:
        # Linearly ramp the learning rate up during warmup
        warmup_percent_done = train_steps / warmup_steps
        warmup_learning_rate = init_lr * warmup_percent_done  # gradual warmup lr
        learning_rate = warmup_learning_rate
    else:
        # learning_rate = np.sin(learning_rate)  # sin decay after warmup
        learning_rate = learning_rate ** 1.0001  # approximate exponential decay after warmup
    if (train_steps + 1) % 100 == 0:
        print("train_steps:%.3f--warmup_steps:%.3f--learning_rate:%.3f" % (
            train_steps + 1, warmup_steps, learning_rate))

2. The learning-rate curve produced by the code above, with warmup followed by decay (sin or exponential), is shown below:
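A hedged way to reproduce that plot, assuming matplotlib is available (it is not used in the original snippet), is to collect the per-step learning rate from the same simulation and draw it:

import matplotlib.pyplot as plt

warmup_steps, init_lr, max_steps = 2500, 0.1, 15000
lrs = []
learning_rate = init_lr
for train_steps in range(max_steps):
    if train_steps < warmup_steps:
        learning_rate = init_lr * train_steps / warmup_steps  # warmup ramp
    else:
        learning_rate = learning_rate ** 1.0001  # approximate exponential decay
    lrs.append(learning_rate)

plt.plot(range(max_steps), lrs)
plt.xlabel("train_steps")
plt.ylabel("learning_rate")
plt.title("gradual warmup followed by decay")
plt.show()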
(4) Summary

Warming up the learning rate means training first with a small initial learning rate, then increasing it a little at every step until the preset (relatively large) learning rate is reached (at this point the warmup is complete), and then training with the preset learning rate (during the post-warmup phase the learning rate decays). This helps the model converge faster and reach a better final result.
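In practice, gradual warmup is usually attached to the optimizer through a scheduler. A minimal sketch using PyTorch's torch.optim.lr_scheduler.LambdaLR (PyTorch is an assumption here, not part of the original post; the tiny model, step count, and warmup_steps value are illustrative):

import torch

model = torch.nn.Linear(10, 1)  # hypothetical tiny model, just to have parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # preset learning rate

warmup_steps = 2500

def lr_lambda(step):
    # Linear ramp from 0 to 1 over warmup_steps, then hold the preset lr;
    # a real schedule would usually decay after warmup instead of holding.
    return min(1.0, step / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for step in range(5000):
    optimizer.step()     # in real training this follows loss.backward()
    scheduler.step()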
Publisher: Full Stack Programmer. Please credit the source when reprinting: https://javaforall.cn/132203.html