How to find the optimal learning rate
2022-07-01 18:35:00 【Zi Yan Ruoshui】
Original article: How to find the optimal learning rate - Zhihu
Anyone who has done a lot of "alchemy" (hand-tuning neural network training) knows that hyperparameters are a rather mysterious thing. Settings such as the batch size and the learning rate follow no clear rules, and the values reported in papers are usually chosen from experience. Yet hyperparameters often matter a great deal. Take the learning rate: if it is set too large, the loss blows up; if it is set too small, training takes a very long time. So is there a principled way to choose the initial learning rate?
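To make the "too large explodes, too small crawls" point concrete, here is a minimal sketch. It is my own illustration, not something from the original article. It runs plain gradient descent on the one-dimensional quadratic f(w) = 0.5 * a * w^2. Each update is w <- (1 - lr * a) * w, so the iteration converges only when 0 < lr < 2/a. The function name `run_gd` and the chosen values are hypothetical.

```python
def run_gd(lr, a=1.0, w0=1.0, steps=20):
    """Run plain gradient descent on f(w) = 0.5 * a * w**2 and return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * a * w  # the gradient of f is a * w
        losses.append(0.5 * a * w * w)
    return losses

# Too small, reasonable, and too large learning rates for a = 1 (stable range is 0 < lr < 2).
for lr in (0.01, 0.5, 2.5):
    print(f"lr={lr}: final loss {run_gd(lr)[-1]:.3g}")
```

With a = 1, the three runs behave differently:

- lr = 0.01 is still far from the minimum after 20 steps.
- lr = 0.5 is essentially converged.
- lr = 2.5 diverges, because |1 - 2.5| > 1 makes |w| grow every step.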
In this article, I describe a very simple but effective way to find a reasonable initial learning rate.
The learning rate is usually not fixed; it changes over the course of training. At the start the parameters are random, so a relatively large learning rate is chosen to make the loss fall quickly. After training for a while, the parameter updates should become smaller, so the learning rate is generally decayed. There are many decay schemes. One is step decay, which multiplies the learning rate by 0.1 after a fixed number of steps; another is exponential decay.
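The two decay schemes just described can be written as plain functions of the step count. This is a minimal sketch. The function and parameter names (`step_decay`, `drop_every`, `factor`, `gamma`) and their default values are my own hypothetical choices.

```python
def step_decay(initial_lr, step, drop_every=30, factor=0.1):
    """Step decay: multiply the learning rate by `factor` once every `drop_every` steps."""
    return initial_lr * factor ** (step // drop_every)

def exponential_decay(initial_lr, step, gamma=0.95):
    """Exponential decay: shrink the learning rate by the constant ratio `gamma` at every step."""
    return initial_lr * gamma ** step

for step in (0, 30, 60, 90):
    print(step, step_decay(0.1, step), exponential_decay(0.1, step))
```

If you train with PyTorch, `torch.optim.lr_scheduler.StepLR` and `torch.optim.lr_scheduler.ExponentialLR` provide the same two schedules as ready-made schedulers.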
Here we focus on how to determine the initial learning rate. There are of course many ways. A brute-force approach is to start at 0.0001, then try 0.001, and so on. You train the network once for each order of magnitude, watch how the loss behaves, and pick a reasonable learning rate. This takes far too much time, though. Is there a simpler and more effective way?
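For comparison, the brute-force sweep could look roughly like the sketch below. It rests on two assumptions. The "network" is a hypothetical toy least-squares model trained with full-batch gradient descent. The function `train_loss` stands in for one complete training run.

```python
import numpy as np

def train_loss(lr, steps=100, seed=0):
    """Train a toy least-squares model y ~ X @ w with full-batch gradient descent; return the final MSE."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 5))
    true_w = rng.normal(size=5)
    y = X @ true_w + 0.1 * rng.normal(size=256)
    w = np.zeros(5)
    for _ in range(steps):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

# One full training run per order of magnitude.
candidates = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
results = {lr: train_loss(lr) for lr in candidates}
for lr, loss in results.items():
    print(f"lr={lr:g}: final loss {loss:.4g}")
best_lr = min(results, key=results.get)
print("best:", best_lr)
```

Every candidate costs a complete training run. That is why this approach becomes expensive for real networks.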
A simple way
In Section 3.3 of his 2015 paper "Cyclical Learning Rates for Training Neural Networks", Leslie N. Smith describes an excellent way to find the initial learning rate. I also recommend reading the whole paper; it contains some very insightful ideas about how to set the learning rate.
In the paper, this method is used to estimate the minimum and maximum learning rates the network can tolerate, but we can also use it to find a good initial learning rate. The method is very simple:

1. Set a very small initial learning rate, such as 1e-5.
2. Update the network after every batch, increasing the learning rate each time.
3. Record the loss computed on each batch.
4. Plot how the learning rate changes over the iterations and how the loss changes with the learning rate. From these curves we can find the best learning rate.
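Here is a minimal NumPy sketch of this procedure. It uses a hypothetical toy least-squares model trained with mini-batch SGD in place of a real network. After each batch the learning rate is multiplied by a constant ratio, so it sweeps geometrically from 1e-5 to 10. Three settings are my own assumptions, not values from the paper: the upper bound of 10, the number of steps, and the early-stop rule (halt once the loss exceeds 10x the best loss seen so far).

```python
import numpy as np

def lr_range_test(X, y, min_lr=1e-5, max_lr=10.0, num_steps=100, batch_size=32, seed=0):
    """Sketch of an LR range test on a hypothetical least-squares model y ~ X @ w.

    The learning rate grows after every mini-batch and the batch loss is recorded at each step.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    # Constant multiplicative ratio, so the LR sweeps min_lr..max_lr geometrically.
    ratio = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    lrs, losses = [], []
    lr = min_lr
    for _ in range(num_steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        residual = Xb @ w - yb
        loss = float(np.mean(residual ** 2))
        lrs.append(lr)
        losses.append(loss)
        # Stop once the loss has clearly exploded (the 10x threshold is an assumption).
        if not np.isfinite(loss) or loss > 10 * min(losses):
            break
        grad = 2.0 / batch_size * Xb.T @ residual  # gradient of the batch MSE
        w -= lr * grad
        lr *= ratio
    return np.array(lrs), np.array(losses)

rng = np.random.default_rng(42)
X = rng.normal(size=(1024, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1024)
lrs, losses = lr_range_test(X, y)
print(len(lrs), "steps, lowest batch loss", losses.min(), "at lr", lrs[np.argmin(losses)])
```

Plotting `losses` against `lrs` on a logarithmic x-axis gives the loss-versus-learning-rate curve described above.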
[Figure: left, the learning rate increasing with the number of iterations; right, the loss at different learning rates.]
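The article does not spell out how to read the best learning rate off the curve. Two commonly used heuristics, not taken from the paper, are sketched below:

- take the learning rate where the loss falls most steeply, or
- take one tenth of the learning rate at which the loss is lowest.

The smoothing factor `beta`, the function name `suggest_lr`, and the toy curve are assumptions made purely for illustration. The toy curve is not measured data.

```python
import numpy as np

def suggest_lr(lrs, losses, beta=0.9):
    """Suggest learning rates from an LR range test curve using two common heuristics.

    Returns (steepest, min_over_10):
      steepest    - the LR where the smoothed loss falls fastest with respect to log10(lr)
      min_over_10 - one tenth of the LR at which the smoothed loss is lowest
    """
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # Bias-corrected exponential moving average to damp batch-to-batch noise.
    smoothed = np.empty_like(losses)
    avg = 0.0
    for i, loss in enumerate(losses):
        avg = beta * avg + (1 - beta) * loss
        smoothed[i] = avg / (1 - beta ** (i + 1))
    # Slope of the smoothed loss against log10(lr); the most negative slope is the steepest descent.
    slopes = np.gradient(smoothed, np.log10(lrs))
    steepest = lrs[np.argmin(slopes)]
    min_over_10 = lrs[np.argmin(smoothed)] / 10
    return float(steepest), float(min_over_10)

# Toy curve for illustration only: flat, then falling, then exploding.
lrs = np.logspace(-5, 1, 61)
losses = np.concatenate([np.full(30, 2.0), np.linspace(2.0, 0.2, 20), np.linspace(0.2, 5.0, 11)])
print(suggest_lr(lrs, losses))
```

The smoothing is what keeps a single noisy batch from dominating either choice. It also means the suggested points lag slightly behind the raw curve.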