How to find the optimal learning rate
2022-07-01 18:35:00 【Zi Yan Ruoshui】
Original article: How to find the optimal learning rate - Zhihu
Anyone who has trained enough networks knows that hyperparameters are mysterious things: batch size, learning rate, and so on. There are no hard rules for setting them, and the values reported in papers are usually chosen from experience. Yet hyperparameters often matter a great deal. Take the learning rate: set it too large and the loss will blow up; set it too small and training takes far too long. So, is there a principled way to determine the initial learning rate?
In this post I will describe a very simple but effective way to find a reasonable initial learning rate.


The learning rate schedule changes over the course of training. At the start, the parameters are essentially random, so a relatively large learning rate is appropriate and makes the loss drop quickly. After training for a while, the parameter updates should become smaller, so the learning rate is usually decayed. There are many decay schemes, for example multiplying the learning rate by 0.1 every fixed number of steps (step decay), exponential decay, and so on; a minimal sketch of step decay is shown below.
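As an illustration (not from the original article), here is a minimal sketch of step decay using PyTorch's built-in StepLR scheduler; the model, optimizer, and schedule values below are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; replace with your own.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the learning rate by 0.1 every 30 epochs (step decay).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... run one epoch of training (forward, loss, backward, optimizer.step()) ...
    scheduler.step()  # decay the learning rate on schedule
```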
The question here is how to choose the initial learning rate. There are of course many ways. A brute-force one is to start from 0.0001, then try 0.001, and so on, running the network once per order of magnitude and watching the loss; this eventually gives a roughly reasonable learning rate, but it takes far too much time. Is there a simpler and more effective approach?
A simple way
Leslie N. Smith describes a great way to find the initial learning rate in Section 3.3 of his 2015 paper "Cyclical Learning Rates for Training Neural Networks". I also recommend reading the paper itself; it contains some very enlightening ideas about setting learning rates.
In the paper, the method is used to estimate the minimum and maximum learning rates the network can tolerate, but we can also use it to find a good initial learning rate, and it is very simple. First set a very small initial learning rate, such as 1e-5, update the network after each batch while gradually increasing the learning rate, and record the loss computed for each batch. Finally, plot the learning rate curve and the loss curve; from them we can read off the best learning rate.
[Figures from the original article: the learning rate increasing with the number of iterations, and the loss obtained at each learning rate.]
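A minimal sketch of this learning-rate range test is given below. It assumes a PyTorch `model`, `optimizer`, `criterion`, and `train_loader` already exist; the exponential smoothing of the loss and the early-stop threshold follow a common fast.ai-style implementation and are assumptions, not details taken from the paper.

```python
def lr_range_test(model, optimizer, criterion, train_loader,
                  init_lr=1e-5, final_lr=10.0, beta=0.98):
    """Increase the LR geometrically each batch and record the (smoothed) loss."""
    num_batches = len(train_loader) - 1
    mult = (final_lr / init_lr) ** (1 / num_batches)  # per-batch multiplier
    lr = init_lr
    avg_loss, best_loss = 0.0, float("inf")
    lrs, losses = [], []

    for i, (inputs, targets) in enumerate(train_loader, start=1):
        for group in optimizer.param_groups:
            group["lr"] = lr  # apply the current learning rate

        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)

        # Exponentially smoothed loss with bias correction.
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed = avg_loss / (1 - beta ** i)

        # Stop early once the loss clearly explodes.
        if i > 1 and smoothed > 4 * best_loss:
            break
        best_loss = min(best_loss, smoothed)

        lrs.append(lr)
        losses.append(smoothed)

        loss.backward()
        optimizer.step()
        lr *= mult  # increase the learning rate for the next batch

    return lrs, losses
```

In practice one then plots `losses` against `lrs` on a logarithmic x-axis and picks a learning rate somewhat below the point where the loss is lowest, i.e. before it starts to rise again.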

