How to find the optimal learning rate
2022-07-01 18:35:00 【Zi Yan Ruoshui】
Original article: How to find the optimal learning rate - Zhihu
Anyone who has trained enough networks knows that hyperparameters are mysterious things: batch size, learning rate, and so on. There are no hard rules for setting them, and the values reported in papers are usually chosen from experience. Yet hyperparameters often matter a great deal. Take the learning rate: set it too large and the loss will blow up; set it too small and training takes far too long. So, is there a principled way to determine the initial learning rate?
In this post I will describe a very simple but effective way to find a reasonable initial learning rate.


The learning rate schedule changes over the course of training. At the start, the parameters are essentially random, so a relatively large learning rate is appropriate and makes the loss drop quickly. After training for a while, the parameter updates should become smaller, so the learning rate is usually decayed. There are many decay schemes, for example multiplying the learning rate by 0.1 every fixed number of steps (step decay), exponential decay, and so on; a minimal sketch of step decay is shown below.
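As an illustration (not from the original article), here is a minimal sketch of step decay using PyTorch's built-in StepLR scheduler; the model, optimizer, and schedule values below are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; replace with your own.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the learning rate by 0.1 every 30 epochs (step decay).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... run one epoch of training (forward, loss, backward, optimizer.step()) ...
    scheduler.step()  # decay the learning rate on schedule
```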
The question here is how to choose the initial learning rate. There are of course many ways. A brute-force one is to start from 0.0001, then try 0.001, and so on, running the network once per order of magnitude and watching the loss; this eventually gives a roughly reasonable learning rate, but it takes far too much time. Is there a simpler and more effective approach?
A simple way
Leslie N. Smith describes a great way to find the initial learning rate in Section 3.3 of his 2015 paper "Cyclical Learning Rates for Training Neural Networks". I also recommend reading the paper itself; it contains some very enlightening ideas about setting learning rates.
In the paper, the method is used to estimate the minimum and maximum learning rates the network can tolerate, but we can also use it to find a good initial learning rate, and it is very simple. First set a very small initial learning rate, such as 1e-5, update the network after each batch while gradually increasing the learning rate, and record the loss computed for each batch. Finally, plot the learning rate curve and the loss curve; from them we can read off the best learning rate.
[Figures from the original article: the learning rate increasing with the number of iterations, and the loss obtained at each learning rate.]
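A minimal sketch of this learning-rate range test is given below. It assumes a PyTorch `model`, `optimizer`, `criterion`, and `train_loader` already exist; the exponential smoothing of the loss and the early-stop threshold follow a common fast.ai-style implementation and are assumptions, not details taken from the paper.

```python
def lr_range_test(model, optimizer, criterion, train_loader,
                  init_lr=1e-5, final_lr=10.0, beta=0.98):
    """Increase the LR geometrically each batch and record the (smoothed) loss."""
    num_batches = len(train_loader) - 1
    mult = (final_lr / init_lr) ** (1 / num_batches)  # per-batch multiplier
    lr = init_lr
    avg_loss, best_loss = 0.0, float("inf")
    lrs, losses = [], []

    for i, (inputs, targets) in enumerate(train_loader, start=1):
        for group in optimizer.param_groups:
            group["lr"] = lr  # apply the current learning rate

        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)

        # Exponentially smoothed loss with bias correction.
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed = avg_loss / (1 - beta ** i)

        # Stop early once the loss clearly explodes.
        if i > 1 and smoothed > 4 * best_loss:
            break
        best_loss = min(best_loss, smoothed)

        lrs.append(lr)
        losses.append(smoothed)

        loss.backward()
        optimizer.step()
        lr *= mult  # increase the learning rate for the next batch

    return lrs, losses
```

In practice one then plots `losses` against `lrs` on a logarithmic x-axis and picks a learning rate somewhat below the point where the loss is lowest, i.e. before it starts to rise again.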

