How to find the optimal learning rate
2022-07-01 18:35:00 【Zi Yan Ruoshui】
Original article: How to find the optimal learning rate - Zhihu
Anyone who has spent time on "alchemy" (training and tuning deep networks) knows that hyperparameters are mysterious things. There are no firm rules for setting values such as the batch size or the learning rate, and the hyperparameters reported in papers are generally chosen from experience. Yet hyperparameters often matter a great deal. Take the learning rate: set it too large and the loss explodes; set it too small and training takes a very long time. So is there a principled way to determine the initial learning rate?
In this article, I describe a very simple but effective way to determine a reasonable initial learning rate.


The appropriate learning rate changes over the course of training. At the beginning the parameters are still close to their random initialization, so a relatively large learning rate is appropriate and makes the loss fall faster. After training for a while, the parameter updates should become smaller, so the learning rate is usually decayed. There are many decay schemes, for example multiplying the learning rate by 0.1 every fixed number of steps, or exponential decay.
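The two decay schemes mentioned above can be sketched as plain functions of the step count. This is a minimal illustration, not a prescribed recipe: the function names and the parameters `drop_every`, `decay_rate`, and `decay_steps` are assumptions introduced here.

```python
def step_decay(initial_lr, step, drop_every, factor=0.1):
    """Multiply the learning rate by `factor` once every `drop_every` steps."""
    return initial_lr * factor ** (step // drop_every)


def exponential_decay(initial_lr, step, decay_rate, decay_steps):
    """Shrink the learning rate smoothly, by `decay_rate` per `decay_steps` steps."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Example: start at 0.1 and compare the two schedules at a few steps.
for step in (0, 30, 60, 90):
    print(step,
          step_decay(0.1, step, drop_every=30),
          exponential_decay(0.1, step, decay_rate=0.5, decay_steps=30))
```

Step decay keeps the learning rate constant between drops, while exponential decay lowers it a little at every step.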
Our concern here is how to determine the initial learning rate. There are many ways to do this. A naive one is to start at 0.0001, then try 0.001, and so on, training the network once at each order of magnitude and watching the loss to pick a relatively reasonable learning rate. But this takes too much time. Is there a simpler and more effective way?
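To make the cost of this naive sweep concrete, here is a small sketch. The function `train_and_report` is a hypothetical stand-in for a full training run: it minimises the toy loss (w - 3)^2 by gradient descent rather than training a real network, and the step count of 200 is an arbitrary assumption.

```python
def train_and_report(lr, steps=200):
    """Toy stand-in for a training run: minimise (w - 3)^2 starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)  # the gradient of (w - 3)^2 is 2 * (w - 3)
    return (w - 3.0) ** 2          # final loss


# One complete run per order of magnitude: 1e-4, 1e-3, 1e-2, 1e-1, 1.
candidate_lrs = [10.0 ** e for e in range(-4, 1)]
final_losses = {lr: train_and_report(lr) for lr in candidate_lrs}
best_lr = min(final_losses, key=final_losses.get)
print(best_lr, final_losses)
```

Every candidate needs its own full run, which is exactly what makes this approach slow for a real network.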
A simple way
In Section 3.3 of his 2015 paper "Cyclical Learning Rates for Training Neural Networks", Leslie N. Smith describes a great way to find the initial learning rate. I also recommend reading the paper itself, which contains some very enlightening ideas about setting the learning rate.
The paper uses this method to estimate the minimum and maximum learning rates the network can tolerate, but we can also use it to find a good initial learning rate. The method is very simple. First, set a very small initial learning rate, such as 1e-5. Then update the network after each batch, increase the learning rate at the same time, and record the loss computed for each batch. Finally, plot how the learning rate changes and how the loss changes; from these curves we can find the best learning rate.
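The procedure can be sketched in plain Python on a toy problem. This is an illustration under stated assumptions, not the paper's code: the names `lr_range_test`, `mse_loss`, and `mse_grad`, the multiplicative growth of the learning rate, the upper bound `end_lr`, and the early stop once the loss blows up are all choices made here, and the "network" is a single weight fitted to y = 3x.

```python
import math
import random


def lr_range_test(grad_fn, loss_fn, params, batches,
                  start_lr=1e-5, end_lr=10.0, blowup_factor=4.0):
    """Make one pass over `batches`, raising the learning rate after every update.

    Returns the learning rates tried and the loss recorded for each batch.
    """
    # Multiplicative growth covers every order of magnitude with equal resolution.
    growth = (end_lr / start_lr) ** (1.0 / max(len(batches) - 1, 1))
    lr, best_loss = start_lr, math.inf
    lrs, losses = [], []
    for batch in batches:
        loss = loss_fn(params, batch)
        lrs.append(lr)
        losses.append(loss)
        # Assumed stopping rule: quit once the loss has clearly exploded.
        if not math.isfinite(loss) or loss > blowup_factor * best_loss:
            break
        best_loss = min(best_loss, loss)
        grads = grad_fn(params, batch)
        params = [p - lr * g for p, g in zip(params, grads)]  # plain SGD update
        lr *= growth
    return lrs, losses


def mse_loss(params, batch):
    (w,) = params
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def mse_grad(params, batch):
    (w,) = params
    return [sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)]


random.seed(0)
data = [(x, 3.0 * x) for x in (random.uniform(-1.0, 1.0) for _ in range(3200))]
batches = [data[i:i + 32] for i in range(0, len(data), 32)]
lrs, losses = lr_range_test(mse_grad, mse_loss, [0.0], batches)
```

Plotting `losses` against `lrs` with a logarithmic x-axis gives the loss curve described above, and plotting `lrs` against the batch index gives the learning rate curve.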
[Figures: learning rate versus number of iterations, and loss versus learning rate.]

