TensorFlow2 study notes: 7. Optimizer
2022-08-04 06:05:00 【Live up to [email protected]】
The following are the optimizer classes listed in the official TensorFlow documentation. Note that these tf.train.* names belong to the TensorFlow 1.x API; in TensorFlow 2 the equivalents live under tf.keras.optimizers (SGD, Adadelta, Adagrad, RMSprop, Adam).
tf.train.GradientDescentOptimizer
Implements the plain gradient descent algorithm: each variable is updated by subtracting the learning rate times its gradient.
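A minimal sketch of one gradient-descent step using the TF2 counterpart, tf.keras.optimizers.SGD (the toy variable and quadratic loss here are illustrative assumptions, not from the original notes):

```python
import tensorflow as tf

# tf.keras.optimizers.SGD is the TF2 counterpart of tf.train.GradientDescentOptimizer.
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

w = tf.Variable(5.0)  # toy parameter to optimize

for _ in range(3):
    with tf.GradientTape() as tape:
        loss = w ** 2          # toy quadratic loss, minimized at w = 0
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))  # w <- w - lr * dloss/dw

print(w.numpy())  # moves toward 0 with each step
```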
tf.train.AdadeltaOptimizer
Implements the Adadelta algorithm, an extension of Adagrad. It requires no manual tuning of the learning rate, is robust to noisy gradients, and works across different model structures. Instead of accumulating all past squared gradients, Adadelta restricts the accumulation to a window of fixed size, and rather than storing those gradients explicitly it maintains an exponentially decaying average of them.
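In TF2 the equivalent is tf.keras.optimizers.Adadelta; a hedged construction sketch (the hyperparameter values shown match the Keras defaults at the time of writing and are illustrative only):

```python
import tensorflow as tf

# rho is the decay rate of the running average of squared gradients,
# i.e. the "fixed-size accumulation" described above.
opt = tf.keras.optimizers.Adadelta(
    learning_rate=0.001,
    rho=0.95,
    epsilon=1e-7,  # small constant for numerical stability
)
```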
tf.train.AdagradOptimizer
Implements the Adagrad algorithm. Adagrad accumulates all previous squared gradients, which makes it well suited to large, sparse problems. It adapts the learning rate automatically: you only set a global learning rate, but this is not the rate actually applied; the effective rate for each parameter is inversely proportional to the square root of that parameter's accumulated squared gradients, so every parameter ends up with its own learning rate.
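A brief sketch with the TF2 equivalent, tf.keras.optimizers.Adagrad (parameter values are illustrative):

```python
import tensorflow as tf

# Only the global learning rate is set; the effective per-parameter rate
# is this value divided by the square root of that parameter's
# accumulated squared gradients.
opt = tf.keras.optimizers.Adagrad(
    learning_rate=0.01,
    initial_accumulator_value=0.1,  # starting value of the accumulator
)
```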
tf.train.MomentumOptimizer
Implements gradient descent with momentum. If the gradient keeps the same direction for a long time, the size of the parameter update grows; conversely, if its sign flips frequently, the update shrinks. The process can be pictured as a ball dropped from the top of a hill, rolling faster and faster.
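In TF2, momentum is a parameter of tf.keras.optimizers.SGD rather than a separate class; a minimal sketch (momentum=0.9 is a common illustrative choice, not a value from the original notes):

```python
import tensorflow as tf

# TF2 equivalent of tf.train.MomentumOptimizer: SGD with momentum.
opt = tf.keras.optimizers.SGD(
    learning_rate=0.01,
    momentum=0.9,    # fraction of the previous update carried forward
    nesterov=False,  # set True for Nesterov accelerated gradient
)
```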
tf.train.RMSPropOptimizer
Implements the RMSProp algorithm, which is similar to Adam but uses a different moving average: it rescales the gradient by an exponentially decaying average of squared gradients, without Adam's momentum term or bias correction.
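A brief sketch with the TF2 equivalent, tf.keras.optimizers.RMSprop (values are illustrative defaults):

```python
import tensorflow as tf

# rho is the decay rate of the moving average of squared gradients.
opt = tf.keras.optimizers.RMSprop(
    learning_rate=0.001,
    rho=0.9,
)
```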
tf.train.AdamOptimizer
Implements the Adam algorithm, which combines the Momentum and RMSProp methods: for each parameter it keeps an exponentially decaying average of past gradients and of past squared gradients, and uses them to maintain a per-parameter learning rate.
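A brief sketch with the TF2 equivalent, tf.keras.optimizers.Adam (beta values shown are the commonly documented defaults):

```python
import tensorflow as tf

# beta_1 decays the moving average of gradients (the momentum part);
# beta_2 decays the moving average of squared gradients (the RMSProp part).
opt = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
)
```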
How to choose:
If the data is sparse, use adaptive methods, i.e. Adagrad, Adadelta, RMSprop, Adam.
RMSprop, Adadelta, Adam are similar in many cases.
Adam adds bias-correction and momentum to RMSprop.
As the gradient becomes sparse, Adam performs better than RMSprop.
Overall, Adam is the best choice.
Many papers use plain SGD without momentum or other extensions. SGD can reach a minimum, but it usually takes longer than the other algorithms and may get trapped in saddle points.
If you need faster convergence, or are training a deeper and more complex neural network, use an adaptive algorithm; see the end-to-end sketch below.
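Putting it together, a minimal end-to-end sketch with Adam as the default choice (the toy model and random data are illustrative assumptions; any optimizer above can be swapped into compile):

```python
import numpy as np
import tensorflow as tf

# Toy regression data (illustrative only).
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

# The optimizer argument accepts any tf.keras.optimizers instance.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="mse",
)

model.fit(x, y, epochs=3, batch_size=32, verbose=0)
```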