TensorFlow2 study notes: 7. Optimizer
2022-08-04 06:05:00 【Live up to [email protected]】
The following are the optimizer types listed in the official TensorFlow documentation, given by the module path of each built-in optimizer:
tf.train.GradientDescentOptimizer
This class is an optimizer that implements the gradient descent algorithm.
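tf.train.GradientDescentOptimizer is the TensorFlow 1.x name; in TensorFlow 2 the same algorithm is exposed as tf.keras.optimizers.SGD. Below is a minimal sketch of one manual update step on a toy quadratic loss (the loss and values are made up for illustration):

```python
import tensorflow as tf

# Plain gradient descent: w <- w - learning_rate * gradient
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

w = tf.Variable(5.0)
with tf.GradientTape() as tape:
    loss = tf.square(w)              # toy loss: w^2, minimum at w = 0
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))
print(w.numpy())                     # 5.0 - 0.1 * 10.0 = 4.0
```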
tf.train.AdadeltaOptimizer
An optimizer that implements the Adadelta algorithm. It does not require manually tuning the learning rate, is robust to noise, and works across different model architectures. Adadelta is an extension of Adagrad: it only accumulates a fixed-size window of past squared gradients, and instead of storing those items directly it maintains a decaying average of them.
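In TensorFlow 2 the equivalent class is tf.keras.optimizers.Adadelta, where rho is the decay rate of the running averages described above. A minimal sketch, with hyperparameter values chosen for illustration rather than as a recommendation:

```python
import tensorflow as tf

# Adadelta keeps decaying averages of squared gradients and squared updates
# instead of the full history, so the learning rate rarely needs hand-tuning.
opt = tf.keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-7)
```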
tf.train.AdagradOptimizer
An optimizer that implements the Adagrad algorithm. Adagrad accumulates all previous squared gradients and is well suited to large, sparse data. It adjusts the learning rate automatically: you only set a global learning rate, but that is not the rate actually applied; the effective rate is inversely proportional to the square root of the accumulated sum of past squared gradients. This gives each parameter its own learning rate.
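The TensorFlow 2 equivalent is tf.keras.optimizers.Adagrad. A minimal sketch with illustrative values:

```python
import tensorflow as tf

# Adagrad divides the global learning rate by the square root of the
# accumulated squared gradients, so each parameter gets its own effective rate.
opt = tf.keras.optimizers.Adagrad(learning_rate=0.01,
                                  initial_accumulator_value=0.1)
```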
tf.train.MomentumOptimizer
An optimizer that implements gradient descent with momentum. If the gradient keeps pointing in the same direction for a long time, the update step is enlarged; conversely, if the gradient's sign flips frequently, the update step is reduced. The process can be pictured as a ball dropped from the top of a hill that rolls faster and faster.
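In TensorFlow 2 there is no separate Momentum class; momentum is an argument of tf.keras.optimizers.SGD. A minimal sketch (momentum=0.9 is a common but assumed choice):

```python
import tensorflow as tf

# Velocity update: v <- momentum * v - learning_rate * grad;  w <- w + v.
# Updates grow while gradients keep the same direction, shrink when they flip.
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=False)
```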
tf.train.RMSPropOptimizer
An optimizer that implements the RMSProp algorithm. It is similar to Adam but uses a different moving average (a decaying average of the squared gradients).
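The TensorFlow 2 equivalent is tf.keras.optimizers.RMSprop, where rho is the decay rate of that average of squared gradients (values below are illustrative):

```python
import tensorflow as tf

# RMSprop scales the learning rate by a decaying average of squared gradients.
opt = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-7)
```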
tf.train.AdamOptimizer
An optimizer that implements the Adam algorithm. Adam combines the Momentum and RMSProp methods: for each parameter it maintains exponentially decaying averages of both past gradients (the momentum term) and past squared gradients (the adaptive learning rate).
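The TensorFlow 2 equivalent is tf.keras.optimizers.Adam; beta_1 and beta_2 control the two decaying averages mentioned above (the values below are the commonly used defaults, quoted from memory as an assumption):

```python
import tensorflow as tf

# Adam keeps, per parameter, a decaying average of gradients (beta_1, the
# momentum term) and of squared gradients (beta_2, the adaptive rate),
# with bias correction applied to both.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                               epsilon=1e-7)
```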
How to choose:
If the data is sparse, use one of the adaptive methods: Adagrad, Adadelta, RMSprop, or Adam.
RMSprop, Adadelta, Adam are similar in many cases.
Adam adds bias-correction and momentum to RMSprop.
As the gradient becomes sparse, Adam performs better than RMSprop.
Overall, Adam is the best choice.
Many papers use plain SGD without momentum. Although SGD can reach a minimum, it takes longer than the other algorithms and may get stuck at a saddle point.
If you need faster convergence, or you are training a deeper and more complex neural network, use one of the adaptive algorithms.
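A minimal end-to-end sketch of plugging one of these optimizers into a Keras model; the layer sizes, input shape, and loss below are arbitrary placeholders, not part of the original notes:

```python
import tensorflow as tf

# Hypothetical tiny classifier, only to show where the optimizer plugs in.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Swap in SGD, SGD(momentum=0.9), Adagrad, Adadelta, RMSprop, or Adam here,
# following the guidelines above.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```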
Copyright notice
This article was written by [Live up to [email protected]]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/216/202208040525327568.html