TensorFlow2 study notes: 7. Optimizer
2022-08-04 06:05:00 【Live up to [email protected]】
The following are the optimizer types listed in the official TensorFlow documentation.
The TensorFlow built-in optimizers live under the following paths:
tf.train.GradientDescentOptimizer
An optimizer that implements the plain gradient descent algorithm.
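In TensorFlow 2 the equivalent class is tf.keras.optimizers.SGD. Below is a minimal sketch, not taken from the original notes: the variable w and the toy quadratic loss are made up purely to show the update loop.

```python
import tensorflow as tf

# Plain gradient descent via tf.keras.optimizers.SGD (the TF2 counterpart
# of tf.train.GradientDescentOptimizer). Toy example: minimize (w - 3)^2.
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
w = tf.Variable(5.0)

for _ in range(50):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))

print(w.numpy())  # converges toward 3.0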
tf.train.AdadeltaOptimizer
An optimizer that implements the Adadelta algorithm. It does not require manual tuning of the learning rate, is robust to noisy gradients, and works across different model architectures. Adadelta is an extension of Adagrad: it only accumulates squared gradients over a window of fixed size, and instead of storing those past terms directly it keeps a running average of them.
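As a sketch of how this looks in code, TensorFlow 2 exposes the algorithm as tf.keras.optimizers.Adadelta; the hyperparameter values below are common choices of mine, not something prescribed by the notes.

```python
import tensorflow as tf

# Adadelta: rho is the decay rate of the windowed accumulation of squared
# gradients described above; epsilon guards against division by zero.
opt = tf.keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-7)
```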
tf.train.AdagradOptimizer
An optimizer that implements the Adagrad algorithm. Adagrad accumulates all previous squared gradients and is well suited to large, sparse problems. It adapts the learning rate automatically: you only set a global learning rate, which is not the rate actually applied. The effective rate for each parameter is the global rate divided by the square root of that parameter's accumulated squared gradients, so every parameter ends up with its own learning rate.
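A minimal constructor sketch, assuming the TF2 equivalent tf.keras.optimizers.Adagrad (the values shown are illustrative, not from the notes):

```python
import tensorflow as tf

# Adagrad: only the global learning_rate is set here; the effective
# per-parameter rate is this value divided by the square root of the
# accumulated squared gradients (seeded by initial_accumulator_value).
opt = tf.keras.optimizers.Adagrad(learning_rate=0.01,
                                  initial_accumulator_value=0.1)
```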
tf.train.MomentumOptimizer
An optimizer that implements the Momentum algorithm. If the gradient keeps pointing in the same direction for a long time, the parameter update step grows; conversely, if the gradient's sign flips frequently, the update step shrinks. The process can be pictured as a ball dropped from the top of a hill that slides faster and faster.
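In TensorFlow 2 there is no separate Momentum class; momentum is an argument of tf.keras.optimizers.SGD. A minimal sketch with a commonly used value of 0.9 (my choice, not from the notes):

```python
import tensorflow as tf

# Momentum accumulates a velocity term, so updates grow while the gradient
# keeps its direction and shrink when the sign keeps flipping.
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
```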
tf.train.RMSPropOptimizer
An optimizer that implements the RMSProp algorithm. It is similar to Adam but uses a different moving average of the squared gradients.
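A constructor sketch for the TF2 counterpart tf.keras.optimizers.RMSprop; rho is the decay rate of the moving average mentioned above, and the values shown are illustrative defaults:

```python
import tensorflow as tf

# RMSprop: divides the gradient by a moving average of its recent magnitude;
# rho controls how quickly that average decays.
opt = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-7)
```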
tf.train.AdamOptimizer
An optimizer that implements the Adam algorithm, which combines the Momentum and RMSProp methods. For each parameter it maintains a learning rate together with an exponentially decaying average of past gradient information.
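A constructor sketch for the TF2 counterpart tf.keras.optimizers.Adam; beta_1 corresponds to the Momentum part and beta_2 to the RMSProp-style part (the values are the usual defaults, not prescribed by the notes):

```python
import tensorflow as tf

# Adam: beta_1 decays the first-moment (momentum) estimate,
# beta_2 decays the second-moment (RMSProp-style) estimate.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                               beta_2=0.999, epsilon=1e-7)
```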
How to choose:
If the data is sparse, use an adaptive method, i.e. Adagrad, Adadelta, RMSprop, or Adam.
RMSprop, Adadelta, and Adam behave similarly in many cases.
Adam adds bias-correction and momentum to RMSprop.
As the gradient becomes sparse, Adam performs better than RMSprop.
Overall, Adam is the best choice.
Many papers use plain SGD without momentum or other tricks. Although SGD can reach a minimum, it takes longer than the other algorithms and may get trapped at a saddle point.
If you need faster convergence, or if you need to train deeper and more complex neural networks, you need to use an adaptive algorithm.
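To show where the choice actually plugs in, here is a hypothetical Keras model compiled with one of the optimizers above; the architecture and loss are made up for illustration, and any optimizer from this section could be swapped in:

```python
import tensorflow as tf

# A small made-up classifier; swapping the optimizer only changes the
# `optimizer=` argument passed to compile().
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```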
Copyright notice
This article was written by [Live up to [email protected]]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/216/202208040525327568.html