TensorFlow2 study notes: 7. Optimizer
2022-08-04 06:05:00 【Live up to [email protected]】
The following are the optimizer types listed in the official TensorFlow documentation.
The TensorFlow built-in optimizers live under the following paths:
tf.train.GradientDescentOptimizer
An optimizer that implements the plain gradient descent algorithm.
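In TensorFlow 2 the equivalent class is tf.keras.optimizers.SGD. Below is a minimal sketch, not taken from the original notes: the variable w and the toy quadratic loss are made up purely to show the update loop.

```python
import tensorflow as tf

# Plain gradient descent via tf.keras.optimizers.SGD (the TF2 counterpart
# of tf.train.GradientDescentOptimizer). Toy example: minimize (w - 3)^2.
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
w = tf.Variable(5.0)

for _ in range(50):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))

print(w.numpy())  # converges toward 3.0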
tf.train.AdadeltaOptimizer
An optimizer that implements the Adadelta algorithm. It does not require manual tuning of the learning rate, is robust to noisy gradients, and works across different model architectures. Adadelta is an extension of Adagrad: it only accumulates squared gradients over a window of fixed size, and instead of storing those past terms directly it keeps a running average of them.
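As a sketch of how this looks in code, TensorFlow 2 exposes the algorithm as tf.keras.optimizers.Adadelta; the hyperparameter values below are common choices of mine, not something prescribed by the notes.

```python
import tensorflow as tf

# Adadelta: rho is the decay rate of the windowed accumulation of squared
# gradients described above; epsilon guards against division by zero.
opt = tf.keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-7)
```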
tf.train.AdagradOptimizer
An optimizer that implements the Adagrad algorithm. Adagrad accumulates all previous squared gradients and is well suited to large, sparse problems. It adapts the learning rate automatically: you only set a global learning rate, which is not the rate actually applied. The effective rate for each parameter is the global rate divided by the square root of that parameter's accumulated squared gradients, so every parameter ends up with its own learning rate.
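A minimal constructor sketch, assuming the TF2 equivalent tf.keras.optimizers.Adagrad (the values shown are illustrative, not from the notes):

```python
import tensorflow as tf

# Adagrad: only the global learning_rate is set here; the effective
# per-parameter rate is this value divided by the square root of the
# accumulated squared gradients (seeded by initial_accumulator_value).
opt = tf.keras.optimizers.Adagrad(learning_rate=0.01,
                                  initial_accumulator_value=0.1)
```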
tf.train.MomentumOptimizer
An optimizer that implements the Momentum algorithm. If the gradient keeps pointing in the same direction for a long time, the parameter update step grows; conversely, if the gradient's sign flips frequently, the update step shrinks. The process can be pictured as a ball dropped from the top of a hill that slides faster and faster.
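In TensorFlow 2 there is no separate Momentum class; momentum is an argument of tf.keras.optimizers.SGD. A minimal sketch with a commonly used value of 0.9 (my choice, not from the notes):

```python
import tensorflow as tf

# Momentum accumulates a velocity term, so updates grow while the gradient
# keeps its direction and shrink when the sign keeps flipping.
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
```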
tf.train.RMSPropOptimizer
An optimizer that implements the RMSProp algorithm. It is similar to Adam but uses a different moving average of the squared gradients.
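A constructor sketch for the TF2 counterpart tf.keras.optimizers.RMSprop; rho is the decay rate of the moving average mentioned above, and the values shown are illustrative defaults:

```python
import tensorflow as tf

# RMSprop: divides the gradient by a moving average of its recent magnitude;
# rho controls how quickly that average decays.
opt = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-7)
```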
tf.train.AdamOptimizer
An optimizer that implements the Adam algorithm, which combines the Momentum and RMSProp methods. For each parameter it maintains a learning rate together with an exponentially decaying average of past gradient information.
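A constructor sketch for the TF2 counterpart tf.keras.optimizers.Adam; beta_1 corresponds to the Momentum part and beta_2 to the RMSProp-style part (the values are the usual defaults, not prescribed by the notes):

```python
import tensorflow as tf

# Adam: beta_1 decays the first-moment (momentum) estimate,
# beta_2 decays the second-moment (RMSProp-style) estimate.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                               beta_2=0.999, epsilon=1e-7)
```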
How to choose:
If the data is sparse, use an adaptive method, i.e. Adagrad, Adadelta, RMSprop, or Adam.
RMSprop, Adadelta, and Adam behave similarly in many cases.
Adam adds bias-correction and momentum to RMSprop.
As the gradient becomes sparse, Adam performs better than RMSprop.
Overall, Adam is the best choice.
Many papers use plain SGD without momentum or other tricks. Although SGD can reach a minimum, it takes longer than the other algorithms and may get trapped at a saddle point.
If you need faster convergence, or if you need to train deeper and more complex neural networks, you need to use an adaptive algorithm.
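To show where the choice actually plugs in, here is a hypothetical Keras model compiled with one of the optimizers above; the architecture and loss are made up for illustration, and any optimizer from this section could be swapped in:

```python
import tensorflow as tf

# A small made-up classifier; swapping the optimizer only changes the
# `optimizer=` argument passed to compile().
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```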
Copyright notice
This article was written by [Live up to [email protected]]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/216/202208040525327568.html