Summary of gradient descent optimizers (RMSProp, momentum, Adam)
2022-06-30 15:35:00 【Zi Yan Ruoshui】
Recommended links:
Gradient descent optimizer visualization, RMSprop (Zhihu search results)
The meaning of the parameters of torch.optim.Adam in PyTorch, including weight_decay (Zi Yan Ruoshui's blog, CSDN)
Main text:
The original gradient descent algorithm:
delta = -learning_rate * gradient
theta += delta
The problem is that the same learning_rate is used in particularly steep regions and in particularly gentle ones. If learning_rate is too large, a single iteration in a steep region can jump far past the optimal solution. If learning_rate is too small, learning is very slow in flat regions.
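To make this trade-off concrete, here is a minimal Python sketch of the update rule above on a one-dimensional quadratic. The test function f(theta) = 0.5 * curvature * theta**2, the helper name `gradient_descent`, and the specific curvature and learning-rate values are assumptions chosen for illustration, not part of the original post.

```python
def gradient_descent(curvature, learning_rate, theta=1.0, steps=20):
    """Run plain gradient descent on f(theta) = 0.5 * curvature * theta**2."""
    for _ in range(steps):
        gradient = curvature * theta      # derivative of f at theta
        delta = -learning_rate * gradient
        theta += delta
    return theta


# Same learning rate on a steep and on a gentle slope (illustrative values).
steep = gradient_descent(curvature=25.0, learning_rate=0.1)
gentle = gradient_descent(curvature=0.01, learning_rate=0.1)
print(f"steep:  theta = {steep:.3e}")
print(f"gentle: theta = {gentle:.3e}")
```

With these values the steep run overshoots the optimum at 0 on every step and moves further away from it, while the gentle run has barely left its starting point of 1.0.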

This motivates the RMSProp optimizer:
sum_of_gradient_squared = previous_sum_of_gradient_squared * decay_rate + gradient² * (1 - decay_rate)
delta = -learning_rate * gradient / sqrt(sum_of_gradient_squared)
theta += delta

The advantage of RMSProp is that the learning rate is divided by sqrt(sum_of_gradient_squared). One component of sum_of_gradient_squared is the squared magnitude of the current gradient, that is, the steepness of the current slope. As a result, the steeper the current slope, the smaller the step taken, and the gentler the current slope, the bigger the step.
sum_of_gradient_squared also has a second component: previous_sum_of_gradient_squared from the last iteration. This value records the steepness of the slopes encountered recently. It acts as a context value that, to some extent, anticipates the steepness in the next few iterations. The balance between the two components is set by decay_rate: the current gradient is weighted by (1 - decay_rate) and the history by decay_rate, so with decay_rate close to 1 the history carries most of the weight.
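The normalizing effect described above can be sketched in Python as follows. The function names `rmsprop_step` and `first_step_size`, the default hyperparameter values, and the quadratic test function are assumptions for illustration. The small `epsilon` term is also an addition that does not appear in the pseudocode above; it is commonly included to avoid division by zero.

```python
import math


def rmsprop_step(theta, gradient, sum_of_gradient_squared,
                 learning_rate=0.01, decay_rate=0.9, epsilon=1e-8):
    """One RMSProp update; returns the new theta and the new running average."""
    sum_of_gradient_squared = (sum_of_gradient_squared * decay_rate
                               + gradient ** 2 * (1 - decay_rate))
    # epsilon is an assumed safeguard against dividing by zero.
    delta = -learning_rate * gradient / (math.sqrt(sum_of_gradient_squared) + epsilon)
    return theta + delta, sum_of_gradient_squared


def first_step_size(curvature, theta=1.0):
    """Size of the first RMSProp step on f(theta) = 0.5 * curvature * theta**2."""
    new_theta, _ = rmsprop_step(theta, curvature * theta, 0.0)
    return abs(new_theta - theta)


steep_step = first_step_size(25.0)    # gradient of 25 at theta = 1
gentle_step = first_step_size(0.01)   # gradient of 0.01 at theta = 1
print(steep_step, gentle_step)
```

Although the two gradients differ by a factor of 2500, the two first steps come out nearly identical, because dividing by the root of the running average cancels most of the gradient's scale.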
The above considers only the history of the gradients. If the historical component is instead the direction and size of the previous movement, rather than the past gradients, the method is called the momentum method (momentum).

If you look carefully at how v, the previous movement, changes over time, considering only the previous movement's direction (and size) in fact amounts to considering all the movement directions (and sizes) in history, because each movement already incorporates the one before it.
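The following Python sketch illustrates this point with one common textbook form of the momentum update. The exact formula, the function name `momentum_step`, and the hyperparameter values are assumptions, since other variants of momentum exist; here `v` stores the previous movement.

```python
def momentum_step(theta, v, gradient, learning_rate=0.1, momentum=0.9):
    """One momentum update: v carries over the previous movement."""
    v = momentum * v - learning_rate * gradient
    return theta + v, v


# With a constant gradient of 1.0, v accumulates every past gradient,
# each one discounted by one more factor of `momentum`.
theta, v = 0.0, 0.0
for _ in range(3):
    theta, v = momentum_step(theta, v, gradient=1.0)

unrolled = -0.1 * (1 + 0.9 + 0.9 ** 2)   # the same v written out by hand
print(v, unrolled)
```

Keeping only the most recent `v` is therefore enough: the older movements are already folded into it with geometrically shrinking weights.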
If we consider both the history of the gradients and the history of the movement directions, we arrive at the Adam optimization method.
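As a hedged sketch, the standard Adam update can be written in Python as below. The function name `adam_step` and the toy problem are assumptions for illustration. The default hyperparameters mirror the commonly used ones (learning rate 0.001, betas 0.9 and 0.999, epsilon 1e-8), and the bias-correction terms come from the standard formulation of Adam rather than from the text above.

```python
import math


def adam_step(theta, gradient, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * gradient        # momentum-like average of gradients
    v = beta2 * v + (1 - beta2) * gradient ** 2   # RMSProp-like average of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v


# Minimise f(theta) = theta**2 starting from theta = 1.0 (illustrative problem).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, learning_rate=0.01)
print(theta)
```

Here `m` plays the role of the momentum term and `v` the role of RMSProp's running average of squared gradients, so the step direction is smoothed by the first and its size is normalized by the second.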
