Summary of gradient descent optimizer (rmsprop, momentum, Adam)
2022-06-30 15:35:00 【Zi Yan Ruoshui】
Recommended links:
Gradient descent optimizer visualization, RMSprop (Zhihu search results)
Meaning of the parameters of the PyTorch API torch.optim.Adam, including weight_decay (Zi Yan Ruoshui's blog, CSDN)
Main text:
Original gradient descent algorithm
delta = - learning_rate * gradient
theta += delta
The problem is that the same learning_rate is used in both very steep and very flat regions. If learning_rate is too large, a single large step in a steep region can overshoot far past the optimum; if learning_rate is too small, progress in flat regions is very slow.
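The trade-off described above can be reproduced with a minimal sketch. The objective f(theta) = theta², the starting point, and the three learning rates are illustrative assumptions, not values from the original post; only the two-line update rule comes from the text.

```python
def gradient_descent(grad_fn, theta, learning_rate, steps):
    """Plain gradient descent on a single scalar parameter."""
    for _ in range(steps):
        gradient = grad_fn(theta)
        delta = -learning_rate * gradient  # same learning_rate everywhere
        theta += delta
    return theta


# Illustrative objective (assumption): f(theta) = theta**2, minimum at 0.
def grad(theta):
    return 2.0 * theta


too_small = gradient_descent(grad, theta=5.0, learning_rate=0.01, steps=20)
moderate = gradient_descent(grad, theta=5.0, learning_rate=0.1, steps=20)
too_large = gradient_descent(grad, theta=5.0, learning_rate=1.1, steps=20)

print(too_small, moderate, too_large)
```

After 20 steps the small learning rate has barely moved from the start, the moderate one is close to the minimum, and the large one has overshot so badly that it ends up farther away than where it began.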

This motivates the RMSProp optimizer:
sum_of_gradient_squared = previous_sum_of_gradient_squared * decay_rate + gradient² * (1 - decay_rate)
delta = -learning_rate * gradient / sqrt(sum_of_gradient_squared)
theta += delta
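The three lines above can be turned into a small runnable sketch. The `eps` term is an addition that is not in the pseudocode; it is a common guard against division by zero when the gradient is exactly 0. The default values and the quadratic objective in the usage lines are also assumptions chosen for illustration.

```python
import math


def rmsprop(grad_fn, theta, learning_rate=0.01, decay_rate=0.9,
            eps=1e-8, steps=1000):
    """RMSProp on a single scalar parameter, following the pseudocode above."""
    sum_of_gradient_squared = 0.0
    for _ in range(steps):
        gradient = grad_fn(theta)
        # Exponentially decayed average of squared gradients.
        sum_of_gradient_squared = (sum_of_gradient_squared * decay_rate
                                   + gradient ** 2 * (1 - decay_rate))
        # eps is an assumption (not in the original pseudocode).
        delta = (-learning_rate * gradient
                 / (math.sqrt(sum_of_gradient_squared) + eps))
        theta += delta
    return theta


# Illustrative objective (assumption): f(theta) = theta**2.
result = rmsprop(lambda t: 2.0 * t, theta=5.0)
print(result)
```

Because gradient² enters the sum with weight 1 - decay_rate, a single step can never be larger than learning_rate / sqrt(1 - decay_rate), however steep the slope is.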

The advantage of RMSProp is that the learning rate learning_rate is divided by sqrt(sum_of_gradient_squared). One component of sum_of_gradient_squared is gradient², the squared magnitude of the current gradient, that is, the steepness of the current slope. Compared with plain gradient descent, the steeper the current slope, the more the step is shrunk, and the gentler the slope, the more the step is enlarged.
The other component is previous_sum_of_gradient_squared from the last iteration. This value records the steepness of the slopes encountered recently; it is a context value that can, to some extent, predict the steepness over the next few iterations. The two components are weighted by 1 - decay_rate and decay_rate respectively, so with a decay_rate close to 1 the history carries most of the weight.
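To make the scaling effect concrete, the sketch below feeds the RMSProp update a constant gradient and records the size of each step. The function name `rmsprop_step_sizes` and the two gradient values are hypothetical, chosen only for illustration; the gradients must be nonzero because this version has no guard against division by zero.

```python
import math


def rmsprop_step_sizes(gradients, learning_rate=0.01, decay_rate=0.9):
    """Return |delta| for each gradient in a sequence under the RMSProp rule."""
    sum_of_gradient_squared = 0.0
    sizes = []
    for gradient in gradients:
        sum_of_gradient_squared = (sum_of_gradient_squared * decay_rate
                                   + gradient ** 2 * (1 - decay_rate))
        delta = -learning_rate * gradient / math.sqrt(sum_of_gradient_squared)
        sizes.append(abs(delta))
    return sizes


steep = rmsprop_step_sizes([100.0] * 50)   # very steep slope
gentle = rmsprop_step_sizes([0.01] * 50)   # very gentle slope

print(steep[-1], gentle[-1])
```

With a steady gradient, sum_of_gradient_squared approaches gradient², so the step size approaches learning_rate on both slopes. Plain gradient descent with the same learning_rate would take steps of 1.0 and 0.0001 respectively.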
The above considers only the history of gradients. If the historical component is instead the direction and size of the previous move, the method is called momentum.

If you look closely at how v, the move applied at each iteration, evolves, considering only the previous move (direction and size) is in fact considering all past moves, because each move already contains a decayed copy of the one before it.
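The post's own momentum formula did not survive on this page, so the sketch below assumes the common form v = decay_rate * previous_v - learning_rate * gradient, followed by theta += v. The names `velocity` and `velocity_unrolled` are hypothetical. The second function expands the recursion to show that v is a decayed sum over every past gradient, which is the point made about v above.

```python
def velocity(gradients, learning_rate=0.01, decay_rate=0.9):
    """Recursive form: each move reuses only the previous move."""
    v = 0.0
    for gradient in gradients:
        v = decay_rate * v - learning_rate * gradient
    return v


def velocity_unrolled(gradients, learning_rate=0.01, decay_rate=0.9):
    """Expanded form: a decayed sum over all past gradients."""
    n = len(gradients)
    return -learning_rate * sum(
        decay_rate ** (n - 1 - i) * g for i, g in enumerate(gradients)
    )


def momentum(grad_fn, theta, learning_rate=0.01, decay_rate=0.9, steps=500):
    """Gradient descent with momentum on a single scalar parameter."""
    v = 0.0
    for _ in range(steps):
        v = decay_rate * v - learning_rate * grad_fn(theta)
        theta += v
    return theta


history = [1.0, 2.0, 3.0]
print(velocity(history), velocity_unrolled(history))

# Illustrative objective (assumption): f(theta) = theta**2.
print(momentum(lambda t: 2.0 * t, theta=5.0))
```

Both forms give the same v for the same gradient history, up to floating-point rounding.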
If we consider both the history of gradients and the history of moves, we arrive at the Adam optimization method.
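The Adam formula from the original post is not preserved on this page, so the following is a sketch of the commonly published form of Adam rather than the author's exact notation. The names `m`, `v`, `beta1`, `beta2`, and `eps` follow the usual convention and are assumptions here, as are the objective and learning rate in the usage lines. `m` plays the role of the momentum term and `v` the role of RMSProp's sum_of_gradient_squared.

```python
import math


def adam(grad_fn, theta, learning_rate=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=1000):
    """Adam on a single scalar parameter (common textbook form)."""
    m = 0.0  # decayed average of gradients (momentum-like term)
    v = 0.0  # decayed average of squared gradients (RMSProp-like term)
    for t in range(1, steps + 1):
        gradient = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient ** 2
        # Bias correction: m and v start at 0 and are too small early on.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Illustrative objective (assumption): f(theta) = theta**2.
def grad(theta):
    return 2.0 * theta


first_step = adam(grad, theta=1.0, learning_rate=0.01, steps=1)
final = adam(grad, theta=1.0, learning_rate=0.01, steps=1000)
print(first_step, final)
```

Thanks to the bias correction, the very first step has a size of almost exactly learning_rate regardless of how large the gradient is.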
