Summary of gradient descent optimizers (RMSProp, momentum, Adam)
2022-06-30 15:35:00 【Zi Yan Ruoshui】
Recommended links:
Gradient descent optimizer visualization: RMSprop (Zhihu search results)
PyTorch API torch.optim.Adam: the meaning of the algorithm's parameters (Ziyan Ruoshui's blog, CSDN; on weight_decay in Adam)
Main text:
Original gradient descent algorithm
delta = - learning_rate * gradient
theta += delta
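As a rough illustration, the two lines above can be written as a runnable loop. The objective f(theta) = (theta - 3)² and the function name gradient_descent are assumptions chosen for this sketch, not part of the original text.

```python
def gradient_descent(grad, theta, learning_rate, steps):
    """Plain gradient descent on a scalar parameter (illustrative sketch)."""
    for _ in range(steps):
        delta = -learning_rate * grad(theta)  # step against the gradient
        theta += delta
    return theta

# Assumed example objective: f(theta) = (theta - 3)**2,
# gradient 2 * (theta - 3), minimum at theta = 3.
theta = gradient_descent(lambda t: 2 * (t - 3), theta=0.0, learning_rate=0.1, steps=100)
print(round(theta, 4))  # 3.0
```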
The problem is that the same learning_rate is used both where the slope is very steep and where it is very gentle. If learning_rate is too large, a single step in a steep region overshoots the optimum by a wide margin; if learning_rate is too small, learning in flat regions is very slow.
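A one-dimensional quadratic makes this trade-off concrete. Assume, purely for illustration, f(θ) = (a/2)θ², where a controls the steepness, and a fixed learning rate η = 0.1. One gradient step multiplies θ by (1 - ηa):

```latex
\theta_{t+1} = \theta_t - \eta\, a\, \theta_t = (1 - \eta a)\,\theta_t
\\
a = 25:\quad 1 - \eta a = -1.5 \;\Rightarrow\; |\theta_t| = 1.5^{t}\,|\theta_0| \quad \text{(overshoots and diverges)}
\\
a = 0.01:\quad 1 - \eta a = 0.999 \;\Rightarrow\; \theta_{1000} = 0.999^{1000}\,\theta_0 \approx 0.37\,\theta_0 \quad \text{(very slow)}
```

With the same η, the steep problem blows up while the flat one has covered only about 63% of the distance after 1000 steps.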

This motivates the RMSProp optimizer:
sum_of_gradient_squared = previous_sum_of_gradient_squared * decay_rate + gradient² * (1 - decay_rate)
delta = -learning_rate * gradient / sqrt(sum_of_gradient_squared)
theta += delta
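A minimal scalar sketch of the RMSProp update above, under two assumptions: the accumulator starts at 0, and a small epsilon is added to the denominator (a common safeguard that the pseudocode omits) so that a zero gradient does not cause division by zero. Note that, despite its name, sum_of_gradient_squared is an exponential moving average rather than a running sum.

```python
import math

def rmsprop(grad, theta, learning_rate=0.01, decay_rate=0.9, epsilon=1e-8, steps=1000):
    """RMSProp on a scalar parameter (illustrative sketch)."""
    sum_of_gradient_squared = 0.0  # exponential moving average of gradient**2
    for _ in range(steps):
        g = grad(theta)
        sum_of_gradient_squared = (sum_of_gradient_squared * decay_rate
                                   + g * g * (1 - decay_rate))
        # epsilon is an added assumption, not part of the pseudocode above
        delta = -learning_rate * g / (math.sqrt(sum_of_gradient_squared) + epsilon)
        theta += delta
    return theta

# Assumed example objective: f(theta) = (theta - 3)**2
theta = rmsprop(lambda t: 2 * (t - 3), theta=0.0)
print(theta)  # close to the minimum at 3
```

Because each step is divided by the recent gradient magnitude, scaling the objective by 100 (gradient 200 * (theta - 3)) produces essentially the same trajectory.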

The advantage of RMSProp is that the learning rate is divided by sqrt(sum_of_gradient_squared). The part of sum_of_gradient_squared that reflects the current position is gradient², the squared magnitude of the current gradient, that is, how steep the slope is right now. As a result, the steeper the current slope, the more the step is scaled down compared with plain gradient descent; the gentler the slope, the more it is scaled up.
Of course, sum_of_gradient_squared also has a second part: previous_sum_of_gradient_squared, the value from the last iteration, weighted by decay_rate. This value records how steep the slope has been over the recent iterations. It is a context value, and to some extent it predicts the steepness over the next few iterations.
The method above only takes the history of gradients into account. If the historical component is instead the direction and size of the previous movement (the previous delta), the method is called momentum.
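The sketch below assumes one common formulation of momentum, in which a velocity v stores the previous movement: v = decay_rate * v - learning_rate * gradient, then theta += v. Libraries use equivalent variants (for example, folding the learning rate in at a different point), so treat this exact form, the function name and the default values as assumptions.

```python
def momentum(grad, theta, learning_rate=0.01, decay_rate=0.9, steps=500):
    """Gradient descent with momentum on a scalar parameter (illustrative sketch)."""
    v = 0.0  # previous movement (velocity)
    for _ in range(steps):
        # new movement = decayed previous movement + plain gradient step
        v = decay_rate * v - learning_rate * grad(theta)
        theta += v
    return theta

# Assumed example objective: f(theta) = (theta - 3)**2
theta = momentum(lambda t: 2 * (t - 3), theta=0.0)
```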

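If you look carefully at how the momentum term v evolves, you will see that considering only the previous movement direction (and size) in fact amounts to considering all past movement directions (and sizes): each v already contains the v before it, so older movements are carried along with geometrically decaying weights.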
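Unrolling the recursion makes this explicit. Assuming the common form v_t = β v_{t-1} - η g_t with v_0 = 0 (β standing for decay_rate, η for learning_rate, g_t for the gradient at step t):

```latex
v_t = \beta\, v_{t-1} - \eta\, g_t,\quad v_0 = 0
\;\Longrightarrow\;
v_t = -\eta \sum_{k=0}^{t-1} \beta^{k}\, g_{t-k}
```

For example, v_2 = β(-η g_1) - η g_2 = -η(g_2 + β g_1). So v_t is a weighted sum of every past gradient step, with the step from k iterations ago weighted by β^k.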
Considering both at once, the history of gradients (as in RMSProp) and the history of movement directions (as in momentum), gives the Adam optimizer.
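A minimal scalar sketch of Adam in its commonly used form: m is a moving average of gradients (the momentum part), s is a moving average of squared gradients (the RMSProp part), and both are bias-corrected because they start at 0. beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8 match the defaults of PyTorch's torch.optim.Adam; learning_rate = 0.01, the step count and the function name are assumptions chosen for this toy example.

```python
import math

def adam(grad, theta, learning_rate=0.01, beta1=0.9, beta2=0.999,
         epsilon=1e-8, steps=2000):
    """Adam on a scalar parameter (illustrative sketch)."""
    m = 0.0  # moving average of gradients: direction history (momentum part)
    s = 0.0  # moving average of squared gradients: steepness history (RMSProp part)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g * g
        # Bias correction: m and s start at 0, so early averages are too small.
        m_hat = m / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        theta += -learning_rate * m_hat / (math.sqrt(s_hat) + epsilon)
    return theta

# Assumed example objective: f(theta) = (theta - 3)**2
theta = adam(lambda t: 2 * (t - 3), theta=0.0)
```

On the very first step the bias-corrected ratio m_hat / sqrt(s_hat) equals sign(gradient), so the first move has size learning_rate regardless of how steep the slope is.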
