Popular understanding of gradient descent
2022-07-03 15:16:00 【alw_ one hundred and twenty-three】
I plan to present this series of blog posts as fun animated explainers. If you're interested, click here.
0. What is gradient descent used for?
Strictly speaking, gradient descent is not a machine learning algorithm; it is a search-based optimization method. Because many models have no closed-form solution, we have to find a set of parameters that minimizes the loss function through iteration after iteration. The general shape of a loss function can be seen in this figure:
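To make "loss as a function of the parameters" concrete, here is a minimal sketch. The one-weight model `y_hat = w * x`, the data, and the candidate weights are all made up for illustration and are not from the original post.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the hypothetical one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data generated by y = 2x, so the best weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Evaluate the loss at a handful of candidate weights.
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
losses = [mse_loss(w, xs, ys) for w in candidates]

# The loss falls as w approaches 2 and rises again beyond it.
best_w = candidates[losses.index(min(losses))]
```

The loss is high on both sides and lowest in the middle, which is the bowl-like shape that the search has to find the bottom of.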
So, if we describe gradient descent in plain human words, it is this: I keep searching for what the reliable weights are.
1. How do we search?
We now know that gradient descent is used to find the weights. So how do we find them? By blind guessing? No way; blind guessing will never work. Just think about it: each weight can take any value in the real numbers, 100 features mean 100 weights, and 10,000 features mean 10,000 weights. If you relied on blind guessing, you would never hit the right weights in a lifetime. So we need a systematic way to find the weights, and that way is the gradient!
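A rough back-of-the-envelope sketch of why guessing is hopeless. It assumes, purely for illustration, that each weight is restricted to just 10 candidate values (the real search space is continuous, so the true situation is worse). The function name `grid_size` is hypothetical.

```python
def grid_size(values_per_weight: int, num_weights: int) -> int:
    """Number of weight combinations on a grid with the given resolution."""
    return values_per_weight ** num_weights


# With 2 weights, trying every combination is trivial.
two_weights = grid_size(10, 2)

# With 100 weights, the number of combinations has 101 digits.
hundred_weights = grid_size(10, 100)
digits = len(str(hundred_weights))
```

Even at this coarse resolution, 100 weights give far more combinations than could ever be tried one by one.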
So what is the gradient? Written as a formula, it looks like this:
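A standard way to write it, assuming a loss function $L$ over $n$ weights $w_1, \dots, w_n$ (these symbols are chosen here for illustration):

```latex
\nabla L(w) = \left( \frac{\partial L}{\partial w_1},\ \frac{\partial L}{\partial w_2},\ \dots,\ \frac{\partial L}{\partial w_n} \right)
```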
Hmm, does it look familiar? Spelled out in plain words, it looks like this:
At this point the gradient is easy to see: it is nothing more than the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient has another property: its direction is the direction in which the function value increases fastest. How should we understand this property? Take an example. Suppose I am a hopeless homebody who wants to become a suburban King in LOL. Several factors might get me there: the depth of my champion pool, my big-picture game sense, and my flashy mechanics. Each of them carries a certain weight in my becoming King. As shown in the figure, the arrow for each factor has a direction (the direction of that factor's partial derivative with respect to my becoming King) and a length (the magnitude of that partial derivative). Under the combined effect of these factors, I end up training in one overall direction (like the relationship between component forces and the resultant force in physics), and along that direction I get to suburban King as fast as possible.
In other words, if I keep working in that resultant direction, then in theory I become the suburban King as quickly as possible.
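A minimal sketch of "partial derivatives arranged into a vector". The two-weight `loss` below is a made-up bowl-shaped function, not one from the original post. The gradient is computed once by hand and once numerically, by nudging one weight at a time.

```python
def loss(w):
    # Hypothetical two-weight loss: a bowl with its minimum at w = (1, -2).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2


def analytic_gradient(w):
    # Partial derivatives of the loss above, worked out by hand.
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]


def numerical_gradient(f, w, eps=1e-6):
    # Central differences: nudge one weight at a time, measure the change in loss.
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


w = [3.0, 0.0]
exact = analytic_gradient(w)
approx = numerical_gradient(loss, w)
```

Each entry of the vector answers one question: if only this weight changes a little, how fast does the loss change? The two versions should agree up to a small numerical error.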
OK. Now we know that the gradient points in the direction in which the function grows fastest, so if I put a minus sign in front of the gradient (the opposite direction), I get the direction in which the function decreases fastest. (You can picture this yourself using the suburban King figure.) So the essence of gradient descent is nothing more than updating the weights in the direction opposite to the gradient. As in the following figure: suppose I am blind and have somehow ended up in a valley. All I need to do now is get to the bottom of the valley. Because I am blind, I can only move a little at a time. Before each move I sweep my foot around me, and I step wherever it feels most downhill. By repeating this cycle, I eventually reach the bottom of the valley.
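The "sweep my foot around" idea can be sketched by probing many directions with a tiny step and comparing the results with the gradient. The `loss` function, the starting point, and the 1-degree probing resolution are assumptions made for illustration.

```python
import math


def loss(w):
    # Hypothetical two-weight loss: a bowl with its minimum at w = (1, -2).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2


def gradient(w):
    # Partial derivatives of the loss above, worked out by hand.
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]


def change_after_small_step(w, angle_deg, step=1e-3):
    # Take one tiny step in the given direction and report how the loss changed.
    angle = math.radians(angle_deg)
    moved = [w[0] + step * math.cos(angle), w[1] + step * math.sin(angle)]
    return loss(moved) - loss(w)


w = [3.0, 0.0]
angles = range(360)  # probe every whole degree around the current point

steepest_up = max(angles, key=lambda a: change_after_small_step(w, a))
steepest_down = min(angles, key=lambda a: change_after_small_step(w, a))

g = gradient(w)
gradient_angle = math.degrees(math.atan2(g[1], g[0])) % 360
negative_gradient_angle = (gradient_angle + 180.0) % 360
```

Up to the probing resolution, the direction that raises the loss fastest lines up with the gradient, and the direction that lowers it fastest lines up with the negative gradient. The gradient gives the same answer as feeling around in every direction, but with a single computation.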
So, following this routine, we can write out the pseudocode for gradient descent:
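A minimal runnable sketch of that loop in Python. The function name `gradient_descent`, the example `loss`, the starting point, and the default values of `alpha` and `steps` are all illustrative assumptions, not the author's original pseudocode.

```python
def gradient_descent(grad_fn, w0, alpha=0.1, steps=100):
    """Repeatedly move the weights a small step against the gradient."""
    w = list(w0)
    for _ in range(steps):
        g = grad_fn(w)
        # Update rule: w <- w - alpha * gradient
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w


def loss(w):
    # Hypothetical two-weight loss: a bowl with its minimum at w = (1, -2).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2


def gradient(w):
    # Partial derivatives of the loss above, worked out by hand.
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]


start = [3.0, 0.0]
w_final = gradient_descent(gradient, start, alpha=0.1, steps=100)
```

For this simple bowl the weights end up very close to the minimum at (1, -2). A fixed number of steps is used here for simplicity; stopping once the gradient becomes very small is a common alternative.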
The loop corresponds to walking down the mountain step by step. The α in the code goes by the fancy name of learning rate, but it simply represents how big a step I take on the way down. The smaller the value, the more cautious I am: I take small steps for fear of falling into a pit. The larger the value, the bolder I am, but the easier it is to overshoot and come to grief.
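The effect of the step size can be seen in a small experiment. This sketch assumes the one-dimensional loss L(w) = w², whose derivative is 2w. The three learning rates and the helper name `run` are picked only for illustration.

```python
def run(alpha, steps=20, w=5.0):
    """Gradient descent on the hypothetical loss L(w) = w**2 (derivative 2*w)."""
    for _ in range(steps):
        w = w - alpha * 2.0 * w
    return w


tiny = run(0.01)     # cautious: moves toward 0, but slowly
moderate = run(0.4)  # reasonable: gets very close to 0 within 20 steps
too_big = run(1.1)   # reckless: every step overshoots, so w moves away from 0
```

With the tiny step the walker is still far from the bottom after 20 steps, with the moderate step it has essentially arrived, and with the oversized step it jumps across the valley farther each time and ends up worse off than where it started.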
OK. That is my plain-language understanding of gradient descent. I hope it helps.