Popular understanding of gradient descent
2022-07-03 15:16:00 【alw_123】
I plan to present this series of blog posts as fun, animated popular-science pieces. If you're interested, click here.
0. What is gradient descent for?
In fact, gradient descent is not a machine learning algorithm; it is a search-based optimization method. Because many algorithms have no closed-form solution, we have to find a set of parameters that minimizes the loss function by iterating again and again. The rough shape of a loss function can be seen in this figure:
So, put in plain words, gradient descent is just this: I keep searching for what the reliable weights are.
1. How do we search?
We now know that gradient descent is used to find weights, so how do we find them? Blind guessing? No way. Blind guessing will never work, not in this lifetime. Just think about it: the range of each weight is the whole real line, 100 features mean 100 weights, and 10,000 features mean 10,000 weights. If I relied on blind guessing, um, I would never hit the right weights in my whole life. So we need a recipe for finding the weights. That recipe is the gradient!!!
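To get a feel for why guessing is hopeless, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that we only try 10 candidate values per weight (real weights range over all real numbers, so the true situation is far worse); the names `candidates_per_weight` and `grid_size` are made up for this example.

```python
# Hypothetical setup: try only 10 candidate values for each weight.
candidates_per_weight = 10


def grid_size(num_weights, candidates=candidates_per_weight):
    """Number of weight combinations an exhaustive guess would have to try."""
    return candidates ** num_weights


for n in (1, 2, 10, 100):
    print(f"{n:>3} weights -> {grid_size(n):.3e} combinations")
```

Even with this very coarse grid, the number of combinations grows exponentially with the number of weights.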
So what is the gradient? Written as a formula, it looks like this:
Um... look familiar? Put into plain words, it looks like this:
Now the gradient is easy to see: it is nothing more than the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient also has another property: the gradient direction is the direction in which the function value increases fastest. How should we understand this property? Take an example. Suppose I'm a nerdy homebody who wants to become a "suburban King" in LOL. Several factors may help me get there: one is the depth of my hero pool, one is big-picture map awareness, and one is flashy mechanics. Each of them carries a certain weight in making me a King. As shown in the figure, the arrow for each factor has a direction (the direction of that factor's partial derivative toward my becoming King) and a length (the magnitude of the partial derivative). Under the joint action of these factors, I end up training in one overall direction (like the relationship between component forces and the resultant force in physics), and heading that way moves me toward suburban King fastest.
That is to say, if I keep working in that resultant direction, in theory I will become suburban King as quickly as possible.
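In standard notation, assuming a loss function L of n weights w_1, ..., w_n (these symbol names are chosen here for illustration), the gradient described above and its "fastest increase" property can be written as:

```latex
\nabla L(w) = \left( \frac{\partial L}{\partial w_1},\ \frac{\partial L}{\partial w_2},\ \dots,\ \frac{\partial L}{\partial w_n} \right)

% Rate of change of L along a unit direction u, where \theta is the angle
% between u and the gradient:
D_u L(w) = \nabla L(w) \cdot u = \lVert \nabla L(w) \rVert \cos\theta
```

The rate of change is largest when cos θ = 1, that is, when u points along the gradient, and most negative when u points exactly opposite to it.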
OK. Now we know that the gradient direction is the direction in which the function grows fastest, so if I put a minus sign in front of the gradient (the opposite direction), that is the direction in which the function decreases fastest. (You can picture it yourself using the suburban King figure = =) So the essence of gradient descent is nothing more than updating the weights in the direction opposite to the gradient. As in the figure below, suppose I'm blind and have somehow ended up in a valley. All I have to do is get to the bottom of the valley. Because I'm blind, I can only move a little at a time. Before each move I sweep my foot around me, and I step toward wherever feels most downhill. By repeating this cycle, I eventually reach the bottom of the valley.
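The "sweep my foot around" picture can be checked numerically. The sketch below assumes a made-up bowl-shaped valley, `loss(x, y) = x^2 + 3y^2`, and a hypothetical helper `best_probe_direction` that tries 360 small steps in a circle and keeps the lowest one. It then compares that direction with the negative gradient.

```python
import math


def loss(x, y):
    """Hypothetical bowl-shaped valley with its bottom at (0, 0)."""
    return x ** 2 + 3 * y ** 2


def gradient(x, y):
    """Analytic partial derivatives of the loss above."""
    return 2 * x, 6 * y


def best_probe_direction(x, y, step=1e-3, num_probes=360):
    """Sweep a foot around in a circle; return the angle (radians) of the lowest spot."""
    angles = [2 * math.pi * k / num_probes for k in range(num_probes)]
    return min(
        angles,
        key=lambda a: loss(x + step * math.cos(a), y + step * math.sin(a)),
    )


x0, y0 = 2.0, 1.0
gx, gy = gradient(x0, y0)
# Angle of the negative gradient, mapped into [0, 2*pi)
neg_grad_angle = math.atan2(-gy, -gx) % (2 * math.pi)
probe_angle = best_probe_direction(x0, y0)

print(f"negative gradient direction: {math.degrees(neg_grad_angle):.1f} degrees")
print(f"lowest probed direction:     {math.degrees(probe_angle):.1f} degrees")
```

The two angles should agree to within the one-degree resolution of the probing, which is the sense in which the negative gradient is the "most downhill" direction for a small step.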
So, following this recipe, we can write out the pseudocode of gradient descent:
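A minimal runnable sketch of that loop in Python, not necessarily the exact pseudocode the author had in mind. The function name `gradient_descent`, the fixed step count `num_steps`, and the example loss are assumptions made for illustration; `alpha` plays the role of α.

```python
def gradient_descent(grad, w0, alpha=0.1, num_steps=100):
    """Repeatedly step against the gradient: w <- w - alpha * grad(w)."""
    w = list(w0)
    for _ in range(num_steps):
        g = grad(w)  # gradient of the loss at the current weights
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical loss L(w) = (w[0] - 3)^2 + (w[1] + 1)^2, with its minimum at (3, -1)
def example_grad(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


w_final = gradient_descent(example_grad, w0=[0.0, 0.0], alpha=0.1, num_steps=100)
print(w_final)
```

In practice the loop is often stopped when the gradient becomes small or the loss stops improving, rather than after a fixed number of steps; a fixed count just keeps the sketch short.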
The loop is the walk down the mountain. The α in the code has a fancy name, the learning rate, but it really just represents how big a step I take on the way down. The smaller the value, the more cautious I am: I take small steps for fear of falling into a pit. The larger the value, the bolder I am, but it's easy to overstride and hurt myself ~~
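To see the effect of the step size, here is a small sketch on a made-up one-dimensional loss L(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. The helper `run` and the particular α values are chosen only for illustration.

```python
def run(alpha, w=1.0, num_steps=20):
    """Gradient descent on the hypothetical loss L(w) = w^2 (gradient 2w)."""
    for _ in range(num_steps):
        w = w - alpha * 2 * w
    return w


for alpha in (0.01, 0.1, 0.5, 1.1):
    print(f"alpha={alpha}: w after 20 steps = {run(alpha):.4g}")
```

With a tiny α the weight creeps toward 0 slowly, a moderate α gets there much faster, and with α = 1.1 every step overshoots the bottom by more than the previous one, so the weight moves away from the minimum instead of toward it.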
OK. That's my plain-language take on gradient descent. I hope it helps.