Popular understanding of gradient descent
2022-07-03 15:16:00 【alw_ one hundred and twenty-three】
I plan to present this series of blog posts as fun animated popular-science explainers. If you're interested, click here.
0. What is gradient descent for?
Gradient descent is not actually a machine learning algorithm; it is a search-based optimization method. Many algorithms have no closed-form solution, so we have to find a set of parameters that minimizes the loss function through iteration after iteration. The rough shape of a loss function can be seen in this figure:
So, put in plain human words, gradient descent boils down to this: I keep searching for where the reliable weights are, that is, what values they should take.
1. How do we search?
We now know that gradient descent is used to find the weights, so how do we actually find them? By blind guessing? No way. Guessing will never work, not in this lifetime. Just think about it: each weight can take any value in the real numbers, so 100 features mean 100 weights, and 10,000 features mean 10,000 weights. If I relied on blind guessing to pick the weights, well, I would never get them right in my whole life. So we need a systematic way to find the weights, and that way is the gradient!!!
So what is the gradient? Written as a formula, it looks like this:
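For reference, a standard way to write the gradient, assuming the loss is called $L$ and the weights are $w_1, \dots, w_n$ (both names are assumptions for illustration, not taken from the original figure), is:

$$
\nabla L(w) = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \dots, \frac{\partial L}{\partial w_n} \right)
$$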
Hmm.. looks familiar, right? Put into plain language, it looks like this:
At this point the gradient is easy to see: it is nothing more than the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient has another property: its direction is the direction in which the function value increases fastest. How should we understand this property? Take an example. Suppose I am a hopeless homebody who wants to become the LOL "king of the suburbs". Several factors may go into becoming that king: one is the depth of my hero pool, one is my sense of the overall game, and another is flashy mechanics. Each of them carries a certain weight in my becoming king. As shown in the figure, each factor's arrow has a direction (the direction in which that factor pushes me toward becoming king, i.e. its partial derivative) and a length (the magnitude of that partial derivative). Under the joint action of these factors, I end up training in a single direction (like the relationship between component forces and the resultant force in physics), and that is the direction that gets me to king of the suburbs as fast as possible.
In other words, if I keep working in that final direction, in theory I can become king of the suburbs as fast as possible.
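The "fastest increase" property can be checked numerically. The sketch below is only an illustration: the function `f`, its hand-derived `gradient`, and the starting point are made-up assumptions. It tries 360 unit directions around a point and sees which one raises `f` the most after a tiny step.

```python
import math

def f(x, y):
    # Hypothetical two-variable function, chosen only for illustration.
    return x**2 + 3 * y**2

def gradient(x, y):
    # Partial derivatives of f with respect to x and y, derived by hand.
    return (2 * x, 6 * y)

def increase_along(x, y, angle, step=1e-3):
    # How much f grows after a tiny step of length `step` in direction `angle`.
    dx, dy = math.cos(angle), math.sin(angle)
    return f(x + step * dx, y + step * dy) - f(x, y)

x0, y0 = 1.0, 1.0
gx, gy = gradient(x0, y0)
grad_angle = math.atan2(gy, gx)  # direction the gradient points in

# Try one direction per degree and keep the one with the largest increase.
angles = [2 * math.pi * k / 360 for k in range(360)]
best_angle = max(angles, key=lambda a: increase_along(x0, y0, a))

print(math.degrees(grad_angle), math.degrees(best_angle))
```

On this toy function the winning direction lands within about a degree of the gradient direction, which is what the property predicts.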
OK. Now we know that the gradient direction is the direction in which the function grows fastest, so if I put a minus sign in front of the gradient (the opposite direction), I get the direction in which the function decreases fastest. (You can picture this yourself using the king-of-the-suburbs figure = =) So the essence of gradient descent is nothing more than updating the weights in the direction opposite to the gradient. As in the figure below, suppose I am blind and have somehow ended up on the slope of a valley. All I want is to get to the bottom of the valley. Because I am blind, I can only move a little at a time. Before each move, I sweep my foot around me, and wherever it feels most like going downhill, that is where I step. By repeating this cycle, I eventually reach the bottom of the valley.
So, following this idea, we can write out the pseudocode for gradient descent:
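A minimal runnable sketch of that loop in Python, not the original pseudocode. The function name `gradient_descent`, the parameter names, and the toy loss are assumptions made for illustration; `alpha` plays the role of α.

```python
def gradient_descent(grad, w, alpha=0.1, n_steps=100):
    """Repeatedly move the weights a small step against the gradient."""
    for _ in range(n_steps):
        g = grad(w)  # gradient of the loss at the current weights
        w = [wi - alpha * gi for wi, gi in zip(w, g)]  # step downhill
    return w

# Hypothetical loss: L(w) = (w0 - 3)**2 + (w1 + 1)**2, smallest at w = (3, -1).
def grad_loss(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w_final = gradient_descent(grad_loss, [0.0, 0.0])
print(w_final)  # ends up very close to [3, -1]
```

A fixed number of steps keeps the sketch short; a real implementation would more likely stop once the gradient or the change in loss becomes small enough.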
The loop is the walk down the mountain. The α in the code goes by the fancy name of learning rate, but really it just says how big a step I take on the way down. The smaller the value, the more cautious I am: I take small steps because I'm afraid of falling into a pit. The larger the value, the bolder I am, but with strides that big it's easy to overstride and hurt myself ~~
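To see the effect of the step size, here is a small sketch on a toy one-dimensional loss L(w) = w**2. The loss, the three learning rates, and the helper name `descend` are all assumptions picked for illustration.

```python
def descend(alpha, w=1.0, n_steps=50):
    """Run gradient descent on the toy loss L(w) = w**2, whose gradient is 2*w."""
    for _ in range(n_steps):
        w = w - alpha * 2 * w
    return w

w_small = descend(alpha=0.01)   # tiny steps: safe, but still far from 0 after 50 steps
w_medium = descend(alpha=0.4)   # moderate steps: essentially at the minimum
w_large = descend(alpha=1.1)    # oversized steps: jumps across the valley and diverges

print(w_small, w_medium, w_large)
```

With the small rate the walk is safe but slow. With the large rate every step overshoots the bottom by more than the previous one, so the weight moves further and further from the minimum.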
OK. That is my plain-language understanding of gradient descent. I hope it helps.