Popular understanding of gradient descent
2022-07-03 15:16:00 【alw_ one hundred and twenty-three】
I plan to present this series of blog posts as fun, animated popular-science pieces. If you're interested, click here.
0. What is gradient descent used for?
Gradient descent is not actually a machine learning algorithm; it is a search-based optimization method. Many models have no closed-form solution for their best parameters, so we have to find a set of parameters that minimizes the loss function by iterating again and again. The general shape of a loss function can be seen in this figure:
So, if we describe gradient descent in plain human words, it is simply this: keep searching for reliable values of the weights.
1. How do we search?
We now know that gradient descent is used to find the weights. So how do we find them? By blind, superstitious guessing? Impossible. That will never work in this lifetime. Think about it: each weight can take any real value, 100 features correspond to 100 weights, and 10,000 features correspond to 10,000 weights. If I relied on blind guessing, I would never hit the right weights in my whole life. So we need a systematic routine for finding the weights, and that routine is the gradient!
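So what is the gradient? Written as a formula, with $L$ for the loss function and $w_1, \dots, w_n$ for the weights, it looks like this:
$$\nabla L(w) = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \dots, \frac{\partial L}{\partial w_n} \right)$$
Look familiar? Here it is again in plain language.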
At this point the gradient is easy to see: it is nothing more than the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient has another property: its direction is the direction in which the function value increases fastest. How should we understand this property? Take an example. Suppose I am a diehard couch potato who wants to become a "King of the suburbs" in LOL. Several factors go into becoming King: the depth of my hero pool, my sense of the big picture, and flashy mechanics. Each of them carries a certain weight in my becoming King. As shown in the figure, the arrow for each factor has a direction (the direction of that factor's partial derivative with respect to my becoming King) and a length (the magnitude of that partial derivative). Under the joint action of these factors, I end up training in one overall direction (like the relationship between component forces and the resultant force in physics), and along that direction I close in on King of the suburbs fastest.
In other words, if I keep working in that resultant direction, then in theory I become King of the suburbs as quickly as possible.
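To make the "fastest increase" property concrete, here is a minimal sketch. The function `loss`, the point `w`, and the helper names are all made up for illustration and are not from the original post. It approximates each partial derivative numerically, stacks them into a gradient vector, and then checks that a small fixed-length step along the gradient raises the function more than a step along any of 360 evenly spaced directions.

```python
import math

def loss(w):
    """Hypothetical example function: f(w) = w0^2 + 3 * w1^2."""
    return w[0] ** 2 + 3.0 * w[1] ** 2

def numerical_gradient(f, w, eps=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad

def increase_along(f, w, direction, step=1e-3):
    """How much f grows after one small step of length `step` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    moved = [wi + step * d / norm for wi, d in zip(w, direction)]
    return f(moved) - f(w)

w = [1.0, 1.0]
grad = numerical_gradient(loss, w)  # analytically (2 * w0, 6 * w1) = (2, 6)

# Compare the gradient direction against 360 evenly spaced unit directions.
gain_grad = increase_along(loss, w, grad)
best_other = max(
    increase_along(loss, w, [math.cos(math.radians(a)), math.sin(math.radians(a))])
    for a in range(360)
)
print("gradient:", grad)
print("gain along gradient:", gain_grad, "best gain among sampled directions:", best_other)
```

The individual partial derivatives play the role of the separate arrows in the King of the suburbs picture, and `grad` is their resultant.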
OK. Now we know that the gradient direction is the direction in which the function grows fastest. If I put a minus sign in front of the gradient (that is, take the opposite direction), I get the direction in which the function falls fastest. (You can picture it yourself using the King of the suburbs figure.) So the essence of gradient descent is nothing more than updating the weights in the direction opposite to the gradient. As in the following figure, suppose I am blind and have somehow ended up on the side of a valley. All I have to do now is get to the bottom. Because I cannot see, I can only move a little at a time. Before each move I sweep my foot around me, and I step wherever the ground feels most downhill. By repeating this cycle I eventually reach the bottom of the valley.
So, following this routine, we can write out the pseudocode for gradient descent:
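The routine just described (compute the gradient, step in the opposite direction, repeat) can be sketched in a few lines of Python. This is an illustrative version rather than the author's original pseudocode: the names `gradient_descent`, `alpha`, and `n_steps`, as well as the bowl-shaped example loss, are assumptions chosen for the demo.

```python
def gradient_descent(grad_fn, w_init, alpha=0.1, n_steps=100):
    """Repeatedly move the weights a small step against the gradient."""
    w = list(w_init)
    for _ in range(n_steps):                               # each pass is one step down the mountain
        grad = grad_fn(w)                                  # feel out the slope at the current spot
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]   # step in the opposite direction
    return w

# Hypothetical loss: a bowl with its lowest point at w = (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def loss_grad(w):
    """Partial derivatives of `loss` with respect to each weight."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_final = gradient_descent(loss_grad, [0.0, 0.0], alpha=0.1, n_steps=100)
print("weights:", w_final, "loss:", loss(w_final))
```

In this sketch the loop runs for a fixed number of steps. Real implementations often stop earlier, for example once the gradient or the change in loss becomes very small.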
The loop corresponds to walking down the mountain. The α in the code goes by the fancy name "learning rate", but all it really represents is how big a step I take on the way down. The smaller the value, the more timid I am: I take tiny steps for fear of falling into a pit. The larger the value, the bolder I am, but big strides make it easy to overshoot and hurt myself.
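The effect of the step size is easy to see on a toy problem. The sketch below is an assumed example, not taken from the original post: it minimizes f(w) = w² (derivative 2w, minimum at w = 0) from the same starting point with three hypothetical learning rates.

```python
def descend(alpha, w=1.0, n_steps=20):
    """Run n_steps of gradient descent on f(w) = w**2, whose derivative is 2*w."""
    for _ in range(n_steps):
        w = w - alpha * 2.0 * w
    return w

w_timid = descend(alpha=0.01)     # tiny steps: safe, but still far from 0 after 20 steps
w_moderate = descend(alpha=0.4)   # reasonable steps: essentially at the minimum
w_reckless = descend(alpha=1.1)   # steps too large: each update overshoots 0 and lands farther away

print("timid:", w_timid)
print("moderate:", w_moderate)
print("reckless:", w_reckless)
```

For this particular function each update multiplies `w` by `(1 - 2 * alpha)`, so any `alpha` above 1 makes the weight grow in magnitude instead of shrinking. The threshold differs for other loss functions.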
OK, that is my plain-language understanding of gradient descent. I hope it helps.