Popular understanding of gradient descent
2022-07-03 15:16:00 【alw_123】
I plan to present this series of blog posts as fun animated popular-science videos. If you're interested, click here.
0. What is gradient descent for?
Gradient descent is not actually a machine learning algorithm; it is a search-based optimization method. Many algorithms have no closed-form solution, so we have to find a set of parameters that minimizes the loss function by iterating again and again. The rough shape of a loss function can be seen in this figure:
So, put in plain human words, gradient descent boils down to this: I keep searching for what a reliable set of weights is.
1. How do we search?
We now know that gradient descent is used to find the weights, so how do we actually find them? Blind guessing? No way. Blind guessing will never work in this lifetime. Just think about it: each weight can take any value in the real numbers, 100 features mean 100 weights, and 10,000 features mean 10,000 weights. If I relied on blind guessing, I would never hit the right weights in my whole life. So we need a systematic way to find the weights, and that way is the gradient!
So what is the gradient? Written as a formula, it looks like this:
Hmm, looks familiar, right? Translated into plain language, it looks like this:
Now the gradient is easy to see: it is just the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient has another property: its direction is the direction in which the function value increases fastest. How should we understand this? Take an example. Suppose I am a hopeless gamer nerd who wants to become a "suburban King" in LOL. Several factors may go into becoming that King: the depth of my champion pool, my game sense, and my flashy mechanics. Each carries some weight in making me a King. As shown in the figure, the arrow for each factor has a direction (the direction of that factor's partial derivative with respect to my becoming King) and a length (the magnitude of that partial derivative). Under the combined effect of these factors, I end up training in one overall direction (like the relationship between component forces and the resultant force in physics), and going that way gets me to suburban King as fast as possible.
In other words, if I keep working in that final direction, in theory I will become suburban King as quickly as possible.
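Written out in symbols, the definition above might look like the following. The names $L$ for the loss function and $w_1, \dots, w_n$ for the weights are assumptions chosen here for illustration, not notation taken from the original figures. The second line is the standard reason the gradient direction is the direction of fastest increase: for any unit direction $u$, the rate of change of $L$ along $u$ is largest when $u$ lines up with the gradient.

```latex
\nabla L(w) = \left( \frac{\partial L}{\partial w_1},\ \frac{\partial L}{\partial w_2},\ \dots,\ \frac{\partial L}{\partial w_n} \right)

D_u L(w) = \nabla L(w) \cdot u = \lVert \nabla L(w) \rVert \cos\theta, \qquad \lVert u \rVert = 1
```

Here $\theta$ is the angle between $u$ and $\nabla L(w)$, so the rate of change is greatest at $\theta = 0$ (along the gradient) and most negative at $\theta = \pi$ (against it).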
OK. Now we know that the gradient points in the direction in which the function grows fastest, so if I put a minus sign in front of the gradient (the opposite direction), I get the direction in which the function decreases fastest. (You can still picture it yourself using the suburban King figure.) So the essence of gradient descent is simply updating the weights in the direction opposite to the gradient. As in the figure below, suppose I am blind and have somehow ended up on the slope of a valley. All I need to do is get to the bottom of the valley. Because I'm blind, I can only move a little at a time. Before each move, I sweep my foot around me, and wherever it feels most downhill, I step there. By repeating this cycle, I eventually reach the bottom of the valley.
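The "sweep my foot around" idea can be checked numerically. The sketch below is only an illustration under stated assumptions: the bowl-shaped `loss`, its hand-written `grad`, and both helper functions are hypothetical names made up for this example. It probes 360 directions with a tiny step and compares the most downhill one with the negative gradient direction.

```python
import math

def loss(x, y):
    # Hypothetical bowl-shaped "valley" with its lowest point at (0, 0).
    return x ** 2 + 3 * y ** 2

def grad(x, y):
    # Partial derivatives of the loss above with respect to x and y.
    return (2 * x, 6 * y)

def best_direction_by_probing(x, y, step=1e-3, n_directions=360):
    """Try many unit directions; return the one with the lowest loss after a tiny step."""
    angles = (2 * math.pi * k / n_directions for k in range(n_directions))
    best = min(angles, key=lambda a: loss(x + step * math.cos(a), y + step * math.sin(a)))
    return (math.cos(best), math.sin(best))

def negative_gradient_direction(x, y):
    """Unit vector pointing opposite to the gradient."""
    gx, gy = grad(x, y)
    norm = math.hypot(gx, gy)
    return (-gx / norm, -gy / norm)

probed = best_direction_by_probing(1.0, 1.0)
analytic = negative_gradient_direction(1.0, 1.0)
print("probed direction:  ", probed)
print("negative gradient: ", analytic)
```

The two directions should agree up to the one-degree resolution of the probe, which is the point: computing the gradient gives the most downhill direction directly, without having to feel around in every direction.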
So, following this routine, we can write out the pseudocode for gradient descent:
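As a minimal sketch of that loop in Python (not the author's own pseudocode), the version below assumes a caller-supplied `grad_fn` that returns the gradient as a list. The function name `gradient_descent`, the parameter names `alpha` and `n_iters`, and the example loss are all hypothetical choices made for this illustration.

```python
def gradient_descent(grad_fn, w, alpha=0.1, n_iters=100):
    """Repeatedly step against the gradient: w <- w - alpha * grad(w)."""
    for _ in range(n_iters):
        g = grad_fn(w)
        # Move every weight a little in the direction opposite to its partial derivative.
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w

def example_grad(w):
    # Gradient of the hypothetical loss (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1).
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w_final = gradient_descent(example_grad, [0.0, 0.0])
print(w_final)  # should end up very close to [3, -1]
```

A fixed iteration count keeps the sketch short; a real implementation would usually also stop once the gradient or the change in loss becomes small enough.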
Looping is the same as walking down the mountain. The α in the code, pretentiously called the learning rate, really just says how big each step down the mountain is. The smaller the value, the more timid I am: I take small steps because I'm afraid of falling into a pit. The larger the value, the bolder I am, but it is easy to overshoot and end in disaster.
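To make the effect of the step size concrete, here is a small sketch on the hypothetical loss f(w) = w², whose gradient is 2w and whose minimum is at 0. The function `run` and the three α values are assumptions picked for illustration only.

```python
def run(alpha, w=1.0, n_iters=20):
    """Gradient descent on the hypothetical loss f(w) = w**2 (gradient: 2*w)."""
    for _ in range(n_iters):
        w = w - alpha * 2 * w
    return w

timid = run(alpha=0.01)    # tiny steps: after 20 iterations still far from 0
steady = run(alpha=0.3)    # moderate steps: lands very close to 0
reckless = run(alpha=1.1)  # steps too big: jumps across the valley and climbs away

print("alpha=0.01:", timid)
print("alpha=0.3: ", steady)
print("alpha=1.1: ", reckless)
```

With the tiny step the weight barely moves, with the moderate step it settles near the minimum, and with the oversized step every update overshoots the bottom and lands further away than before.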
OK. That is my plain-language understanding of gradient descent. I hope it helps.