
Gradient descent explained in plain language

2022-07-03 15:16:00 【alw_123】

I plan to present this series of blog posts as fun, animated popular-science pieces. If you're interested, click here.

0. What is gradient descent for?

Gradient descent is not actually a machine learning algorithm; it is a search-based optimization method. Many algorithms have no closed-form solution, so we have to find a set of parameters that minimizes the loss function by iterating again and again. The general shape of a loss function can be seen in this figure:
[Figure: general shape of a loss function]
So, to describe gradient descent in plain words: I keep searching for reliable weights and what their values should be.

1. How do we search?

We now know that gradient descent is used to find weights, so how do we find them? By blind guessing? No way. Think about it: each weight ranges over the real numbers, so 100 features correspond to 100 weights and 10,000 features correspond to 10,000 weights. If we relied on blind guessing, we would probably never hit the right weights in a lifetime. So we need a systematic way to find them, and that way is the gradient!

So what is the gradient? Written as a formula, it looks like this:
[Figure: formula for the gradient]
Look familiar? Put in plain words, it looks like this:

Put this way, the gradient is easy to see: it is nothing more than the partial derivatives of the loss function with respect to each weight, arranged into a vector. The gradient also has another property: its direction is the direction in which the function value increases fastest. How should we understand this? Take an example. Suppose I'm a hopeless homebody who wants to become a King-rank LOL player on a minor server. Several factors might go into reaching King: the depth of my champion pool, my big-picture game sense, and my flashy mechanics. Each carries a certain weight in my becoming King. As shown in the figure, the arrow for each factor has a direction (the direction of that factor's partial derivative with respect to my becoming King) and a length (the magnitude of that partial derivative). Under the joint effect of these factors, I end up training in one overall direction (like component forces combining into a resultant force in physics), and that is the direction that moves me toward King fastest.
[Figure: arrows for each factor combining into one resultant direction]
In other words, if I keep working in that resultant direction, in theory I will become King as quickly as possible.
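In symbols, the description above can be sketched as follows. The notation here (a loss L, weights w_1, ..., w_n, a unit direction u, and the angle θ between u and the gradient) is assumed for illustration and is not taken from the original figures.

```latex
% Gradient: partial derivatives of the loss with respect to each weight, as a vector
\nabla L(w) = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \dots, \frac{\partial L}{\partial w_n} \right)

% Rate of change of L along a unit direction u (directional derivative)
D_u L(w) = \nabla L(w) \cdot u = \lVert \nabla L(w) \rVert \cos\theta
```

The rate of change is largest when cos θ = 1, that is, when u points along the gradient, which is why the gradient direction is the direction of fastest increase. It is smallest when cos θ = -1, that is, when u points exactly opposite to the gradient.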

OK. We now know that the gradient direction is the direction in which the function grows fastest, so putting a minus sign in front of the gradient (that is, taking the opposite direction) gives the direction in which the function decreases fastest. (You can picture it yourself using the King example above.) So the essence of gradient descent is simply updating the weights in the direction opposite to the gradient. As in the figure below, suppose I am blind and have somehow ended up on the side of a valley. All I need to do now is get to the bottom. Because I cannot see, I can only move a little at a time. Before each move, I sweep my foot around me, and wherever it feels most downhill, I step there. By repeating this cycle, I eventually reach the bottom of the valley.
[Figure: a blind person feeling their way down to the bottom of a valley]
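The "sweep my foot around" idea can be checked numerically. The sketch below is only an illustration under assumptions: the bowl-shaped `loss`, the starting `point`, and the helper `best_downhill_direction` are all hypothetical and not from the original post. It tries a small step in 360 directions, keeps the one that lowers the loss the most, and compares it with the negative gradient direction.

```python
import math

def loss(w1, w2):
    # Hypothetical bowl-shaped "valley" whose lowest point is at (0, 0)
    return w1 ** 2 + 3 * w2 ** 2

def gradient(w1, w2):
    # Partial derivatives of the loss with respect to each weight
    return (2 * w1, 6 * w2)

def best_downhill_direction(w1, w2, step=1e-3, n_directions=360):
    """Try a small step in many directions; return the unit direction with the lowest loss."""
    best_direction = None
    best_loss = float("inf")
    for k in range(n_directions):
        angle = 2 * math.pi * k / n_directions
        direction = (math.cos(angle), math.sin(angle))
        candidate = loss(w1 + step * direction[0], w2 + step * direction[1])
        if candidate < best_loss:
            best_loss = candidate
            best_direction = direction
    return best_direction

point = (1.0, 1.0)
felt = best_downhill_direction(*point)

# Unit vector pointing opposite to the gradient at the same point
g = gradient(*point)
norm = math.hypot(g[0], g[1])
neg_grad = (-g[0] / norm, -g[1] / norm)

# Cosine of the angle between the two directions (close to 1 means they agree)
cosine = felt[0] * neg_grad[0] + felt[1] * neg_grad[1]
print(felt, neg_grad, cosine)
```

The direction found by feeling around should agree with the negative gradient up to the one-degree sampling resolution, which is why gradient descent can skip the sweep and read the direction straight off the partial derivatives.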
So, following this approach, we can write out the pseudocode of gradient descent:
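A minimal Python sketch of that loop might look like the following. It assumes a hypothetical squared-error loss whose minimum sits at `targets`; the names `loss_gradient` and `gradient_descent`, and the values of `alpha` and `n_steps`, are illustrative choices rather than anything from the original post.

```python
def loss_gradient(weights, targets):
    # Gradient of the hypothetical loss sum((w_i - t_i) ** 2):
    # one partial derivative per weight, arranged as a list
    return [2.0 * (w - t) for w, t in zip(weights, targets)]

def gradient_descent(weights, targets, alpha=0.1, n_steps=100):
    """Repeatedly move the weights a small step against the gradient."""
    w = list(weights)
    for _ in range(n_steps):
        grad = loss_gradient(w, targets)
        # Update rule: w <- w - alpha * gradient
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w

targets = [3.0, -1.0]
w_final = gradient_descent(weights=[0.0, 0.0], targets=targets)
print(w_final)
```

Each pass through the loop is one step down the mountain: compute the gradient at the current weights, then move against it by an amount scaled by `alpha`.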

The loop corresponds to walking down the mountain. The α in the code goes by the fancy name "learning rate", but it simply represents how big a step I take going downhill. The smaller the value, the more cautious I am: I take small steps because I'm afraid of falling into a pit. The larger the value, the bolder I am, but it's easy to overstride and hurt myself.
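The effect of the step size can be seen in a small experiment. This is a sketch under assumptions: the loss w ** 2, the starting weight 1.0, the 20 steps, and the three `alpha` values are hypothetical choices made only to show the behavior.

```python
def loss_grad(w):
    # Gradient of the hypothetical loss w ** 2, whose minimum is at w = 0
    return 2.0 * w

def run(alpha, n_steps=20, w0=1.0):
    """Run plain gradient descent and return the final weight."""
    w = w0
    for _ in range(n_steps):
        w -= alpha * loss_grad(w)
    return w

small = run(alpha=0.01)   # timid steps: moves toward 0, but slowly
medium = run(alpha=0.1)   # moderate steps: ends much closer to 0
large = run(alpha=1.1)    # oversized steps: overshoots 0 further each time

print(small, medium, large)
```

For this particular loss, each update multiplies the weight by (1 - 2α), so any α above 1 makes the weight grow in magnitude instead of shrinking. That is the "overstride" case: the steps are so large that every move lands further from the bottom than the last.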