NLP introduction + practice: Chapter 3: gradient descent and back propagation
2022-07-26 01:06:00 【ZNineSun】
Previous article: 《NLP Introduction + Practice: Chapter 2: Getting Started with PyTorch》
1. What is a gradient?
Gradient: a vector; the derivative plus the direction of fastest change (the direction in which the parameters are learned).
A quick review of machine learning:
Collect data x and build a machine learning model f, obtaining
$f(x,w)=Y_{predict}$
In other words, after the model's computation we obtain a series of predicted values.
How do we judge whether the model is good or bad?
$loss=(Y_{predict}-Y_{true})^2$ → regression loss
$loss=Y_{true}\cdot\log(Y_{predict})$ → classification loss
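A minimal Python sketch of these two loss forms (the function names and example values are made up for illustration; in practice the classification loss is usually taken with a negative sign, i.e. cross-entropy):

```python
import numpy as np

def regression_loss(y_predict, y_true):
    # squared error: (Y_predict - Y_true)^2, averaged over elements
    return np.mean((y_predict - y_true) ** 2)

def classification_loss(y_predict, y_true):
    # cross-entropy style term: -Y_true * log(Y_predict), summed over classes
    return -np.sum(y_true * np.log(y_predict))

y_true = np.array([0.0, 1.0, 0.0])       # one-hot label
y_predict = np.array([0.1, 0.8, 0.1])    # model output (probabilities)
print(regression_loss(y_predict, y_true))      # regression-style loss
print(classification_loss(y_predict, y_true))  # classification-style loss
```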
The goal: adjust (learn) the parameters w so that the loss is as small as possible. So how do we adjust w?
Randomly choose a starting point w0, then keep adjusting it so that the loss function reaches its minimum.
How w is updated:
1. Compute the gradient (derivative) of w:
$\Delta w=\frac{f(w+0.0000001)-f(w-0.0000001)}{2 \times 0.0000001}$
2. Update w:
$w=w-\alpha \Delta w$
where:
- if Δw < 0, w will increase
- if Δw > 0, w will decrease
Summary: the gradient describes the trend of change of a multivariable function with respect to its parameters (the direction of parameter learning). When there is only one independent variable it is called the derivative; with multiple variables, the partial derivative.
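As a quick illustration of the two steps above, here is a minimal Python sketch that combines the central-difference approximation of the gradient with the update w = w − αΔw. The toy loss function, starting point, and learning rate are made up for illustration:

```python
def numerical_grad(f, w, eps=1e-7):
    # Δw ≈ (f(w + eps) - f(w - eps)) / (2 * eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def f(w):
    return (w - 3.0) ** 2   # toy loss with its minimum at w = 3

w = 0.0        # starting point w0
alpha = 0.1    # learning rate α
for _ in range(100):
    dw = numerical_grad(f, w)
    w = w - alpha * dw     # w = w - α·Δw
print(w)       # ≈ 3.0, where the toy loss is minimal
```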
2. Calculation of partial derivative
2.1 Common derivative calculation

2.2 Partial derivative of multivariate function

3. Back propagation algorithm
3.1 Computational graphs and back propagation
Computational graph: describing a function by means of a graph.
For example, J(a,b,c) = 3(a + bc). Let u = a + v and v = bc; then J(u) = 3u.
Drawn as a computational graph, it can be expressed as:
Once the computational graph is drawn, the process of forward computation can be seen clearly.
Then, taking the partial derivative at each node, we have:
For back propagation, what we ultimately want is:
- the partial derivative of J with respect to a
- the partial derivative of J with respect to b
- the partial derivative of J with respect to c
But we cannot take the partial derivatives with respect to a, b, c directly. Therefore, following the graph above, back propagation is a right-to-left process, and the partial derivative of J with respect to each independent variable (a, b, c) is the product of the gradients along the connecting path:
$\frac{dJ}{du}=3$
$\frac{dJ}{db}=\frac{dJ}{du}\cdot\frac{du}{dv}\cdot\frac{dv}{db}=3 \times 1 \times c$
$\frac{dJ}{dc}=\frac{dJ}{du}\cdot\frac{du}{dv}\cdot\frac{dv}{dc}=3 \times 1 \times b$
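These chain-rule results can be checked with PyTorch's autograd. The following is a minimal sketch (the concrete values of a, b, c are made up for illustration):

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(4.0, requires_grad=True)

v = b * c          # v = bc
u = a + v          # u = a + v
J = 3 * u          # J = 3u = 3(a + bc)

J.backward()       # back propagation, from right to left through the graph
print(a.grad)      # dJ/da = 3
print(b.grad)      # dJ/db = 3 * 1 * c = 12
print(c.grad)      # dJ/dc = 3 * 1 * b = 9
```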
3.2 Back propagation in neural networks
3.2.1 Schematic diagram of neural network
w1, w2, …, wn denote the weights of each layer of the network (wn is the weight of layer n).
wn[i,j] denotes the weight connecting the i-th neuron in layer n to the j-th neuron in layer n+1.
For example, w3[2,1] represents the weight from the second neuron in layer 3 to the first neuron in layer 4.
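As a small illustration of this indexing convention (the sizes below are made up; this is only a sketch, not the article's network), a layer's weights can be stored as a matrix of shape [neurons in layer n, neurons in layer n+1], so that w1[i, j] is the weight from neuron i of layer n to neuron j of layer n+1:

```python
import torch

layer1_size, layer2_size = 3, 2
w1 = torch.randn(layer1_size, layer2_size)   # weights between layer 1 and layer 2

x = torch.randn(layer1_size)   # outputs of layer 1
z = x @ w1                     # inputs to layer 2: z[j] = Σ_i x[i] * w1[i, j]

print(w1[0, 1])    # weight from neuron 1 of layer 1 to neuron 2 of layer 2 (0-based indexing)
print(z.shape)     # torch.Size([2])
```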
3.2.2 Calculation diagram of neural network

where:
- Δout: the derivative of the loss function with respect to the predicted value
- the f function: it can be understood as the activation function
Suppose we need the partial derivative of Δout with respect to w1[1,2]; we can see the following:
From w1[1,2] to Δout there are two paths, the red line and the blue line. So we only need to add the products along the two paths inside the green box and multiply the sum by the path value outside the green box, giving the following result:
The formula is divided into two parts:
1. Outside the brackets: the red line on the left
2. Inside the brackets:
   1. To the left of the plus sign: the red line on the right
   2. To the right of the plus sign: the blue line segment
But done this way, when the model is large, the amount of computation is enormous.
So the idea of back propagation is to compute the gradient of one parameter at a time and then update it, as shown in the figure below:
The calculation process is as follows :
After updating the parameters, continue back propagation:
The calculation process is as follows :
Continue back propagation:
The calculation process is as follows :
The above process is a step-by-step breakdown of the following formula:
Notice that back propagation needs the intermediate results computed during forward propagation, so the traces of the forward pass must be kept. This is reflected in PyTorch.
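A minimal PyTorch sketch of this point: tensors created with requires_grad=True record the operations of the forward pass (visible through grad_fn), and backward() reuses those stored intermediate results to compute the gradients. The tensors and loss below are made up for illustration:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # parameters to be learned
x = torch.tensor([0.5, -1.0])                       # input data

y_predict = (w * x).sum()        # forward pass; the operations are recorded
loss = (y_predict - 1.0) ** 2    # scalar loss

print(loss.grad_fn)              # the recorded trace of the forward computation
loss.backward()                  # back propagation reuses the stored forward results
print(w.grad)                    # d(loss)/dw
```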