Li Hongyi Machine Learning (2017 Edition), P14: Backpropagation

2022-07-27 01:12:00 【Although Beihai is on credit, Fuyao can take it】

Related information

Open-source content: https://linklearner.com/datawhale-homepage/index.html#/learn/detail/13

Open-source content: https://github.com/datawhalechina/leeml-notes

Open-source content: https://gitee.com/datawhalechina/leeml-notes

Video: https://www.bilibili.com/video/BV1Ht411g7Ef

Official course page: http://speech.ee.ntu.edu.tw/~tlkagk/courses.html

Reference notes: https://datawhalechina.github.io/leeml-notes/#/chapter14/chapter14

1、 Gradient Descent

Given the network parameters $\theta$ (the weights and biases):
First choose an initial value $\theta^0$, then compute the gradient of the loss function (Loss Function) at $\theta^0$, that is, the partial derivative of the loss with respect to each parameter.
Once this vector of partial derivatives has been computed, update $\theta$.
A network can have millions of parameters, so this gradient vector has millions of dimensions.

Backpropagation is an efficient algorithm for computing this gradient vector, which keeps gradient descent practical even when the number of parameters is very large.
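The update loop described above can be sketched in a few lines of plain Python. This is only an illustration: the quadratic `loss`, the hand-written `grad`, the learning rate `eta`, and the step count are all assumptions made for the example and do not come from the lecture.

```python
def loss(theta):
    # Toy loss chosen only for illustration; its minimum is at (3, -1).
    w, b = theta
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def grad(theta):
    # Gradient vector: one partial derivative per parameter.
    w, b = theta
    return [2.0 * (w - 3.0), 2.0 * (b + 1.0)]


def gradient_descent(theta0, eta=0.1, steps=200):
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        # Move every parameter against its partial derivative.
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta


theta_final = gradient_descent([0.0, 0.0])
```

In a real network `grad` cannot be written by hand for millions of parameters, which is the gap backpropagation fills.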

2、 The Chain Rule (Single-Variable and Multivariable)

For a multivariable function, the chain rule requires computing the partial derivative through each inner function and then summing the results. This is Case 2 in the lecture slide.
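A quick numerical check can make the multivariable case concrete. The sketch below assumes hypothetical functions `g`, `h`, and `k` (picked only for the example) with $x=g(s)$, $y=h(s)$, $z=k(x,y)$, and compares the summed chain-rule expression against a finite-difference estimate.

```python
import math


def g(s):  # x = g(s)
    return s ** 2


def h(s):  # y = h(s)
    return math.sin(s)


def k(x, y):  # z = k(x, y)
    return x * y


def dz_ds_chain(s):
    x, y = g(s), h(s)
    dz_dx, dz_dy = y, x                 # partial derivatives of k
    dx_ds, dy_ds = 2 * s, math.cos(s)   # derivatives of the inner functions
    # Sum the contributions of both paths from s to z.
    return dz_dx * dx_ds + dz_dy * dy_ds


def dz_ds_numeric(s, eps=1e-6):
    # Central finite difference, used only as a reference value.
    return (k(g(s + eps), h(s + eps)) - k(g(s - eps), h(s - eps))) / (2 * eps)


s0 = 0.7
analytic = dz_ds_chain(s0)
numeric = dz_ds_numeric(s0)
```

The two values should agree up to finite-difference error.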

3、 Backpropagation

3.1、 Computing the Loss Function

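In the neural network (model) considered here, with outputs $y_1$ and $y_2$, the task is to compute partial derivatives with respect to parameters such as $w_1$ and $w_2$.
The total loss is the sum of the losses on the individual training examples, so its partial derivative with respect to a parameter $w$ is the sum of the per-example partial derivatives:
$L(\theta)=\sum_{n=1}^{N} l^n(\theta) \Rightarrow \frac{\partial L(\theta)}{\partial w}=\sum_{n=1}^{N}\frac{\partial l^n(\theta)}{\partial w}$
It is therefore enough to work out the derivative of a single example's loss $l$.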

3.2、 Computing the Gradient (Partial Derivatives)

Apply the chain rule to split the derivative into two factors:
$\frac{\partial l}{\partial w}= \frac{\partial z}{\partial w}\frac{\partial l}{\partial z}$
Here $\frac{\partial z}{\partial w}$ is computed in the forward pass (Forward pass). Because $z$ is a weighted sum of the neuron's inputs plus a bias, the result is simply the input $x$ that the weight $w$ multiplies.
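The forward-pass factor can be checked numerically. The sketch below assumes a single neuron with $z=\sum_i x_i w_i + b$; the input values, weights, and bias are made up for the example.

```python
def z_value(x, w, b):
    # Pre-activation of one neuron: weighted sum of inputs plus bias.
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def dz_dw_numeric(x, w, b, i, eps=1e-6):
    # Central finite difference with respect to the i-th weight.
    w_plus = w[:i] + [w[i] + eps] + w[i + 1:]
    w_minus = w[:i] + [w[i] - eps] + w[i + 1:]
    return (z_value(x, w_plus, b) - z_value(x, w_minus, b)) / (2 * eps)


x = [1.0, -1.0]   # hypothetical inputs
w = [0.5, 2.0]    # hypothetical weights
b = 0.3           # hypothetical bias

numeric = [dz_dw_numeric(x, w, b, i) for i in range(len(w))]
```

Each entry of `numeric` should match the corresponding input in `x`, which is why this factor is available as soon as the forward pass has run.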
$\frac{\partial l}{\partial z}$ is computed in the backward pass (Backward pass), and it has to be worked out case by case.
Take a single neuron and analyze it:

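Let $a=\sigma(z)$ be the output of the activation function, and let $z^{\prime}$, $z^{\prime \prime}$ be the inputs of the next-layer neurons that receive $a$. Apply the chain rule twice:
$\frac{\partial l}{\partial z}= \frac{\partial a}{\partial z}\frac{\partial l}{\partial a}= \sigma ^{\prime}(z)\frac{\partial l}{\partial a}$
$\frac{\partial l}{\partial a}= \frac{\partial z^{\prime}}{\partial a}\frac{\partial l}{\partial z^{\prime}}+ \frac{\partial z^{\prime \prime}}{\partial a}\frac{\partial l}{\partial z^{\prime \prime}}$
Substituting the second equation into the first gives the formula highlighted in the lecture's network diagram, where $\frac{\partial z^{\prime}}{\partial a}$ and $\frac{\partial z^{\prime \prime}}{\partial a}$ are the weights on the connections from $a$ to the next layer:
$\frac{\partial l}{\partial z}= \sigma ^{\prime}(z)\left[\frac{\partial z^{\prime}}{\partial a}\frac{\partial l}{\partial z^{\prime}}+ \frac{\partial z^{\prime \prime}}{\partial a}\frac{\partial l}{\partial z^{\prime \prime}}\right]$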

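Viewed from another angle, this formula describes a new kind of neuron that runs the forward process in reverse: its inputs are $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$, and its output is $\frac{\partial l}{\partial z}$. Here ${\sigma}'(z)$ is a constant, because $z$ was already determined during the forward pass.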
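The "reversed neuron" view can be sketched as a small function. Everything concrete here is an assumption for illustration: a sigmoid activation, hypothetical weights `W3` and `W4` from $a$ to the two next-layer neurons, and a toy downstream loss whose derivatives are easy to write down.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


# Hypothetical weights on the connections from a to z' and z''.
W3, W4 = 0.8, -1.5


def downstream_loss(z):
    # Forward: a = sigma(z), z' = W3 * a, z'' = W4 * a, then a toy loss.
    a = sigmoid(z)
    z1, z2 = W3 * a, W4 * a
    return 0.5 * z1 ** 2 + math.sin(z2)


def backward_through_neuron(z, dl_dz1, dl_dz2):
    # dl/dz = sigma'(z) * (W3 * dl/dz' + W4 * dl/dz'')
    # sigma'(z) acts as a constant factor fixed by the forward pass.
    return sigmoid_prime(z) * (W3 * dl_dz1 + W4 * dl_dz2)


z0 = 0.4
a0 = sigmoid(z0)
dl_dz1 = W3 * a0              # derivative of 0.5 * z1**2 at z1 = W3 * a0
dl_dz2 = math.cos(W4 * a0)    # derivative of sin(z2) at z2 = W4 * a0
analytic = backward_through_neuron(z0, dl_dz1, dl_dz2)

eps = 1e-6
numeric = (downstream_loss(z0 + eps) - downstream_loss(z0 - eps)) / (2 * eps)
```

`analytic` and `numeric` should agree up to finite-difference error, which confirms that the backward formula matches direct differentiation for this toy setup.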

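3.3、 Case Analysis for the Output Layer

3.3.1、 The Next Layer Is the Output Layer

Suppose $z'$ and $z''$ are the inputs to the output layer, so the neuron under discussion sits in the last hidden layer and $y_1$, $y_2$ are the output values. Then $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ can be computed directly, and with them $\frac{\partial l}{\partial z}$:
$\frac{\partial l}{\partial z'}= \frac{\partial y_1}{\partial z'}\frac{\partial l}{\partial y_1}$ , $\frac{\partial l}{\partial z''}= \frac{\partial y_2}{\partial z''}\frac{\partial l}{\partial y_2}$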
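For the output-layer case, the direct computation might look like the sketch below. It assumes sigmoid output units and a squared-error loss with hypothetical targets `t1`, `t2`; the lecture notes do not fix either choice, so other activations or losses would change the two factors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(z1, z2, t1, t2):
    # Assumed loss: squared error on sigmoid outputs y1, y2.
    y1, y2 = sigmoid(z1), sigmoid(z2)
    return 0.5 * ((y1 - t1) ** 2 + (y2 - t2) ** 2)


def output_layer_grads(z1, z2, t1, t2):
    y1, y2 = sigmoid(z1), sigmoid(z2)
    dy1_dz1 = y1 * (1.0 - y1)   # depends on the output activation
    dy2_dz2 = y2 * (1.0 - y2)
    dl_dy1 = y1 - t1            # depends on the loss definition
    dl_dy2 = y2 - t2
    # dl/dz' = dy1/dz' * dl/dy1, and likewise for z''.
    return dy1_dz1 * dl_dy1, dy2_dz2 * dl_dy2


z1, z2, t1, t2 = 0.3, -0.6, 1.0, 0.0
g1, g2 = output_layer_grads(z1, z2, t1, t2)

eps = 1e-6
n1 = (loss(z1 + eps, z2, t1, t2) - loss(z1 - eps, z2, t1, t2)) / (2 * eps)
n2 = (loss(z1, z2 + eps, t1, t2) - loss(z1, z2 - eps, t1, t2)) / (2 * eps)
```

The finite-difference values `n1` and `n2` serve only as a sanity check on `g1` and `g2`.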

3.3.2、 The Next Layer Is Not the Output Layer (It Is a Hidden Layer)

In this case, first compute $\frac{\partial l}{\partial z_a}$ and $\frac{\partial l}{\partial z_b}$ for the following layer (shown in green in the lecture figure), then multiply them by $w_5$ and $w_6$ and by $\sigma'(z')$ to obtain $\frac{\partial l}{\partial z'}$. If $\frac{\partial l}{\partial z_a}$ and $\frac{\partial l}{\partial z_b}$ are not yet known either, keep moving to later layers until the output layer is reached, and then work backward from the output values.
In practice, the backward pass requires about the same amount of computation as the forward pass.
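Putting the two cases together, a full backward pass starts at the output layer and walks toward the input, reusing each layer's result for the layer before it. The sketch below is one possible pure-Python version under stated assumptions: sigmoid activations everywhere, squared-error loss, and a made-up 2-2-2-2 network whose weights, biases, input, and target are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, biases):
    # Returns the activations of every layer; activations[0] is the input.
    activations = [x]
    for W, b in zip(weights, biases):
        z = [sum(wij * aj for wij, aj in zip(row, activations[-1])) + bi
             for row, bi in zip(W, b)]
        activations.append([sigmoid(v) for v in z])
    return activations


def loss(y, target):
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def backward(x, target, weights, biases):
    activations = forward(x, weights, biases)
    # Output layer: dl/dz = sigma'(z) * dl/dy, with sigma'(z) = y * (1 - y).
    delta = [a * (1 - a) * (a - t) for a, t in zip(activations[-1], target)]
    grads = [None] * len(weights)
    for layer in range(len(weights) - 1, -1, -1):
        a_prev = activations[layer]
        # dl/dw_ij = dz_i/dw_ij * dl/dz_i = a_prev[j] * delta[i]
        grads[layer] = [[d * aj for aj in a_prev] for d in delta]
        if layer > 0:
            W = weights[layer]
            # Hidden layer: sigma'(z_j) * sum_i w_ij * dl/dz_i
            delta = [
                a_prev[j] * (1 - a_prev[j])
                * sum(W[i][j] * delta[i] for i in range(len(delta)))
                for j in range(len(a_prev))
            ]
    return grads


def numeric_grad(x, target, weights, biases, layer, i, j, eps=1e-6):
    # Finite-difference reference for a single weight.
    original = weights[layer][i][j]
    weights[layer][i][j] = original + eps
    plus = loss(forward(x, weights, biases)[-1], target)
    weights[layer][i][j] = original - eps
    minus = loss(forward(x, weights, biases)[-1], target)
    weights[layer][i][j] = original
    return (plus - minus) / (2 * eps)


weights = [
    [[0.1, -0.2], [0.4, 0.3]],    # input -> hidden 1
    [[0.5, -0.6], [-0.7, 0.8]],   # hidden 1 -> hidden 2
    [[0.9, 0.2], [-0.3, 0.1]],    # hidden 2 -> output
]
biases = [[0.0, 0.1], [0.2, -0.1], [0.05, 0.0]]
x, target = [1.0, -1.0], [1.0, 0.0]

grads = backward(x, target, weights, biases)
```

The backward loop visits each weight once, just as the forward loop does, which is consistent with the remark that the two passes cost about the same. `numeric_grad` needs two extra forward passes per weight, so it is only useful as a check.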

4、 Summary

$\frac{\partial l}{\partial w}= \frac{\partial z}{\partial w}\frac{\partial l}{\partial z}$
The goal is to compute $\frac{\partial z}{\partial w}$ (the Forward pass part) and $\frac{\partial l}{\partial z}$ (the Backward pass part). Multiplying $\frac{\partial z}{\partial w}$ by $\frac{\partial l}{\partial z}$ gives $\frac{\partial l}{\partial w}$ for every parameter in the neural network. Gradient descent then uses these values to update the parameters repeatedly, searching for the function with the lowest loss.