当前位置:网站首页>Some derivation formulas for machine learning backpropagation
Some derivation formulas for machine learning backpropagation
2022-07-31 07:03:00 【im34v】
1.预备知识
The understanding of matrix derivation can refer to the derivative that we are familiar with in high school,In high school we were all taking scalar derivatives,Scalar can also be regarded as a special kind1*1的矩阵.This article is mainly to document the process of backpropagation in machine learning,So don't do too much analysis on matrix derivation(In fact, neither will I,只会简单的).
Only a matrix derivation situation that needs to be used in the back propagation process is given here:
∂ ( a T x ) ∂ x = ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x = [ ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x 1 ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x 2 ⋮ ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x n ] = [ a 1 a 2 ⋮ a n ] = a \begin{aligned} \frac{\partial\left(\boldsymbol{a}^{T} \boldsymbol{x}\right)}{\partial \boldsymbol{x}} &=\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial \boldsymbol{x}} \\ &=\left[\begin{array}{l} \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{1}} \\ \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{2}} \\ \vdots \\ \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{n}} \end{array}\right] \\ &=\left[\begin{array}{l} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{array}\right] \\ &=\boldsymbol a \end{aligned} ∂x∂(aTx)=∂x∂(a1x1+a2x2+⋯+anxn)=⎣⎡∂x1∂(a1x1+a2x2+⋯+anxn)∂x2∂(a1x1+a2x2+⋯+anxn)⋮∂xn∂(a1x1+a2x2+⋯+anxn)⎦⎤=⎣⎡a1a2⋮an⎦⎤=a
Once we understand this, we can start~
2.反向传播
We start propagating backwards:
隐藏层第2层:
激活函数 : d a [ 2 ] = ∂ L ∂ a [ 2 ] 激活函数: da^{[2]}=\frac{\partial L}{\partial a^{[2]}} 激活函数:da[2]=∂a[2]∂L
d z [ 2 ] = ∂ L ∂ z [ 2 ] = ∂ L ∂ a [ 2 ] ⋅ ∂ a [ 2 ] ∂ z [ 2 ] = d a [ 2 ] ⋅ g [ 2 ] ’ ( z [ 2 ] ) dz^{[2]}=\frac{\partial L}{\partial z^{[2]}}=\frac{\partial L}{\partial a^{[2]}}·\frac{\partial a^{[2]}}{\partial z^{[2]}}=da^{[2]}·g^{[2]’}(z^{[2]}) dz[2]=∂z[2]∂L=∂a[2]∂L⋅∂z[2]∂a[2]=da[2]⋅g[2]’(z[2])
d W [ 2 ] = ∂ L ∂ W [ 2 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ W [ 2 ] = d z [ 2 ] ⋅ a [ 1 ] T ⇒ W [ 2 ] − = α ⋅ d W [ 2 ] dW^{[2]}=\frac{\partial L}{\partial W^{[2]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial W^{[2]}}=dz^{[2]}·a^{[1]T} \\ \Rightarrow W^{[2]}-=α·dW^{[2]} dW[2]=∂W[2]∂L=∂z[2]∂L⋅∂W[2]∂z[2]=dz[2]⋅a[1]T⇒W[2]−=α⋅dW[2]
d b [ 2 ] = ∂ L ∂ b [ 2 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ b [ 2 ] = d z [ 2 ] ⇒ b [ 2 ] − = α ⋅ d b [ 2 ] db^{[2]}=\frac{\partial L}{\partial b^{[2]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial b^{[2]}}=dz^{[2]} \\ \Rightarrow b^{[2]}-=α·db^{[2]} db[2]=∂b[2]∂L=∂z[2]∂L⋅∂b[2]∂z[2]=dz[2]⇒b[2]−=α⋅db[2]
隐藏层第1层:
激活函数 : d a [ 1 ] = ∂ L ∂ a [ 1 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ a [ 1 ] = W [ 2 ] T ⋅ d z [ 2 ] 激活函数: da^{[1]}=\frac{\partial L}{\partial a^{[1]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial a^{[1]}}=W^{[2]T}·dz^{[2]} 激活函数:da[1]=∂a[1]∂L=∂z[2]∂L⋅∂a[1]∂z[2]=W[2]T⋅dz[2]
说实话,I don't understand the calculation result of this step:
∂ L ∂ z [ 2 ] \frac{\partial L}{\partial z^{[2]}} ∂z[2]∂L 是 d z [ 2 ] dz^{[2]} dz[2], ∂ z [ 2 ] ∂ a [ 1 ] \frac{\partial z^{[2]}}{\partial a^{[1]}} ∂a[1]∂z[2] 是 W [ 2 ] T W^{[2]T} W[2]T,Why is the result of multiplying W [ 2 ] T ⋅ d z [ 2 ] W^{[2]T}·dz^{[2]} W[2]T⋅dz[2],而不是 d z [ 2 ] ⋅ W [ 2 ] T dz^{[2]}·W^{[2]T} dz[2]⋅W[2]T.
d z [ 1 ] = ∂ L ∂ z [ 1 ] = ∂ L ∂ a [ 1 ] ⋅ ∂ a [ 1 ] ∂ z [ 1 ] = d a [ 1 ] ⋅ g [ 1 ] ’ ( z [ 1 ] ) dz^{[1]}=\frac{\partial L}{\partial z^{[1]}}=\frac{\partial L}{\partial a^{[1]}}·\frac{\partial a^{[1]}}{\partial z^{[1]}}=da^{[1]}·g^{[1]’}(z^{[1]}) dz[1]=∂z[1]∂L=∂a[1]∂L⋅∂z[1]∂a[1]=da[1]⋅g[1]’(z[1])
d W [ 1 ] = ∂ L ∂ W [ 1 ] = ∂ L ∂ z [ 1 ] ⋅ ∂ z [ 1 ] ∂ W [ 1 ] = d z [ 1 ] ⋅ a [ 0 ] T ⇒ W [ 1 ] − = α ⋅ d W [ 1 ] dW^{[1]}=\frac{\partial L}{\partial W^{[1]}}=\frac{\partial L}{\partial z^{[1]}}·\frac{\partial z^{[1]}}{\partial W^{[1]}}=dz^{[1]}·a^{[0]T} \\ \Rightarrow W^{[1]}-=α·dW^{[1]} dW[1]=∂W[1]∂L=∂z[1]∂L⋅∂W[1]∂z[1]=dz[1]⋅a[0]T⇒W[1]−=α⋅dW[1]
d b [ 1 ] = ∂ L ∂ b [ 1 ] = ∂ L ∂ z [ 1 ] ⋅ ∂ z [ 1 ] ∂ b [ 1 ] = d z [ 1 ] ⇒ b [ 1 ] − = α ⋅ d b [ 1 ] db^{[1]}=\frac{\partial L}{\partial b^{[1]}}=\frac{\partial L}{\partial z^{[1]}}·\frac{\partial z^{[1]}}{\partial b^{[1]}}=dz^{[1]} \\ \Rightarrow b^{[1]}-=α·db^{[1]} db[1]=∂b[1]∂L=∂z[1]∂L⋅∂b[1]∂z[1]=dz[1]⇒b[1]−=α⋅db[1]
3.总结
第l层:
激活函数 : d a [ l ] = ∂ L ∂ a [ l ] = ∂ L ∂ z [ l + 1 ] ⋅ ∂ z [ l + 1 ] ∂ a [ l ] = W [ l + 1 ] T ⋅ d z [ l + 1 ] 激活函数: da^{[l]}=\frac{\partial L}{\partial a^{[l]}}=\frac{\partial L}{\partial z^{[l+1]}}·\frac{\partial z^{[l+1]}}{\partial a^{[l]}}=W^{[l+1]T}·dz^{[l+1]} 激活函数:da[l]=∂a[l]∂L=∂z[l+1]∂L⋅∂a[l]∂z[l+1]=W[l+1]T⋅dz[l+1]
d z [ l ] = ∂ L ∂ z [ l ] = ∂ L ∂ a [ l ] ⋅ ∂ a [ l ] ∂ z [ l ] = d a [ l ] ⋅ g [ l ] ’ ( z [ l ] ) ⇒ d z [ l ] = W [ l + 1 ] T d z [ l + 1 ] ⋅ g [ l ] ’ ( z [ l ] ) dz^{[l]}=\frac{\partial L}{\partial z^{[l]}}=\frac{\partial L}{\partial a^{[l]}}·\frac{\partial a^{[l]}}{\partial z^{[l]}}=da^{[l]}·g^{[l]’}(z^{[l]}) \\ \Rightarrow dz^{[l]}=W^{[l+1]T}dz^{[l+1]}·g^{[l]’}(z^{[l]}) dz[l]=∂z[l]∂L=∂a[l]∂L⋅∂z[l]∂a[l]=da[l]⋅g[l]’(z[l])⇒dz[l]=W[l+1]Tdz[l+1]⋅g[l]’(z[l])
d W [ l ] = ∂ L ∂ W [ l ] = ∂ L ∂ z [ l ] ⋅ ∂ z [ l ] ∂ W [ l ] = d z [ l ] ⋅ a [ l − 1 ] T ⇒ W [ l ] − = α ⋅ d W [ l ] dW^{[l]}=\frac{\partial L}{\partial W^{[l]}}=\frac{\partial L}{\partial z^{[l]}}·\frac{\partial z^{[l]}}{\partial W^{[l]}}=dz^{[l]}·a^{[l-1]T} \\ \Rightarrow W^{[l]}-=α·dW^{[l]} dW[l]=∂W[l]∂L=∂z[l]∂L⋅∂W[l]∂z[l]=dz[l]⋅a[l−1]T⇒W[l]−=α⋅dW[l]
d b [ l ] = ∂ L ∂ b [ l ] = ∂ L ∂ z [ l ] ⋅ ∂ z [ l ] ∂ b [ l ] = d z [ l ] ⇒ b [ l ] − = α ⋅ d b [ l ] db^{[l]}=\frac{\partial L}{\partial b^{[l]}}=\frac{\partial L}{\partial z^{[l]}}·\frac{\partial z^{[l]}}{\partial b^{[l]}}=dz^{[l]} \\ \Rightarrow b^{[l]}-=α·db^{[l]} db[l]=∂b[l]∂L=∂z[l]∂L⋅∂b[l]∂z[l]=dz[l]⇒b[l]−=α⋅db[l]
边栏推荐
猜你喜欢
随机推荐
群晖NAS配置阿里云盘同步
选择排序法
FTP服务与配置
树状数组(单点修改区间查询和区间修改单点查询)
浅层了解欧拉函数
三本毕业,中途转行软件测试,顶着这些光环从月薪7k干到20k+,感觉还不错
Install and use uView
Basic usage of Koa framework
性能测试概述
client
项目-log4j2排查问题
OSI七层模型
哪吒监控安装脚本
【博学谷学习记录】超强总结,用心分享 | 软件测试 抓包
Debian 10 iptables (防火墙)配置
进程和计划任务管理
TypeScript进阶
Oracle入门 04 - Vmware虚拟机安装配置
DHCP原理与配置
TCP/IP协议和互联网协议群