当前位置:网站首页>Some derivation formulas for machine learning backpropagation
Some derivation formulas for machine learning backpropagation
2022-07-31 07:03:00 【im34v】
1.预备知识
The understanding of matrix derivation can refer to the derivative that we are familiar with in high school,In high school we were all taking scalar derivatives,Scalar can also be regarded as a special kind1*1的矩阵.This article is mainly to document the process of backpropagation in machine learning,So don't do too much analysis on matrix derivation(In fact, neither will I,只会简单的).
Only a matrix derivation situation that needs to be used in the back propagation process is given here:
∂ ( a T x ) ∂ x = ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x = [ ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x 1 ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x 2 ⋮ ∂ ( a 1 x 1 + a 2 x 2 + ⋯ + a n x n ) ∂ x n ] = [ a 1 a 2 ⋮ a n ] = a \begin{aligned} \frac{\partial\left(\boldsymbol{a}^{T} \boldsymbol{x}\right)}{\partial \boldsymbol{x}} &=\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial \boldsymbol{x}} \\ &=\left[\begin{array}{l} \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{1}} \\ \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{2}} \\ \vdots \\ \frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{n}} \end{array}\right] \\ &=\left[\begin{array}{l} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{array}\right] \\ &=\boldsymbol a \end{aligned} ∂x∂(aTx)=∂x∂(a1x1+a2x2+⋯+anxn)=⎣⎡∂x1∂(a1x1+a2x2+⋯+anxn)∂x2∂(a1x1+a2x2+⋯+anxn)⋮∂xn∂(a1x1+a2x2+⋯+anxn)⎦⎤=⎣⎡a1a2⋮an⎦⎤=a
Once we understand this, we can start~
2.反向传播

We start propagating backwards:
隐藏层第2层:
激活函数 : d a [ 2 ] = ∂ L ∂ a [ 2 ] 激活函数: da^{[2]}=\frac{\partial L}{\partial a^{[2]}} 激活函数:da[2]=∂a[2]∂L
d z [ 2 ] = ∂ L ∂ z [ 2 ] = ∂ L ∂ a [ 2 ] ⋅ ∂ a [ 2 ] ∂ z [ 2 ] = d a [ 2 ] ⋅ g [ 2 ] ’ ( z [ 2 ] ) dz^{[2]}=\frac{\partial L}{\partial z^{[2]}}=\frac{\partial L}{\partial a^{[2]}}·\frac{\partial a^{[2]}}{\partial z^{[2]}}=da^{[2]}·g^{[2]’}(z^{[2]}) dz[2]=∂z[2]∂L=∂a[2]∂L⋅∂z[2]∂a[2]=da[2]⋅g[2]’(z[2])
d W [ 2 ] = ∂ L ∂ W [ 2 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ W [ 2 ] = d z [ 2 ] ⋅ a [ 1 ] T ⇒ W [ 2 ] − = α ⋅ d W [ 2 ] dW^{[2]}=\frac{\partial L}{\partial W^{[2]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial W^{[2]}}=dz^{[2]}·a^{[1]T} \\ \Rightarrow W^{[2]}-=α·dW^{[2]} dW[2]=∂W[2]∂L=∂z[2]∂L⋅∂W[2]∂z[2]=dz[2]⋅a[1]T⇒W[2]−=α⋅dW[2]
d b [ 2 ] = ∂ L ∂ b [ 2 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ b [ 2 ] = d z [ 2 ] ⇒ b [ 2 ] − = α ⋅ d b [ 2 ] db^{[2]}=\frac{\partial L}{\partial b^{[2]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial b^{[2]}}=dz^{[2]} \\ \Rightarrow b^{[2]}-=α·db^{[2]} db[2]=∂b[2]∂L=∂z[2]∂L⋅∂b[2]∂z[2]=dz[2]⇒b[2]−=α⋅db[2]
隐藏层第1层:
激活函数 : d a [ 1 ] = ∂ L ∂ a [ 1 ] = ∂ L ∂ z [ 2 ] ⋅ ∂ z [ 2 ] ∂ a [ 1 ] = W [ 2 ] T ⋅ d z [ 2 ] 激活函数: da^{[1]}=\frac{\partial L}{\partial a^{[1]}}=\frac{\partial L}{\partial z^{[2]}}·\frac{\partial z^{[2]}}{\partial a^{[1]}}=W^{[2]T}·dz^{[2]} 激活函数:da[1]=∂a[1]∂L=∂z[2]∂L⋅∂a[1]∂z[2]=W[2]T⋅dz[2]
说实话,I don't understand the calculation result of this step:
∂ L ∂ z [ 2 ] \frac{\partial L}{\partial z^{[2]}} ∂z[2]∂L 是 d z [ 2 ] dz^{[2]} dz[2], ∂ z [ 2 ] ∂ a [ 1 ] \frac{\partial z^{[2]}}{\partial a^{[1]}} ∂a[1]∂z[2] 是 W [ 2 ] T W^{[2]T} W[2]T,Why is the result of multiplying W [ 2 ] T ⋅ d z [ 2 ] W^{[2]T}·dz^{[2]} W[2]T⋅dz[2],而不是 d z [ 2 ] ⋅ W [ 2 ] T dz^{[2]}·W^{[2]T} dz[2]⋅W[2]T.
d z [ 1 ] = ∂ L ∂ z [ 1 ] = ∂ L ∂ a [ 1 ] ⋅ ∂ a [ 1 ] ∂ z [ 1 ] = d a [ 1 ] ⋅ g [ 1 ] ’ ( z [ 1 ] ) dz^{[1]}=\frac{\partial L}{\partial z^{[1]}}=\frac{\partial L}{\partial a^{[1]}}·\frac{\partial a^{[1]}}{\partial z^{[1]}}=da^{[1]}·g^{[1]’}(z^{[1]}) dz[1]=∂z[1]∂L=∂a[1]∂L⋅∂z[1]∂a[1]=da[1]⋅g[1]’(z[1])
d W [ 1 ] = ∂ L ∂ W [ 1 ] = ∂ L ∂ z [ 1 ] ⋅ ∂ z [ 1 ] ∂ W [ 1 ] = d z [ 1 ] ⋅ a [ 0 ] T ⇒ W [ 1 ] − = α ⋅ d W [ 1 ] dW^{[1]}=\frac{\partial L}{\partial W^{[1]}}=\frac{\partial L}{\partial z^{[1]}}·\frac{\partial z^{[1]}}{\partial W^{[1]}}=dz^{[1]}·a^{[0]T} \\ \Rightarrow W^{[1]}-=α·dW^{[1]} dW[1]=∂W[1]∂L=∂z[1]∂L⋅∂W[1]∂z[1]=dz[1]⋅a[0]T⇒W[1]−=α⋅dW[1]
d b [ 1 ] = ∂ L ∂ b [ 1 ] = ∂ L ∂ z [ 1 ] ⋅ ∂ z [ 1 ] ∂ b [ 1 ] = d z [ 1 ] ⇒ b [ 1 ] − = α ⋅ d b [ 1 ] db^{[1]}=\frac{\partial L}{\partial b^{[1]}}=\frac{\partial L}{\partial z^{[1]}}·\frac{\partial z^{[1]}}{\partial b^{[1]}}=dz^{[1]} \\ \Rightarrow b^{[1]}-=α·db^{[1]} db[1]=∂b[1]∂L=∂z[1]∂L⋅∂b[1]∂z[1]=dz[1]⇒b[1]−=α⋅db[1]
3.总结
第l层:
激活函数 : d a [ l ] = ∂ L ∂ a [ l ] = ∂ L ∂ z [ l + 1 ] ⋅ ∂ z [ l + 1 ] ∂ a [ l ] = W [ l + 1 ] T ⋅ d z [ l + 1 ] 激活函数: da^{[l]}=\frac{\partial L}{\partial a^{[l]}}=\frac{\partial L}{\partial z^{[l+1]}}·\frac{\partial z^{[l+1]}}{\partial a^{[l]}}=W^{[l+1]T}·dz^{[l+1]} 激活函数:da[l]=∂a[l]∂L=∂z[l+1]∂L⋅∂a[l]∂z[l+1]=W[l+1]T⋅dz[l+1]
d z [ l ] = ∂ L ∂ z [ l ] = ∂ L ∂ a [ l ] ⋅ ∂ a [ l ] ∂ z [ l ] = d a [ l ] ⋅ g [ l ] ’ ( z [ l ] ) ⇒ d z [ l ] = W [ l + 1 ] T d z [ l + 1 ] ⋅ g [ l ] ’ ( z [ l ] ) dz^{[l]}=\frac{\partial L}{\partial z^{[l]}}=\frac{\partial L}{\partial a^{[l]}}·\frac{\partial a^{[l]}}{\partial z^{[l]}}=da^{[l]}·g^{[l]’}(z^{[l]}) \\ \Rightarrow dz^{[l]}=W^{[l+1]T}dz^{[l+1]}·g^{[l]’}(z^{[l]}) dz[l]=∂z[l]∂L=∂a[l]∂L⋅∂z[l]∂a[l]=da[l]⋅g[l]’(z[l])⇒dz[l]=W[l+1]Tdz[l+1]⋅g[l]’(z[l])
d W [ l ] = ∂ L ∂ W [ l ] = ∂ L ∂ z [ l ] ⋅ ∂ z [ l ] ∂ W [ l ] = d z [ l ] ⋅ a [ l − 1 ] T ⇒ W [ l ] − = α ⋅ d W [ l ] dW^{[l]}=\frac{\partial L}{\partial W^{[l]}}=\frac{\partial L}{\partial z^{[l]}}·\frac{\partial z^{[l]}}{\partial W^{[l]}}=dz^{[l]}·a^{[l-1]T} \\ \Rightarrow W^{[l]}-=α·dW^{[l]} dW[l]=∂W[l]∂L=∂z[l]∂L⋅∂W[l]∂z[l]=dz[l]⋅a[l−1]T⇒W[l]−=α⋅dW[l]
d b [ l ] = ∂ L ∂ b [ l ] = ∂ L ∂ z [ l ] ⋅ ∂ z [ l ] ∂ b [ l ] = d z [ l ] ⇒ b [ l ] − = α ⋅ d b [ l ] db^{[l]}=\frac{\partial L}{\partial b^{[l]}}=\frac{\partial L}{\partial z^{[l]}}·\frac{\partial z^{[l]}}{\partial b^{[l]}}=dz^{[l]} \\ \Rightarrow b^{[l]}-=α·db^{[l]} db[l]=∂b[l]∂L=∂z[l]∂L⋅∂b[l]∂z[l]=dz[l]⇒b[l]−=α⋅db[l]
边栏推荐
猜你喜欢
随机推荐
银河麒麟服务器v10 sp2安装oracle19c
ES6-数组
Oracle入门 11 - Linux 开关机及系统进程命令
10.0 堆体系结构概述之元空间/永久代
TCP/IP协议和互联网协议群
Debian 10 配置网卡,DNS,IP地址
成员内部类使用方式(工作)
数据库/表的基本操作
Debian 10 dhcp 服务配置
@ConfigurationProperties和@EnableConfigurationProperties
Oracle 11g R2 与 Linux 版本的选择
Shell编程规范与变量
Install and use uView
数据库概论 - MySQL的简单介绍
基本正则表达式元字符,字符,次数,锚定分组
2022年软件测试现状最新报告
记录一下,今天开始刷剑指offer
【云原生】-Docker容器迁移Oracle到MySQL
TypeScript基本类型
三本毕业,中途转行软件测试,顶着这些光环从月薪7k干到20k+,感觉还不错








