Some derivation formulas for machine learning backpropagation
2022-07-31 07:03:00 【im34v】
1. Preliminaries
Matrix derivatives can be understood by analogy with the derivatives we learned in high school. Those were all derivatives of scalars, and a scalar can be regarded as a special 1×1 matrix. This article mainly documents the backpropagation process in machine learning, so it does not analyze matrix calculus in depth (in fact, I can't either; I only know the simple cases).
Only the one matrix derivative needed during backpropagation is given here:
$$
\begin{aligned}
\frac{\partial\left(\boldsymbol{a}^{T} \boldsymbol{x}\right)}{\partial \boldsymbol{x}}
&=\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial \boldsymbol{x}} \\
&=\left[\begin{array}{c}
\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{1}} \\
\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{2}} \\
\vdots \\
\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}\right)}{\partial x_{n}}
\end{array}\right] \\
&=\left[\begin{array}{c} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{array}\right] \\
&=\boldsymbol{a}
\end{aligned}
$$
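The identity can be checked numerically. The snippet below is a minimal sketch, assuming NumPy is available; the helper `numerical_gradient`, the vector length 5, and the random seed are illustrative choices, not part of the original derivation. It estimates each partial derivative of $a^T x$ by central differences and compares the result with $a$.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Partial derivative with respect to x_i only
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
a = rng.normal(size=5)   # hypothetical coefficient vector
x = rng.normal(size=5)   # point at which the gradient is evaluated

# f(x) = a^T x, so the gradient should be a itself
grad = numerical_gradient(lambda v: a @ v, x)
print(np.allclose(grad, a))
```

Because $a^T x$ is linear in $x$, the estimate agrees with $a$ up to floating-point rounding, whichever point $x$ is chosen.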
Once this is clear, we can begin.
2. Backpropagation

We start propagating backwards:
Hidden layer 2:
Activation function:
$$da^{[2]}=\frac{\partial L}{\partial a^{[2]}}$$
$$dz^{[2]}=\frac{\partial L}{\partial z^{[2]}}=\frac{\partial L}{\partial a^{[2]}}\cdot\frac{\partial a^{[2]}}{\partial z^{[2]}}=da^{[2]}\cdot g^{[2]\prime}(z^{[2]})$$
$$\begin{gathered} dW^{[2]}=\frac{\partial L}{\partial W^{[2]}}=\frac{\partial L}{\partial z^{[2]}}\cdot\frac{\partial z^{[2]}}{\partial W^{[2]}}=dz^{[2]}\cdot a^{[1]T} \\ \Rightarrow W^{[2]} \mathrel{-}= \alpha\cdot dW^{[2]} \end{gathered}$$
$$\begin{gathered} db^{[2]}=\frac{\partial L}{\partial b^{[2]}}=\frac{\partial L}{\partial z^{[2]}}\cdot\frac{\partial z^{[2]}}{\partial b^{[2]}}=dz^{[2]} \\ \Rightarrow b^{[2]} \mathrel{-}= \alpha\cdot db^{[2]} \end{gathered}$$
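The layer-2 formulas can be sketched in NumPy for a single training example. Several things here are assumptions made only for illustration, since the article does not fix them: the forward step $z^{[2]}=W^{[2]}a^{[1]}+b^{[2]}$, a sigmoid for $g^{[2]}$, the squared-error loss $L=\frac{1}{2}\lVert a^{[2]}-y\rVert^2$, the layer sizes, and the function names. Note that the product with $g^{[2]\prime}$ is elementwise, while $dz^{[2]}\cdot a^{[1]T}$ is an outer product with the same shape as $W^{[2]}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W2, b2, a1, y):
    """Assumed loss: L = 0.5 * ||a2 - y||^2 with a2 = sigmoid(W2 a1 + b2)."""
    a2 = sigmoid(W2 @ a1 + b2)
    return 0.5 * float(np.sum((a2 - y) ** 2))

def layer2_grads(W2, b2, a1, y):
    """Backward step for layer 2, following the formulas above."""
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    da2 = a2 - y                    # dL/da2 for the assumed squared-error loss
    dz2 = da2 * a2 * (1.0 - a2)     # elementwise product with g'(z2)
    dW2 = dz2 @ a1.T                # outer product, same shape as W2
    db2 = dz2                       # same shape as b2
    return dW2, db2

rng = np.random.default_rng(0)
n1, n2 = 4, 3                       # hypothetical layer sizes
a1 = rng.normal(size=(n1, 1))       # activation of layer 1 (column vector)
y = rng.normal(size=(n2, 1))        # hypothetical target
W2 = rng.normal(size=(n2, n1))
b2 = np.zeros((n2, 1))
alpha = 0.1                         # learning rate

dW2, db2 = layer2_grads(W2, b2, a1, y)

# Gradient-descent update: W2 -= alpha * dW2, b2 -= alpha * db2
W2 -= alpha * dW2
b2 -= alpha * db2
```

The `loss` function is not needed for the update itself; it is included so the gradients can be compared against finite differences.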
Hidden layer 1:
Activation function:
$$da^{[1]}=\frac{\partial L}{\partial a^{[1]}}=\frac{\partial L}{\partial z^{[2]}}\cdot\frac{\partial z^{[2]}}{\partial a^{[1]}}=W^{[2]T}\cdot dz^{[2]}$$
To be honest, I don't understand the result of this step:
$\frac{\partial L}{\partial z^{[2]}}$ is $dz^{[2]}$ and $\frac{\partial z^{[2]}}{\partial a^{[1]}}$ is $W^{[2]T}$, so why is their product $W^{[2]T}\cdot dz^{[2]}$ rather than $dz^{[2]}\cdot W^{[2]T}$?
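One way to see the ordering is to write the chain rule component by component. The sketch below assumes column vectors and the forward step $z^{[2]}=W^{[2]}a^{[1]}+b^{[2]}$ with $W^{[2]}$ of shape $n_2\times n_1$; the sizes $n_1$, $n_2$ and the forward formula are assumptions, since the article does not state them explicitly.

$$
\begin{aligned}
z^{[2]}_i &= \sum_{j=1}^{n_1} W^{[2]}_{ij}\, a^{[1]}_j + b^{[2]}_i, \qquad i=1,\dots,n_2 \\
\frac{\partial L}{\partial a^{[1]}_j}
&= \sum_{i=1}^{n_2} \frac{\partial L}{\partial z^{[2]}_i}\cdot\frac{\partial z^{[2]}_i}{\partial a^{[1]}_j}
= \sum_{i=1}^{n_2} W^{[2]}_{ij}\, dz^{[2]}_i
= \left(W^{[2]T}\, dz^{[2]}\right)_j
\end{aligned}
$$

The shapes tell the same story: $W^{[2]T}$ is $n_1\times n_2$ and $dz^{[2]}$ is $n_2\times 1$, so $W^{[2]T}\cdot dz^{[2]}$ is $n_1\times 1$, which matches $a^{[1]}$. The other ordering, $dz^{[2]}\cdot W^{[2]T}$, would multiply an $n_2\times 1$ matrix by an $n_1\times n_2$ matrix, which is not even defined unless $n_1=1$. Under these assumptions, the scalar-style chain rule gives the right factors, and the order of the matrix product is fixed by the summation index.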
$$dz^{[1]}=\frac{\partial L}{\partial z^{[1]}}=\frac{\partial L}{\partial a^{[1]}}\cdot\frac{\partial a^{[1]}}{\partial z^{[1]}}=da^{[1]}\cdot g^{[1]\prime}(z^{[1]})$$
$$\begin{gathered} dW^{[1]}=\frac{\partial L}{\partial W^{[1]}}=\frac{\partial L}{\partial z^{[1]}}\cdot\frac{\partial z^{[1]}}{\partial W^{[1]}}=dz^{[1]}\cdot a^{[0]T} \\ \Rightarrow W^{[1]} \mathrel{-}= \alpha\cdot dW^{[1]} \end{gathered}$$
$$\begin{gathered} db^{[1]}=\frac{\partial L}{\partial b^{[1]}}=\frac{\partial L}{\partial z^{[1]}}\cdot\frac{\partial z^{[1]}}{\partial b^{[1]}}=dz^{[1]} \\ \Rightarrow b^{[1]} \mathrel{-}= \alpha\cdot db^{[1]} \end{gathered}$$
3. Summary
Layer l:
Activation function:
$$da^{[l]}=\frac{\partial L}{\partial a^{[l]}}=\frac{\partial L}{\partial z^{[l+1]}}\cdot\frac{\partial z^{[l+1]}}{\partial a^{[l]}}=W^{[l+1]T}\cdot dz^{[l+1]}$$
$$\begin{gathered} dz^{[l]}=\frac{\partial L}{\partial z^{[l]}}=\frac{\partial L}{\partial a^{[l]}}\cdot\frac{\partial a^{[l]}}{\partial z^{[l]}}=da^{[l]}\cdot g^{[l]\prime}(z^{[l]}) \\ \Rightarrow dz^{[l]}=W^{[l+1]T}dz^{[l+1]}\cdot g^{[l]\prime}(z^{[l]}) \end{gathered}$$
$$\begin{gathered} dW^{[l]}=\frac{\partial L}{\partial W^{[l]}}=\frac{\partial L}{\partial z^{[l]}}\cdot\frac{\partial z^{[l]}}{\partial W^{[l]}}=dz^{[l]}\cdot a^{[l-1]T} \\ \Rightarrow W^{[l]} \mathrel{-}= \alpha\cdot dW^{[l]} \end{gathered}$$
$$\begin{gathered} db^{[l]}=\frac{\partial L}{\partial b^{[l]}}=\frac{\partial L}{\partial z^{[l]}}\cdot\frac{\partial z^{[l]}}{\partial b^{[l]}}=dz^{[l]} \\ \Rightarrow b^{[l]} \mathrel{-}= \alpha\cdot db^{[l]} \end{gathered}$$
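The layer-l recursion turns naturally into a loop that runs from the output layer down to layer 1. The following is a minimal sketch for a single training example, not a definitive implementation. It assumes a `tanh` activation in every layer, the squared-error loss $L=\frac{1}{2}\lVert a^{[L]}-y\rVert^2$, and the forward step $z^{[l]}=W^{[l]}a^{[l-1]}+b^{[l]}$; the layer sizes and the function names `forward`, `backward`, and `sgd_step` are hypothetical.

```python
import numpy as np

def g(z):
    return np.tanh(z)

def g_prime(z):
    return 1.0 - np.tanh(z) ** 2

def forward(Ws, bs, x):
    """Return lists a[0..L] and z[0..L]; z[0] is an unused placeholder."""
    a, z = [x], [None]
    for W, b in zip(Ws, bs):
        z.append(W @ a[-1] + b)
        a.append(g(z[-1]))
    return a, z

def backward(Ws, a, z, y):
    """Apply the layer-l formulas from the output layer down to layer 1."""
    L = len(Ws)
    dWs, dbs = [None] * L, [None] * L
    da = a[L] - y                       # dL/da[L] for the assumed loss
    for l in range(L, 0, -1):
        dz = da * g_prime(z[l])         # dz[l] = da[l] * g'(z[l]), elementwise
        dWs[l - 1] = dz @ a[l - 1].T    # dW[l] = dz[l] a[l-1]^T
        dbs[l - 1] = dz                 # db[l] = dz[l]
        da = Ws[l - 1].T @ dz           # da[l-1] = W[l]^T dz[l]
    return dWs, dbs

def sgd_step(Ws, bs, dWs, dbs, alpha):
    """In-place update: W[l] -= alpha * dW[l], b[l] -= alpha * db[l]."""
    for W, b, dW, db in zip(Ws, bs, dWs, dbs):
        W -= alpha * dW
        b -= alpha * db

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                       # hypothetical sizes: input, hidden, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((m, 1)) for m in sizes[1:]]
x = rng.normal(size=(sizes[0], 1))      # a[0], the input
y = rng.normal(size=(sizes[-1], 1))     # hypothetical target

a, z = forward(Ws, bs, x)
dWs, dbs = backward(Ws, a, z, y)
sgd_step(Ws, bs, dWs, dbs, alpha=0.1)
```

The list index `l - 1` holds the parameters of layer `l`, so the loop body reads in the same order as the summary: `dz`, then `dW`, then `db`, then the `da` handed to the layer below.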