Deep learning: implementation techniques for deep neural networks
2022-06-30 03:12:00 【ShadyPi】
Normalization
Normalization is much like feature scaling, which we used many times in the earlier machine learning course. Its main purpose is to rescale the feature values so that they fall roughly within a high-dimensional sphere centered at the origin. To do this, compute the mean vector $\mu$ and the standard deviation vector $\sigma$ over the $m$ training samples, namely
$$\mu=\frac{1}{m}\sum_{i=1}^m x^{(i)}$$
$$\sigma^2=\frac{1}{m}\sum_{i=1}^m (x^{(i)}-\mu)^2$$
Then set $x:=\frac{x-\mu}{\sigma}$, with the subtraction and division applied elementwise, to complete the normalization.
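The two formulas and the update above can be sketched in plain Python. This is only an illustrative sketch: the function names `fit_normalizer` and `normalize`, the toy data `X_train`, and the small `eps` guard against a zero standard deviation are assumptions made for the example, not part of the original notes.

```python
import math

def fit_normalizer(samples):
    """Compute the per-feature mean and standard deviation over m samples."""
    m = len(samples)
    n = len(samples[0])
    mu = [sum(x[j] for x in samples) / m for j in range(n)]
    sigma = [
        math.sqrt(sum((x[j] - mu[j]) ** 2 for x in samples) / m)
        for j in range(n)
    ]
    return mu, sigma

def normalize(samples, mu, sigma, eps=1e-8):
    """Apply x := (x - mu) / sigma elementwise.

    eps is an assumption added here to avoid dividing by zero
    when a feature is constant; the text does not mention it.
    """
    return [
        [(x[j] - mu[j]) / (sigma[j] + eps) for j in range(len(mu))]
        for x in samples
    ]

# Toy data (hypothetical): two features on very different scales.
X_train = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
mu, sigma = fit_normalizer(X_train)
X_norm = normalize(X_train, mu, sigma)
print(mu, sigma)
print(X_norm)
```

After this step both features have mean close to 0 and variance close to 1, regardless of their original scale. In practice the same `mu` and `sigma` computed on the training set are usually reused for validation and test data.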
Weight initialization
In deep neural networks, gradients sometimes explode or vanish. This happens because, in a deep network, propagating from one end to the other multiplies many weight matrices together. Even if each weight matrix is only slightly larger or smaller than the identity matrix, the product grows or shrinks exponentially with depth, so the resulting values become very large or very small. The same holds for gradients, which is exactly the exploding/vanishing gradient problem.
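A quick numeric illustration of this exponential effect, under a strong simplifying assumption: every weight matrix is replaced by a scalar multiple of the identity, so propagating through the network reduces to repeated multiplication by one number. The function name `propagate` and the depth of 50 are hypothetical choices for the sketch.

```python
def propagate(scale, depth):
    """Multiply an initial value of 1.0 by `scale` once per layer.

    This stands in for multiplying by `depth` weight matrices that are
    each equal to scale * I (a simplifying assumption).
    """
    value = 1.0
    for _ in range(depth):
        value *= scale
    return value

for scale in (0.9, 1.0, 1.1):
    print(scale, propagate(scale, 50))
```

With 50 layers, a per-layer factor of 1.1 pushes the value above 100, while a factor of 0.9 shrinks it below 0.01, even though both factors are close to 1.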
Careful initialization can effectively mitigate this problem. Recall that forward propagation computes
$$A^{[l]}=\sigma(Z^{[l]})=\sigma(W^{[l]}A^{[l-1]}+b^{[l]})$$
As we can see, the more inputs feed into this layer (that is, the larger the number of nodes $n^{[l-1]}$ in the previous layer), the larger the entries of $Z^{[l]}$ tend to be, and the fewer inputs, the smaller they tend to be. We therefore draw the initial weights from a normal distribution with mean 0 and variance $\frac{C}{n^{[l-1]}}$, so that the computed values stay at a moderate scale. The constant $C$ is usually 2 when ReLU is used as the activation function, and usually 1 for the logistic function or $\tanh$.
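A minimal sketch of this initialization rule using only the Python standard library. The function name `init_weights`, the `activation` string argument, and the layer sizes are assumptions for illustration; only the mean of 0, the variance $\frac{C}{n^{[l-1]}}$, and the choices of $C$ come from the text.

```python
import math
import random

def init_weights(n_out, n_in, activation="relu", rng=None):
    """Sample an n_out x n_in weight matrix from N(0, C / n_in).

    C = 2 for ReLU and C = 1 otherwise (logistic or tanh), as in the text.
    n_in plays the role of n^{[l-1]}, the size of the previous layer.
    """
    rng = rng or random.Random()
    c = 2.0 if activation == "relu" else 1.0
    std = math.sqrt(c / n_in)  # standard deviation = sqrt(variance)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

# Hypothetical layer: 500 inputs, 200 units, ReLU activation.
W = init_weights(200, 500, activation="relu", rng=random.Random(0))
flat = [w for row in W for w in row]
mean_w = sum(flat) / len(flat)
var_w = sum((w - mean_w) ** 2 for w in flat) / len(flat)
print(mean_w, var_w)  # variance should be close to 2 / 500 = 0.004
```

Because the variance shrinks as the previous layer grows, the sum $W^{[l]}A^{[l-1]}$ keeps a comparable scale whether a unit has 50 inputs or 5000.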
Gradient checking
Gradient checking works the same way as in the machine learning course, but with a new measure. Given the gradient vector $d\theta$ computed by backpropagation and the gradient vector $d\theta_\text{approx}$ approximated numerically from the definition of the derivative, we compute
$$\frac{||d\theta_\text{approx}-d\theta||_2}{||d\theta_\text{approx}||_2+||d\theta||_2}$$
where $||\vec{x}||_2$ denotes the Euclidean norm.
With a step size of $\varepsilon=10^{-7}$ in the numerical approximation, if the value above is less than $10^{-7}$, our gradient computation is correct. If the value is on the order of $10^{-5}$, the code needs to be checked carefully. If it is around $10^{-3}$, we can assume the algorithm has been implemented incorrectly.
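The check can be sketched as follows. Several details are assumptions made for the example rather than taken from the notes: the two-sided (central) difference used for $d\theta_\text{approx}$, the toy cost $J(\theta)=\sum_i \theta_i^2$, and all function names. The "buggy" gradient deliberately drops the factor of 2 to show what a failing check looks like.

```python
import math

def numerical_gradient(f, theta, eps=1e-7):
    """Approximate the gradient of f at theta, one coordinate at a time,
    with a two-sided difference (an assumption; the text only says the
    approximation follows the definition of the derivative)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def gradient_check(f, grad_f, theta, eps=1e-7):
    """Return ||d_approx - d_theta||_2 / (||d_approx||_2 + ||d_theta||_2)."""
    d_theta = grad_f(theta)
    d_approx = numerical_gradient(f, theta, eps)
    diff = l2_norm([a - b for a, b in zip(d_approx, d_theta)])
    return diff / (l2_norm(d_approx) + l2_norm(d_theta))

# Toy cost (hypothetical): J(theta) = sum of squares, true gradient 2 * theta.
def cost(theta):
    return sum(t * t for t in theta)

def correct_grad(theta):
    return [2 * t for t in theta]

def buggy_grad(theta):
    return [t for t in theta]  # deliberately wrong: missing the factor 2

theta0 = [1.0, -2.0, 3.0]
ratio_ok = gradient_check(cost, correct_grad, theta0)
ratio_bad = gradient_check(cost, buggy_grad, theta0)
print(ratio_ok, ratio_bad)
```

For the correct gradient the ratio should come out tiny, while the buggy gradient gives a ratio far above $10^{-3}$, which matches the interpretation of the thresholds above.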