Deep learning: implementation techniques for deep neural networks
2022-06-30 03:12:00 【ShadyPi】
Normalization
Normalization works like feature scaling, which we used many times in the earlier machine learning course. Its main purpose is to rescale the features so that their values fall roughly within a high-dimensional sphere centered at the origin. First compute the mean vector $\mu$ and the standard deviation vector $\sigma$ over the $m$ training samples (one entry per feature):
$$\mu=\frac{1}{m}\sum_{i=1}^m x^{(i)}$$
$$\sigma^2=\frac{1}{m}\sum_{i=1}^m (x^{(i)}-\mu)^2$$
Then set $x:=\frac{x-\mu}{\sigma}$ (element-wise) to complete the normalization.
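The two formulas and the update rule above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `normalize` and the toy data `X` are assumptions made for the example, and in practice an array library would usually be used instead.

```python
import math


def normalize(samples):
    """Normalize each feature to zero mean and unit standard deviation.

    samples: list of m samples, each a list of n feature values.
    Returns (normalized_samples, mu, sigma).
    """
    m = len(samples)
    n = len(samples[0])
    # Per-feature mean: mu = (1/m) * sum_i x^(i)
    mu = [sum(x[j] for x in samples) / m for j in range(n)]
    # Per-feature variance: sigma^2 = (1/m) * sum_i (x^(i) - mu)^2
    var = [sum((x[j] - mu[j]) ** 2 for x in samples) / m for j in range(n)]
    sigma = [math.sqrt(v) for v in var]
    # x := (x - mu) / sigma; this sketch assumes no feature is constant (sigma > 0)
    normalized = [[(x[j] - mu[j]) / sigma[j] for j in range(n)] for x in samples]
    return normalized, mu, sigma


# Hypothetical toy data: 3 samples, 2 features on very different scales
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
X_norm, mu, sigma = normalize(X)
```

The returned `mu` and `sigma` are kept so that the same transformation can later be applied to new data.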
Weight initialization
Deep neural networks sometimes suffer from exploding or vanishing gradients. In a deep network, a signal propagating from one end to the other is multiplied by many weight matrices in succession. Even if each weight matrix is only slightly larger or smaller than the identity matrix, the product grows or shrinks exponentially with depth and ends up very large or very small. The same holds for gradients, which is the exploding/vanishing gradient problem.
Reasonable initialization can effectively alleviate this problem. In forward propagation we have
$$A^{[l]}=\sigma(Z^{[l]})=\sigma(W^{[l]}A^{[l-1]}+b^{[l]})$$
where $\sigma$ here denotes the activation function, not the standard deviation.
The more nodes feed into this layer (that is, the larger the number of nodes $n^{[l-1]}$ in the previous layer), the larger $Z^{[l]}$ tends to be, and the fewer the nodes, the smaller it tends to be. We therefore initialize the weights from a normal distribution with mean 0 and variance $\frac{C}{n^{[l-1]}}$, so that the computed values stay moderate in magnitude. The constant $C$ is usually 2 when ReLU is used as the activation function, and 1 for the logistic (sigmoid) function or $\tanh$.
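As a rough sketch of this rule, the snippet below draws a weight matrix for one layer from a normal distribution with mean 0 and variance $C/n^{[l-1]}$. The function name `init_weights`, its parameters, and the layer sizes are hypothetical choices for illustration; only the standard library is used so the example stays self-contained.

```python
import math
import random


def init_weights(n_in, n_out, activation="relu", seed=None):
    """Sample an n_out x n_in weight matrix from N(0, C / n_in).

    n_in is the number of nodes in the previous layer, n_out in this layer.
    """
    # C = 2 for ReLU, C = 1 for sigmoid / tanh, as described in the text
    c = 2.0 if activation == "relu" else 1.0
    std = math.sqrt(c / n_in)  # random.gauss takes a standard deviation, not a variance
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


# Hypothetical layer: 500 inputs, 100 units, ReLU activation
W = init_weights(n_in=500, n_out=100, activation="relu", seed=0)
```

Biases are not covered by the rule above; initializing them to zero is a common choice, but that is an assumption outside this text.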
Gradient checking
This is the same technique as in the machine learning course, but with a new measure. For the gradient vector $d\theta$ computed by backpropagation and the gradient vector $d\theta_\text{approx}$ approximated numerically from the definition of the derivative, we compute
$$\frac{||d\theta_\text{approx}-d\theta||_2}{||d\theta_\text{approx}||_2+||d\theta||_2}$$
where $||\vec{x}||_2$ is the Euclidean norm.
With $\varepsilon=10^{-7}$, a value below $10^{-7}$ indicates that our computation is correct. A value on the order of $10^{-5}$ needs to be checked carefully. A value around $10^{-3}$ most likely means the implementation has a bug.
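The check can be sketched on a toy problem as follows. The two-sided difference used for $d\theta_\text{approx}$, the quadratic cost, and all function names are assumptions made for this example; in a real network, `cost` would be the network's cost function and `analytic_grad` would be replaced by backpropagation.

```python
import math


def numerical_gradient(f, theta, eps=1e-7):
    """Approximate the gradient of f at theta with two-sided differences."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def relative_difference(d_approx, d_theta):
    """||d_approx - d_theta||_2 / (||d_approx||_2 + ||d_theta||_2)."""
    numerator = l2_norm([a - b for a, b in zip(d_approx, d_theta)])
    denominator = l2_norm(d_approx) + l2_norm(d_theta)
    return numerator / denominator


# Hypothetical toy cost J(theta) = sum(theta_i^2), whose exact gradient is 2 * theta
def cost(theta):
    return sum(t * t for t in theta)


def analytic_grad(theta):
    return [2 * t for t in theta]


theta = [1.0, -2.0, 3.0]
diff = relative_difference(numerical_gradient(cost, theta), analytic_grad(theta))
```

Because each component of the numerical gradient needs two evaluations of the cost, this check is slow for large parameter vectors and is typically used only while debugging, not during training.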