Deep learning: implementation skills of deep neural network
2022-06-30 03:12:00 【ShadyPi】
Normalization
This is the same idea as feature scaling, which we used many times in the earlier machine learning course. Its purpose is to rescale the feature values so that the data is centered at the origin with a comparable spread in every dimension (roughly a high-dimensional sphere around the origin). Over the $m$ training samples, compute the per-feature mean vector $\mu$ and standard deviation vector $\sigma$, namely
$$\mu=\frac{1}{m}\sum_{i=1}^m x^{(i)}\\ \sigma^2=\frac{1}{m}\sum_{i=1}^m (x^{(i)}-\mu)^2$$
Then set $x:=\frac{x-\mu}{\sigma}$ (the square and the division are element-wise), and the normalization is complete.
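As a rough illustration of the two formulas above, here is a minimal sketch in plain Python. The function names `fit_normalizer` and `normalize`, the toy data `X_train`, and the small `eps` guard against a zero standard deviation are assumptions made for this example and do not come from the original notes.

```python
import math


def fit_normalizer(samples):
    """Return the per-feature mean and standard deviation of a list of samples."""
    m = len(samples)
    n = len(samples[0])
    mu = [sum(x[j] for x in samples) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in samples) / m for j in range(n)]
    sigma = [math.sqrt(v) for v in var]
    return mu, sigma


def normalize(samples, mu, sigma, eps=1e-8):
    """Apply x := (x - mu) / sigma element-wise.

    eps is an added safeguard for constant features (sigma == 0);
    it is not part of the formula in the text.
    """
    n = len(mu)
    return [[(x[j] - mu[j]) / (sigma[j] + eps) for j in range(n)] for x in samples]


# Hypothetical toy data: two features on very different scales.
X_train = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
mu, sigma = fit_normalizer(X_train)
X_norm = normalize(X_train, mu, sigma)
```

After this step each column of `X_norm` has mean close to 0 and variance close to 1. In practice the `mu` and `sigma` fitted on the training set are usually reused to normalize the test set, so that both go through the same transformation.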
Weight initialization
In deep neural networks, gradients sometimes explode or vanish. This happens because propagating from one end of a deep network to the other multiplies many weight matrices together. Even if each weight matrix is only slightly larger or smaller than the identity matrix, the product grows or shrinks exponentially with depth and becomes a very large or very small value. The same holds for the gradients during backpropagation, which is the exploding/vanishing gradient problem.
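The exponential effect can be seen in a deliberately simplified setting. The sketch below assumes a hypothetical network in which every layer just multiplies a scalar signal by the same factor `w`, with no activation function and no bias; the function name `signal_after` is made up for this example.

```python
def signal_after(depth, w, x=1.0):
    """Pass a scalar x through `depth` layers that each multiply it by w."""
    for _ in range(depth):
        x = w * x
    return x


# Factors only moderately away from 1 already diverge over 100 layers.
big = signal_after(100, 1.5)    # equals 1.5 ** 100, on the order of 1e17
small = signal_after(100, 0.5)  # equals 0.5 ** 100, below 1e-30
```

Real networks use weight matrices rather than scalars, but the same compounding applies to their products.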
Sensible initialization can substantially alleviate this problem. In forward propagation we have
$$A^{[l]}=\sigma(Z^{[l]})=\sigma(W^{[l]}A^{[l-1]}+b^{[l]})$$
The more nodes feed into this layer (that is, the larger the number of nodes $n^{[l-1]}$ in the previous layer), the more terms are summed in $Z^{[l]}$ and the larger its magnitude tends to be; with fewer inputs it tends to be smaller. So we initialize the weights from a normal distribution with mean 0 and variance $\frac{C}{n^{[l-1]}}$, which keeps the computed values at a moderate scale. The constant $C$ is usually 2 when ReLU is the activation function, and 1 when the logistic (sigmoid) function or $\tanh$ is used.
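A minimal sketch of this initialization rule, using only the Python standard library. The function name `init_layer`, its parameters, and the choice to start biases at zero are assumptions for illustration; the text only specifies the distribution of the weights.

```python
import math
import random


def init_layer(n_out, n_in, activation="relu", rng=random):
    """Sample an n_out x n_in weight matrix from N(0, C / n_in).

    n_in plays the role of n^{[l-1]}, the number of nodes in the previous layer.
    """
    c = 2.0 if activation == "relu" else 1.0  # C = 2 for ReLU, 1 for sigmoid / tanh
    std = math.sqrt(c / n_in)  # random.gauss takes a standard deviation, not a variance
    W = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out  # zero biases: an assumption, not stated in the text
    return W, b


rng = random.Random(0)  # fixed seed so the example is reproducible
W1, b1 = init_layer(200, 500, "relu", rng)
```

With 500 inputs and ReLU, the target variance is 2 / 500 = 0.004, and the empirical variance of the entries of `W1` should land close to that value.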
Gradient checking
This works the same way as in the machine learning course, but with a new measure. For the gradient vector $d\theta$ computed by backpropagation and the gradient vector $d\theta_\text{approx}$ approximated numerically from the definition of the derivative, we compute
$$\frac{||d\theta_\text{approx}-d\theta||_2}{||d\theta_\text{approx}||_2+||d\theta||_2}$$
where $||\vec{x}||_2$ is the Euclidean norm.
With $\varepsilon=10^{-7}$, if this value is below $10^{-7}$, the gradient computation can be considered correct. If it is on the order of $10^{-5}$, the implementation needs to be checked carefully. If it is around $10^{-3}$, the implementation can be considered wrong.
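The check can be sketched on a toy problem as follows. Everything specific here is an assumption for illustration: the cost function `cost` (a sum of squares with known gradient $2\theta$), the helper names, and the use of a two-sided difference for the numerical approximation, which the text does not spell out.

```python
import math


def numerical_gradient(f, theta, eps=1e-7):
    """Approximate the gradient of f at theta with two-sided differences."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def relative_difference(d_theta, d_theta_approx):
    """||approx - d_theta||_2 / (||approx||_2 + ||d_theta||_2)."""
    diff = [a - b for a, b in zip(d_theta_approx, d_theta)]
    return l2_norm(diff) / (l2_norm(d_theta_approx) + l2_norm(d_theta))


# Hypothetical cost J(theta) = sum(theta_i ** 2), whose exact gradient is 2 * theta.
def cost(theta):
    return sum(t * t for t in theta)


theta = [1.0, -2.0, 3.0]
good_grad = [2 * t for t in theta]                # stands in for a correct backprop result
buggy_grad = [2 * t for t in theta[:-1]] + [0.0]  # deliberately wrong last entry

approx = numerical_gradient(cost, theta)
ok = relative_difference(good_grad, approx)
bad = relative_difference(buggy_grad, approx)
```

Here `ok` should come out tiny, while `bad` is far above $10^{-3}$, which is the kind of gap the thresholds above are meant to detect. For a real network, `theta` would be all parameters flattened into one vector and `cost` a full forward pass, so the check is slow and is normally run only while debugging.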