Li Hongyi Machine Learning group study check-in, day 05 --- network design techniques
2022-07-27 05:27:00 【Charleslc's blog】
Preface
I signed up for a study group. This session covers network design techniques, corresponding to videos P5-P9 of Li Hongyi's deep learning course.
Reference video: https://www.bilibili.com/video/av59538266
Reference notes: https://github.com/datawhalechina/leeml-notes
Local minimum and saddle point
Gradient descent sometimes fails to make progress because it reaches a point where the gradient is zero. Such a point is not necessarily a local minimum; it may also be a saddle point.
How can we tell a saddle point from a local minimum?
Use the Taylor expansion of the loss. At a critical point (where the first derivative is zero) the first-order term vanishes, so only the second-order term needs to be examined.
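Written out, the second-order approximation around a point $\theta'$ has the following form. The symbols $g$ (gradient at $\theta'$) and $H$ (Hessian at $\theta'$) are notation assumed here, since the original figure is missing:

```latex
L(\theta) \approx L(\theta') + (\theta - \theta')^{T} g + \frac{1}{2} (\theta - \theta')^{T} H (\theta - \theta')

% At a critical point g = 0, so
L(\theta) \approx L(\theta') + \frac{1}{2} (\theta - \theta')^{T} H (\theta - \theta')
```

The sign of the remaining quadratic term around $\theta'$ therefore decides what kind of critical point it is.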

If all eigenvalues of the Hessian matrix H are greater than zero, the point is a local minimum; if all eigenvalues are less than zero, it is a local maximum; if some eigenvalues are greater than zero and some are less than zero, it is a saddle point.
If training is stuck at a saddle point, move along an eigenvector whose eigenvalue is negative; the loss decreases in that direction.
For example, if the Hessian has an eigenvalue of -2 with eigenvector [1; 1], updating the parameters along [1; 1] lowers the loss.
However, computing the Hessian and its eigenvectors is too expensive, so this is not done in practice.
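For a tiny problem the eigenvalue test can still be sketched with NumPy. The matrix `H` below is a hypothetical Hessian, chosen only so that -2 is an eigenvalue with an eigenvector along [1, 1]; the function name and the tolerance are also assumptions for illustration.

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(H)
    if np.all(eigvals > tol):
        return "local minimum", None
    if np.all(eigvals < -tol):
        return "local maximum", None
    if eigvals[0] < -tol and eigvals[-1] > tol:
        # Eigenvector of the most negative eigenvalue: an escape direction.
        return "saddle point", eigvecs[:, 0]
    return "inconclusive", None  # some eigenvalues are (near) zero

# Hypothetical Hessian with eigenvalues -2 and 2.
H = np.array([[0.0, -2.0],
              [-2.0, 0.0]])
kind, direction = classify_critical_point(H)
print(kind, direction)
```

The returned direction is normalized and its sign is arbitrary, so it is proportional to [1, 1] rather than equal to it.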
Batch and momentum
Batch
Optimization can be done in batches: the training samples are split into batches, and one gradient descent update is performed per batch.
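A minimal sketch of the batch idea, assuming a toy linear regression problem. The data, the learning rate, and the batch size are made up for illustration and do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 3x + 1 plus a little noise.
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.01 * rng.normal(size=200)

w, b = 0.0, 0.0
lr, batch_size, epochs = 0.1, 20, 50

for _ in range(epochs):
    order = rng.permutation(len(X))            # reshuffle the samples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        err = w * xb + b - yb                  # residuals on this batch only
        w -= lr * 2.0 * np.mean(err * xb)      # gradient of the batch MSE w.r.t. w
        b -= lr * 2.0 * np.mean(err)           # gradient of the batch MSE w.r.t. b

print(w, b)
```

Each pass over the data here performs 10 updates instead of 1, which is the trade-off discussed next.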
Small Batch v.s. Large Batch
Thanks to GPU parallelism, one update with a batch size of 100 takes about the same time as one update with a batch size of 1.

In general, a smaller batch size takes more time per epoch, because more updates are needed to go through the data.
A smaller batch size tends to give better results.

Summary
Momentum
Momentum carries the direction of the previous update forward: each update is determined by the current gradient together with the previous movement.
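One common way to write this is `movement = lam * movement - lr * gradient`, followed by `theta = theta + movement`. The sketch below applies it to a hypothetical quadratic loss; the loss, `lr`, and `lam` are assumed values, not taken from the lecture.

```python
import numpy as np

def grad(theta):
    """Gradient of the hypothetical loss 0.5 * (theta[0]**2 + 10 * theta[1]**2)."""
    return np.array([1.0, 10.0]) * theta

theta = np.array([5.0, 5.0])
movement = np.zeros_like(theta)   # no previous movement at the start
lr, lam = 0.05, 0.9               # learning rate and momentum coefficient (assumed)

for _ in range(300):
    # Keep a fraction of the previous movement and add the new gradient step.
    movement = lam * movement - lr * grad(theta)
    theta = theta + movement

print(theta)
```

Setting `lam = 0` recovers plain gradient descent, which makes the effect of the previous movement easy to compare.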

Automatically adjust the learning rate

RMSProp
RMSProp update rule:
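The original figure is missing, so here is a sketch of the usual RMSProp update: keep an exponential moving average of the squared gradient and divide the step by its square root. The names `sigma_sq`, `alpha`, and `eps`, the zero initialization, and the test loss are assumptions for illustration.

```python
import numpy as np

def rmsprop_step(theta, g, sigma_sq, lr=0.01, alpha=0.9, eps=1e-8):
    """One RMSProp update; sigma_sq is the running average of squared gradients."""
    sigma_sq = alpha * sigma_sq + (1.0 - alpha) * g ** 2
    theta = theta - lr * g / (np.sqrt(sigma_sq) + eps)   # per-parameter step size
    return theta, sigma_sq

theta = np.array([5.0, 5.0])
sigma_sq = np.zeros_like(theta)
for _ in range(2000):
    g = np.array([1.0, 10.0]) * theta   # gradient of a hypothetical quadratic loss
    theta, sigma_sq = rmsprop_step(theta, g, sigma_sq)

print(theta)
```

Both coordinates move at a similar pace even though their gradients differ by a factor of 10, because each one is scaled by its own recent gradient magnitude.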

Adam
Adam: RMSProp + Momentum
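A sketch of how the two ideas combine in the standard Adam update: `m` plays the role of momentum and `v` the role of the RMSProp average. The bias-correction terms and default hyperparameters follow the commonly used formulation; the test loss and `lr=0.05` are assumptions for illustration.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * g          # momentum part: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * g ** 2     # RMSProp part: running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, 5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 3001):
    g = np.array([1.0, 10.0]) * theta          # gradient of a hypothetical quadratic loss
    theta, m, v = adam_step(theta, g, m, v, t, lr=0.05)

print(theta)
```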
Summary

The loss function also matters
Cross-entropy is used more often than mean squared error (MSE) for classification.
Example:
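The original figure is missing. As a stand-in, the sketch below compares the gradients of the two losses with respect to the logits of a softmax classifier when the prediction is confidently wrong. The logits and the target are hypothetical values chosen for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def grad_cross_entropy(z, target):
    """Gradient of -sum(target * log(softmax(z))) with respect to the logits z."""
    return softmax(z) - target

def grad_mse(z, target):
    """Gradient of sum((softmax(z) - target)**2) with respect to the logits z."""
    p = softmax(z)
    jac = np.diag(p) - np.outer(p, p)   # Jacobian of softmax (symmetric)
    return jac @ (2.0 * (p - target))

target = np.array([1.0, 0.0, 0.0])
z_bad = np.array([-10.0, 10.0, -10.0])  # hypothetical logits: confidently wrong

g_ce = grad_cross_entropy(z_bad, target)
g_mse = grad_mse(z_bad, target)
print(np.linalg.norm(g_ce), np.linalg.norm(g_mse))
```

With cross-entropy the gradient stays large where the loss is large, while with MSE it nearly vanishes, so gradient descent starting from such a point would barely move.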
Batch Normalization
Feature Normalization
The role of feature normalization is to give every feature $x_i$ the same range.
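A minimal sketch of one common way to do this, standardizing each feature to zero mean and unit variance. The data matrix is hypothetical, with one row per example and one column per feature.

```python
import numpy as np

# Hypothetical data: the second feature has a much larger range than the first.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

mean = X.mean(axis=0)          # per-feature mean over all examples
std = X.std(axis=0)            # per-feature standard deviation
X_norm = (X - mean) / std      # every column now has mean 0 and standard deviation 1

print(X_norm)
```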
Considering Deep Learning
A deep network has many layers, and the output of one layer is the input of the next. Even if the input to the first layer is normalized, the operations in later layers can again produce values with very different ranges, so those intermediate values should be normalized as well.
However, when $z^i$ changes, $\mu$ and $\sigma$ change too, and so does $z^{(i+1)}$. Every value depends on all the others, and the whole network has too many intermediate results to process at once, so the normalization is computed per batch. This is Batch Normalization.

After obtaining $\tilde{z}^i$, multiply it by $\gamma$ and add $\beta$. $\gamma$ and $\beta$ are trained as separate parameters, so that $\tilde{z}^i$ is not forced to have a mean of 0, which could have a negative effect on the neural network.
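The training-time computation can be sketched as follows, assuming `z` holds one batch of pre-activations with one row per example. The function name, the `eps` term for numerical stability, and the sample data are assumptions for illustration.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale with gamma and shift with beta."""
    mu = z.mean(axis=0)                    # per-feature mean over the batch
    sigma = np.sqrt(z.var(axis=0) + eps)   # per-feature standard deviation over the batch
    z_tilde = (z - mu) / sigma
    return gamma * z_tilde + beta

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # hypothetical batch of 64 examples

# With gamma = 1 and beta = 0 the output is simply the normalized z_tilde.
z_hat = batch_norm(z, gamma=np.ones(4), beta=np.zeros(4))

# Other values of gamma and beta let the network choose a different scale and mean.
z_shifted = batch_norm(z, gamma=2.0 * np.ones(4), beta=np.ones(4))
print(z_hat.mean(axis=0), z_shifted.mean(axis=0))
```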