Li Hongyi Machine Learning Study Group Check-in, Day 05: Tips for Network Design
2022-07-27 05:27:00 【Charleslc's blog】
Preface
I signed up for a team study group. This session covers tips for network design, which correspond to videos P5-P9 of Li Hongyi's Deep Learning course.
Reference video :https://www.bilibili.com/video/av59538266
Reference notes :https://github.com/datawhalechina/leeml-notes
Local minima and saddle points
During gradient descent, optimization sometimes gets stuck at a point where the gradient is zero. Such a point (a critical point) is not necessarily a local minimum; it may also be a saddle point.
So how can we tell whether a critical point is a saddle point or a local minimum?
Expand the loss around the point with a Taylor series. At a critical point the first derivative is zero, so the first-order (gradient) term vanishes and only the second-order term, which is governed by the Hessian matrix H, needs to be examined.
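Written out, this is the standard second-order Taylor approximation of the loss $L$ around a point $\theta'$; the symbols $\theta'$ (the critical point) and $g$ (the gradient at $\theta'$) are my own notation:

```latex
L(\theta) \approx L(\theta') + (\theta - \theta')^{\top} g + \frac{1}{2} (\theta - \theta')^{\top} H (\theta - \theta')
```

At a critical point $g = 0$, so the sign of $(\theta - \theta')^{\top} H (\theta - \theta')$ alone decides whether the loss goes up or down near $\theta'$.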

If all eigenvalues of the Hessian matrix H are positive, the point is a local minimum; if all eigenvalues are negative, it is a local maximum; if some eigenvalues are positive and some are negative, it is a saddle point.
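The eigenvalue test above can be sketched with NumPy as follows. This is an illustrative helper, not code from the course; the function name and the tolerance `tol` are my own assumptions, and I add an "inconclusive" case for (near-)zero eigenvalues, where the second-order test cannot decide.

```python
import numpy as np


def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point from the eigenvalues of its Hessian (hypothetical helper)."""
    # eigvalsh is meant for symmetric matrices; Hessians of smooth losses are symmetric
    eigvals = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eigvals > tol):
        return "local minimum"
    if np.all(eigvals < -tol):
        return "local maximum"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"
    # Some eigenvalues are (close to) zero and none has the opposite sign
    return "inconclusive"


print(classify_critical_point([[2.0, 0.0], [0.0, 3.0]]))
print(classify_critical_point([[0.0, -2.0], [-2.0, 0.0]]))
```

Using `eigvalsh` rather than `eig` keeps the eigenvalues real and sorted, which suits symmetric matrices like the Hessian.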
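If training is stuck at a saddle point, move along an eigenvector whose eigenvalue is negative: for such an eigenvector $u$ with eigenvalue $\lambda < 0$, $u^{\top} H u = \lambda \|u\|^2 < 0$, so the loss decreases in that direction.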
For example, for the Hessian in the lecture's example, one eigenvalue works out to -2 with eigenvector [1; 1], so we can lower the loss by moving along [1; 1].
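A minimal sketch of escaping along that eigenvector, assuming a toy loss equal to its second-order Taylor model around a critical point at $\theta = 0$. The matrix `H` below is hypothetical: I chose it only because it has eigenvalue -2 with eigenvector [1, 1], matching the numbers quoted above.

```python
import numpy as np

# Hypothetical Hessian at the critical point theta = 0 (eigenvalue -2, eigenvector [1, 1])
H = np.array([[0.0, -2.0],
              [-2.0, 0.0]])


def taylor_loss(theta):
    """Second-order Taylor model around theta = 0; the gradient term is zero there."""
    theta = np.asarray(theta, dtype=float)
    return 0.5 * theta @ H @ theta


eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues ascending, eigenvectors as columns
lam, u = eigvals[0], eigvecs[:, 0]    # most negative eigenvalue and its unit eigenvector

for step in (0.1, 0.5, 1.0):
    # Moving by step * u changes the model loss by 0.5 * lam * step**2, which is negative
    print(step, taylor_loss(step * u))
```

The sign of the returned eigenvector is arbitrary (`[1, 1]/sqrt(2)` or its negative); both directions lower the loss because the quadratic term depends only on $\lambda$.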
In practice, however, this method is rarely used: computing the Hessian and its eigenvectors is far too expensive.
Batch and momentum
Batch
Instead of computing the gradient on all training data at once, we can split the data into batches and perform one gradient descent update per batch. One pass over all batches is called an epoch.
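A minimal sketch of mini-batch gradient descent, assuming a linear regression model with mean square error loss on synthetic data; the model, data, function name and hyperparameters are all illustrative assumptions. Reshuffling the data at the start of every epoch is a common practice that I include here.

```python
import numpy as np


def minibatch_gd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Mini-batch gradient descent for linear regression with MSE loss (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            err = xb @ w + b - yb                # prediction error on this batch only
            grad_w = 2 * xb.T @ err / len(idx)   # gradient of the batch MSE w.r.t. w
            grad_b = 2 * err.mean()              # gradient of the batch MSE w.r.t. b
            w -= lr * grad_w                     # one update per batch
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated by y = 3*x0 - 2*x1 + 1
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 1.0
w, b = minibatch_gd(X, y)
print(w, b)
```

With 500 examples and `batch_size=32`, each epoch performs 16 updates (the last batch has 20 examples).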
Small batch vs. large batch
Thanks to the GPU's parallel computation, one update with a batch size of 100 takes about as long as one update with a batch size of 1.

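In general, a smaller batch takes more time to go through one epoch, because it needs more updates and makes less use of parallelism.
However, small batches tend to give better results: their noisier gradients help optimization and usually lead to better generalization.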
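The time-per-epoch argument is simple arithmetic: each batch gives one update, so the number of updates per epoch is the number of examples divided by the batch size, rounded up. A sketch, assuming a hypothetical dataset of 60,000 examples:

```python
import math


def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """One parameter update per batch; the last batch may be smaller."""
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset of 60,000 examples
for bs in (1, 100, 1000):
    print(bs, updates_per_epoch(60_000, bs))
```

If, as stated above, one update costs roughly the same on a GPU for batch sizes 1 and 100, then an epoch with batch size 1 needs 100 times as many equally slow updates.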

Summary
Momentum
With momentum, each update keeps part of the previous movement: the direction of the last update influences the current one, instead of the update depending only on the current gradient.
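A minimal sketch of gradient descent with momentum, in the form $m \leftarrow \lambda m - \eta g$, $\theta \leftarrow \theta + m$. The names `lr` ($\eta$) and `lam` ($\lambda$), their values, and the quadratic test function are my own assumptions for illustration.

```python
import numpy as np


def gd_momentum(grad_fn, theta0, lr=0.1, lam=0.9, steps=300):
    """Gradient descent with momentum: m <- lam*m - lr*g, theta <- theta + m."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # previous movement; zero before the first step
    for _ in range(steps):
        g = grad_fn(theta)
        m = lam * m - lr * g  # keep part of the last movement, then add the new gradient step
        theta = theta + m
    return theta


# Hypothetical quadratic f(theta) = theta_0**2 + 10 * theta_1**2, minimum at the origin
grad = lambda t: np.array([2 * t[0], 20 * t[1]])
theta = gd_momentum(grad, [1.0, 1.0])
print(theta)
```

Setting `lam=0` recovers plain gradient descent, which makes it easy to compare the two.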

Adaptive learning rate

RMSProp
RMSProp update rule:
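As a sketch, here is the common form of RMSProp: keep a running average $v \leftarrow \alpha v + (1-\alpha) g^2$ of squared gradients per parameter and update $\theta \leftarrow \theta - \eta\, g / (\sqrt{v} + \epsilon)$. The lecture's version may initialize the running term differently; the parameter names, default values and the test function are my assumptions.

```python
import numpy as np


def rmsprop(grad_fn, theta0, lr=0.01, alpha=0.9, eps=1e-8, steps=1000):
    """Common form of RMSProp (illustrative sketch)."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)  # running average of squared gradients, one per parameter
    for _ in range(steps):
        g = grad_fn(theta)
        v = alpha * v + (1 - alpha) * g ** 2          # recent gradients weigh more than old ones
        theta = theta - lr * g / (np.sqrt(v) + eps)   # per-parameter effective learning rate
    return theta


# Hypothetical quadratic with very different curvatures in the two directions
grad = lambda t: np.array([2 * t[0], 200 * t[1]])
theta = rmsprop(grad, [1.0, 1.0])
print(theta)
```

Because each parameter is divided by its own gradient scale, the steep and flat directions move at comparable speeds; with a constant `lr` the iterates end up oscillating in a small neighbourhood of the minimum rather than hitting it exactly.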

Adam
Adam: RMSProp + Momentum
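A minimal sketch of the standard Adam update, showing how it combines a momentum term `m` with an RMSProp-style term `v`. The default hyperparameters are the commonly used ones; the test function and step count are my assumptions.

```python
import numpy as np


def adam(grad_fn, theta0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Standard Adam update (illustrative sketch)."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # momentum part: running mean of gradients
    v = np.zeros_like(theta)  # RMSProp part: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


# Hypothetical quadratic, same as in the RMSProp sketch
grad = lambda t: np.array([2 * t[0], 200 * t[1]])
theta = adam(grad, [1.0, 1.0])
print(theta)
```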
Summary

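The loss function also matters
For classification, cross-entropy is used far more often than mean square error. With softmax outputs, the MSE loss surface can be very flat where the loss is large, so training may stall there; cross-entropy keeps the gradient large in that region, which makes optimization easier.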
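To make the flatness argument concrete, the sketch below compares the gradients of both losses with respect to the softmax logits when the model is confidently wrong. The two-class logits `z` and label `y` are hypothetical values I picked for illustration.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def ce_grad(z, y):
    """Gradient of cross-entropy -sum(y * log(softmax(z))) w.r.t. z, which is p - y."""
    return softmax(z) - y


def mse_grad(z, y):
    """Gradient of sum((softmax(z) - y)**2) w.r.t. z, via the softmax Jacobian."""
    p = softmax(z)
    jac = np.diag(p) - np.outer(p, p)  # jac[i, j] = dp_i / dz_j
    return jac.T @ (2 * (p - y))


# Hypothetical case: true class 0, but the model strongly prefers class 1
z = np.array([-5.0, 5.0])
y = np.array([1.0, 0.0])
print("cross-entropy grad:", ce_grad(z, y))
print("MSE grad:", mse_grad(z, y))
```

Here the loss is large for both, yet the MSE gradient is several orders of magnitude smaller than the cross-entropy gradient, which is the flat region described above.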
Example:
Batch Normalization
Feature Normalization
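Feature normalization puts every dimension $x_i$ of the input features on the same range. Otherwise, the weights attached to large-valued dimensions affect the loss much more than the others, the error surface becomes steep in some directions and flat in others, and gradient descent has a harder time converging.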
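A minimal sketch of feature normalization (standardization): for each dimension, subtract its mean over all examples and divide by its standard deviation. The function name, the small `eps` guard and the toy data are my assumptions.

```python
import numpy as np


def normalize_features(X, eps=1e-8):
    """Standardize each feature dimension (column) to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)  # mean of each dimension over all examples
    std = X.std(axis=0)    # standard deviation of each dimension
    return (X - mean) / (std + eps), mean, std


# Hypothetical data: the second dimension is 100 times larger than the first
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
X_norm, mean, std = normalize_features(X)
print(X_norm)
```

Returning `mean` and `std` lets the same statistics be reused on test data, so training and test inputs are scaled identically.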
Considering Deep Learning
A deep network has many layers, and the output of one layer is the input of the next. Even if we normalize the input of the first layer, the values computed inside later layers can again differ widely in range, so they should be normalized as well.
However, once we normalize the intermediate outputs $z^i$ of the examples $i = 1, 2, \dots$, changing any single $z^i$ changes $\mu$ and $\sigma$, which in turn changes the normalized values of the other examples, such as $z^{i+1}$. The examples are no longer processed independently, and the whole network effectively becomes one huge network over all the training data. Since that is impractical, we compute $\mu$ and $\sigma$ only over the examples in the current batch; this is Batch Normalization.
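For reference, the usual definitions over a batch of $n$ examples are below, with the subtraction, square, square root and division taken element-wise. The batch size symbol $n$ is my notation, and practical implementations also add a small constant $\epsilon$ under the square root for numerical stability.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} z^i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(z^i - \mu\right)^2}, \qquad
\tilde{z}^i = \frac{z^i - \mu}{\sigma}
```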

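After obtaining $\tilde{z}^i$, we multiply it by $\gamma$ and add $\beta$: $\hat{z}^i = \gamma \odot \tilde{z}^i + \beta$. $\gamma$ and $\beta$ are separate parameters learned during training. They are needed because forcing $\tilde{z}^i$ to always have mean 0 might have a negative effect on the network; $\gamma$ and $\beta$ let it learn a different scale and mean when that helps.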
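A minimal sketch of the training-time batch normalization forward pass with the learnable scale $\gamma$ and shift $\beta$. The function name, `eps` value and test data are my assumptions; this sketch covers training only (at inference time, implementations typically use running averages of $\mu$ and $\sigma$ instead of batch statistics).

```python
import numpy as np


def batch_norm_forward(Z, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a batch Z of shape (n, d)."""
    mu = Z.mean(axis=0)                   # per-dimension mean over the batch
    sigma = np.sqrt(Z.var(axis=0) + eps)  # per-dimension std; eps avoids division by zero
    Z_tilde = (Z - mu) / sigma            # mean 0, std about 1 in every dimension
    return gamma * Z_tilde + beta         # learnable scale and shift


# Hypothetical batch of 64 examples with 3 features each
rng = np.random.default_rng(0)
Z = rng.normal(loc=5.0, scale=10.0, size=(64, 3))
out = batch_norm_forward(Z, gamma=np.ones(3), beta=np.zeros(3))   # common initialization
out2 = batch_norm_forward(Z, gamma=2 * np.ones(3), beta=3 * np.ones(3))
print(out.mean(axis=0), out2.mean(axis=0), out2.std(axis=0))
```

With $\gamma = 1$ and $\beta = 0$ the output is just $\tilde{z}$; other values let the network move the mean and spread away from 0 and 1.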