Deep learning report (3)
2022-07-27 01:06:00 【Curve overtaker】
Contents
- Chapter 6. Initialization
- Chapter 7. Hyperparameter tuning
Chapter 6. Initialization
I. Why initialize?
- The choice of the initial point can decide whether the algorithm converges at all.
- When it does converge, the initial point determines how fast learning converges and whether it ends at a high-cost or a low-cost point.
- Initial values that are too large lead to exploding gradients; initial values that are too small lead to vanishing gradients.
II. What makes a good initialization?
- The activation values of each layer do not saturate.
- The activation values of each layer are not all 0.
III. Common initialization methods
1. All-zero initialization
- All-zero initialization: every parameter starts at 0.
- Drawback: neurons in the same layer learn the same features, so the symmetry between neurons is never broken. If all weights are initialized to 0, every neuron in a layer produces the same output, and the weighted input to every hidden-layer node is zero. Because neural networks generally have a symmetric structure, the first backpropagation pass gives these neurons identical updates, and they stay identical after every later update. Identical parameters cannot extract distinct, useful features, so deep learning models do not initialize all parameters to 0.
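The symmetry argument above can be checked on a toy network. The following is a minimal sketch, not taken from the source: the network size, the sigmoid activation, the squared-error loss, the single training example and the learning rate are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(W1, W2, x, y, lr=0.5):
    """One gradient-descent step on a tiny net: y_hat = W2 . sigmoid(W1 x)."""
    # Forward pass: one sigmoid hidden layer, linear output, no biases.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sum(w * hj for w, hj in zip(W2, h))
    err = y_hat - y  # derivative of 0.5 * (y_hat - y)^2 with respect to y_hat
    # Backward pass.
    grad_W2 = [err * hj for hj in h]
    grad_W1 = [[err * W2[j] * h[j] * (1.0 - h[j]) * xi for xi in x]
               for j in range(len(W1))]
    new_W1 = [[w - lr * g for w, g in zip(row, g_row)]
              for row, g_row in zip(W1, grad_W1)]
    new_W2 = [w - lr * g for w, g in zip(W2, grad_W2)]
    return new_W1, new_W2

def train_from_zero(steps=10):
    W1 = [[0.0, 0.0] for _ in range(3)]  # 3 hidden units, 2 inputs, all zero
    W2 = [0.0, 0.0, 0.0]                 # output weights, all zero
    x, y = [1.0, -2.0], 1.0              # a single made-up training example
    for _ in range(steps):
        W1, W2 = train_step(W1, W2, x, y)
    return W1, W2

W1, W2 = train_from_zero()
print(W1)  # the three rows are identical: the hidden units never diverge
print(W2)  # the three output weights are identical as well
```

The weights do move away from zero, but every hidden unit receives exactly the same update at every step, so the three units keep computing the same feature.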
2. Random initialization
- Random initialization: the parameters are initialized to small random numbers. Typically each value is sampled from a Gaussian distribution with mean μ and standard deviation σ, so the full parameter vector follows a multivariate Gaussian distribution.
- Drawback: a poorly chosen distribution makes the network hard to optimize. If the initial values are too small, the gradients in backpropagation are small as well; in deep networks this causes vanishing gradients and slows the convergence of the parameters. If the initial values are too large, neurons saturate easily.
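Both failure modes can be seen in a forward pass alone. The sketch below is an illustration under assumed settings (a 10-layer tanh network of width 64, zero-mean Gaussian weights, no biases); none of these numbers come from the source.

```python
import math
import random

def mean_abs_activation(sigma, width=64, depth=10, seed=0):
    """Push one random input through `depth` tanh layers whose weights are
    drawn from N(0, sigma^2); return the mean |activation| of the last layer."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Each output neuron gets its own freshly sampled weight row.
        a = [math.tanh(sum(rng.gauss(0.0, sigma) * ai for ai in a))
             for _ in range(width)]
    return sum(abs(v) for v in a) / width

small = mean_abs_activation(sigma=0.01)  # signal shrinks layer by layer
large = mean_abs_activation(sigma=1.0)   # most units pushed toward +1 or -1
print(small, large)
```

With the small σ the activations collapse toward 0, so the gradients that flow back through them are tiny as well. With the large σ most units sit in the flat part of tanh, where the local derivative is close to 0.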
3. Xavier initialization

4. He initialization
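The source gives only the names of these two schemes. As a hedged sketch, the code below uses the commonly cited Gaussian variants: Xavier (Glorot) initialization with standard deviation sqrt(2 / (fan_in + fan_out)), usually paired with tanh or sigmoid layers, and He initialization with standard deviation sqrt(2 / fan_in), usually paired with ReLU layers. The function names and the layer sizes are illustrative assumptions.

```python
import math
import random
import statistics

def xavier_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal: weights ~ N(0, 2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def he_normal(fan_in, fan_out, rng):
    """He normal: weights ~ N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def sample_std(W):
    """Population standard deviation over all entries of a weight matrix."""
    return statistics.pstdev(v for row in W for v in row)

rng = random.Random(0)
W_xavier = xavier_normal(fan_in=200, fan_out=100, rng=rng)
W_he = he_normal(fan_in=200, fan_out=100, rng=rng)
print(sample_std(W_xavier), math.sqrt(2.0 / 300))  # empirical vs. target std
print(sample_std(W_he), math.sqrt(2.0 / 200))
```

Both schemes tie the weight scale to the layer width, which is what keeps activations away from the "too small" and "too large" regimes described for plain random initialization.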

Chapter 7. Hyperparameter tuning
1. Trial and error: for example, after designing an experiment, a student runs through every step of the learning pipeline (from data collection to feature-map visualization), then iterates over the hyperparameters one at a time until time runs out.
2. Grid search: when there are three or fewer hyperparameters, grid search is the common search method. For each hyperparameter, the user picks a small finite set of values to explore. The Cartesian product of these sets gives the hyperparameter combinations, and grid search trains a model with each combination. The combination with the smallest error on the validation set is chosen as the best.
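A minimal sketch of grid search follows. The function `validation_error` is a hypothetical stand-in for "train a model and measure its validation error"; its formula, the two hyperparameters and the candidate values are assumptions made only so the example runs.

```python
import itertools
import math

def validation_error(lr, batch_size):
    """Hypothetical validation error; a real one would train and evaluate a model."""
    return (math.log10(lr) + 2.0) ** 2 + ((batch_size - 64) / 64) ** 2

def grid_search(objective, grid):
    """Evaluate every combination in the Cartesian product of the value sets."""
    names = list(grid)
    best_cfg, best_err = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        err = objective(**cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

grid = {"lr": [1e-3, 1e-2, 1e-1], "batch_size": [32, 64, 128]}
best_cfg, best_err = grid_search(validation_error, grid)
print(best_cfg, best_err)
```

The number of training runs is the product of the set sizes (here 3 × 3 = 9), which is why the method is usually limited to a handful of hyperparameters.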
3. Random search: the only difference from grid search is the first step of the search loop. Instead of enumerating a grid, random search samples points at random from the configuration space.
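The same loop with random sampling might look like the sketch below. As before, `validation_error` is a hypothetical stand-in for training and evaluating a model; sampling the learning rate log-uniformly and the specific ranges are assumptions, not something the source prescribes.

```python
import math
import random

def validation_error(lr, batch_size):
    """Hypothetical validation error; a real one would train and evaluate a model."""
    return (math.log10(lr) + 2.0) ** 2 + ((batch_size - 64) / 64) ** 2

def sample_config(rng):
    """Draw one point from the configuration space."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),  # log-uniform between 1e-4 and 1e-1
        "batch_size": rng.choice([16, 32, 64, 128, 256]),
    }

def random_search(objective, n_trials, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        cfg = sample_config(rng)                  # step 1: sample instead of enumerate
        history.append((cfg, objective(**cfg)))   # remaining steps match grid search
    best_cfg, best_err = min(history, key=lambda item: item[1])
    return best_cfg, best_err, history

best_cfg, best_err, history = random_search(validation_error, n_trials=30)
print(best_cfg, best_err)
```

The budget (`n_trials`) is chosen directly rather than being fixed by the grid size, and each hyperparameter is tried at many distinct values instead of a few.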
4. Bayesian optimization:
- Build a surrogate model of the objective
- Select the next hyperparameters
- Train and evaluate
- Update the model and return to step 2
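The four steps can be sketched as a small loop. The source does not say which model or selection rule is used, so everything concrete here is an assumption: a Gaussian-process surrogate with an RBF kernel, a lower-confidence-bound rule for picking the next point, a one-dimensional search over log10(learning rate), and a hypothetical `objective` standing in for training and evaluation.

```python
import math

def objective(log_lr):
    """Hypothetical validation error as a function of log10(learning rate)."""
    return (log_lr + 2.5) ** 2

def rbf(a, b, length=0.7):
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Step 1: surrogate model. Returns the GP posterior mean and std at x_new."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
    return mu, math.sqrt(var)

def bayes_opt(objective, candidates, n_init=3, n_iter=8, kappa=2.0):
    step = (len(candidates) - 1) // (n_init - 1)
    xs = [candidates[i * step] for i in range(n_init)]  # evenly spaced start
    ys = [objective(x) for x in xs]

    def lcb(c):
        mu, sd = gp_posterior(xs, ys, c)  # model is rebuilt from all data so far
        return mu - kappa * sd            # low mean or high uncertainty is attractive

    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        x_next = min(remaining, key=lcb)  # step 2: select the next hyperparameter
        y_next = objective(x_next)        # step 3: train and evaluate
        xs.append(x_next)                 # step 4: update the data, then loop
        ys.append(y_next)
    best = min(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best], xs

candidates = [-4.0 + 0.1 * i for i in range(41)]
best_x, best_y, evaluated = bayes_opt(objective, candidates)
print(best_x, best_y)
```

Unlike grid or random search, each new point is chosen using all earlier results, which is the reason the loop returns to step 2 after every evaluation.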