Deep Learning Theory - Initialization, Parameter Adjustment
2022-08-04 06:18:00 【Learning adventure】
Initialization
The essence of training a deep learning model is updating the parameters w, so every parameter needs a corresponding initial value.
Why initialization?
A neural network optimizes a highly complex non-linear model that, in general, has no attainable global optimum, so initialization plays an important role:
□ The choice of initial point can determine whether the algorithm converges at all;
□ When it does converge, the initial point determines how fast training converges and whether it converges to a point with high or low cost;
□ An initialization that is too large leads to exploding gradients, while one that is too small leads to vanishing gradients.
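The last point is easy to see empirically. The sketch below (a minimal NumPy illustration, not from the original article; `forward_std` is a name chosen here) pushes an input through 50 tanh layers and measures the spread of the final activations for two weight scales:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 100))

def forward_std(scale, layers=50, width=100):
    """Propagate x through `layers` tanh layers with weights ~ N(0, scale^2)
    and return the standard deviation of the final activations."""
    h = x
    for _ in range(layers):
        w = rng.standard_normal((width, width)) * scale
        h = np.tanh(h @ w)
    return h.std()

print(forward_std(0.01))  # too small: activations (and gradients) shrink toward 0
print(forward_std(1.0))   # too large: tanh saturates near ±1, gradients vanish/explode
```

With `scale=0.01` the signal collapses toward zero layer by layer; with `scale=1.0` the pre-activations blow up and tanh saturates.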
What is a good initialization?
A good initialization should meet the following two conditions:
□ The activations of the neurons in each layer must not saturate;
□ The activations of each layer must not all be 0.
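One standard scheme that satisfies both conditions is Xavier (Glorot) initialization. The sketch below (an illustrative NumPy version; `xavier_init` is a name chosen here) checks both conditions over a 20-layer tanh network:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot uniform initialization: keeps activation variance
    roughly constant across layers, so tanh units stay unsaturated."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Propagate a batch through 20 tanh layers and check the two conditions.
h = rng.standard_normal((32, 256))
for _ in range(20):
    h = np.tanh(h @ xavier_init(256, 256))

print(h.std())                    # condition 2: activations stay away from 0
print(np.mean(np.abs(h) > 0.99))  # condition 1: almost no saturated units
```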
All-zero initialization: every parameter is initialized to 0.
Drawback: neurons in the same layer learn identical features; the symmetry between different neurons is never broken.
If the weights are initialized to 0, all neurons produce the same output: apart from the output layer, every hidden node takes the value 0. Since the network is symmetric, the first backpropagation pass updates all parameters identically, and every subsequent update keeps them identical, so the network never learns to extract useful features. This is why deep learning models are never initialized with all zeros.
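The failure can be traced by hand. The minimal sketch below (not from the original article) backpropagates a squared error through a zero-initialized two-layer tanh network and shows that every hidden neuron receives the identical gradient, here all zeros, so nothing is ever learned:

```python
import numpy as np

# Two-layer network with all-zero weights: every hidden unit computes the
# same function, so backprop assigns every hidden unit the same gradient.
x = np.array([[1.0, 2.0]])   # one input sample
y = np.array([[1.0]])        # target

W1 = np.zeros((2, 3))        # hidden-layer weights, all zero
W2 = np.zeros((3, 1))        # output-layer weights, all zero

h = np.tanh(x @ W1)          # hidden activations: all zero
pred = h @ W2                # output: zero

# Backpropagation of the squared error:
d_pred = 2 * (pred - y)
dW2 = h.T @ d_pred                 # zero, because h is zero
d_h = d_pred @ W2.T * (1 - h**2)   # zero, because W2 is zero
dW1 = x.T @ d_h                    # zero: every column (neuron) is identical

print(dW1)  # identical (zero) gradient for every hidden neuron
```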
Parameter adjustment
Batch size: choose a power of 2 so that batches match the computer's memory layout.
Hyperparameter tuning method
Trial and error, grid search, random search, Bayesian optimization, Gaussian processes.
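Of these, random search is the simplest to sketch. The toy example below (illustrative only; `validation_loss` is a hypothetical stand-in for a real train-and-evaluate run) samples the learning rate log-uniformly and the batch size as a power of 2, keeping the best trial:

```python
import math
import random

random.seed(0)

def validation_loss(lr, batch_size):
    """Hypothetical objective standing in for a real training run:
    best near lr = 10^-2.5 and batch_size = 64."""
    return (math.log10(lr) + 2.5) ** 2 + 0.01 * abs(batch_size - 64)

best = None
for _ in range(20):                          # fixed trial budget
    lr = 10 ** random.uniform(-5, -1)        # sample lr on a log scale
    batch_size = 2 ** random.randint(4, 9)   # powers of 2, as noted above
    loss = validation_loss(lr, batch_size)
    if best is None or loss < best[0]:
        best = (loss, lr, batch_size)

print(best)  # (best loss, best lr, best batch size)
```

Sampling the learning rate on a log scale matters: a uniform draw over [1e-5, 1e-1] would almost never land in the small-lr decades.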