Deep Learning Theory - Initialization, Parameter Adjustment
2022-08-04 06:18:00 【Learning adventure】
Initialization
The essence of training a deep learning model is updating the parameters w, so every parameter needs a corresponding initial value.
Why initialize?
A neural network optimizes a highly complex nonlinear model for which there is essentially no global optimum, so initialization plays a very important role:
□ The choice of initial point can sometimes determine whether the algorithm converges at all;
□ When it does converge, the initial point determines how fast learning converges and whether it reaches a point of high or low cost;
□ An initialization that is too large leads to exploding gradients, while one that is too small leads to vanishing gradients.
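To make the last point concrete, here is a minimal sketch (our own illustration, not from the article; the tanh stack, depth, and width are assumptions): it pushes a random input through many linear + tanh layers and measures how the weight scale at initialization controls the size of the activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_activation_std(scale, depth=20, width=256):
    """Std of the last layer's activations when weights ~ N(0, scale^2)."""
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale
        x = np.tanh(W @ x)
    return x.std()

small = final_activation_std(0.01)   # signal dies out layer by layer -> vanishing gradients
large = final_activation_std(1.0)    # tanh saturates near ±1 -> gradients vanish/explode
good = final_activation_std(1.0 / np.sqrt(256))  # roughly variance-preserving scale
```

The "good" scale of 1/sqrt(fan_in) is the idea behind standard schemes such as Xavier initialization: it keeps the activation magnitude roughly constant across layers.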
What is a good initialization?
A good initialization should meet the following two conditions:
□ The activations of the neurons in each layer should not saturate;
□ The activations of each layer should not all be 0.
All-zero initialization: The parameter is initialized to 0.
Disadvantages: neurons in the same layer learn the same features; the symmetry between different neurons is never broken.
If the weights are initialized to 0, all neurons produce the same output: apart from the output layer, every hidden node takes the value 0. Since the network structure is symmetric, the first backpropagation pass updates all of these parameters identically, and every subsequent update keeps them identical, so the network can never learn to extract distinct features. This is why deep learning models never initialize all parameters to 0.
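The symmetry argument can be checked numerically. Below is a hedged sketch of one backpropagation step through a hypothetical two-hidden-unit network (the function names, squared loss, and sigmoid activation are our own illustration, not the article's): with all-zero weights the two hidden units receive identical gradients, while a random initialization breaks the tie.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_grads(W1, w2, x, y):
    """Gradient of a squared loss w.r.t. W1 for a tiny 1-hidden-layer net.

    Forward: h = sigmoid(W1 @ x), y_hat = w2 @ h, L = (y_hat - y)^2.
    Returns dL/dW1; row i is the gradient for hidden unit i's weights.
    """
    h = sigmoid(W1 @ x)
    y_hat = w2 @ h
    d_out = 2.0 * (y_hat - y)            # dL/dy_hat
    d_pre = d_out * w2 * h * (1.0 - h)   # dL/d(pre-activation)
    return np.outer(d_pre, x)            # dL/dW1

x, y = np.array([1.0, 2.0]), 1.0

# All-zero init: both hidden units get the SAME gradient row, so every
# update leaves them identical -- the symmetry is never broken.
gz = hidden_grads(np.zeros((2, 2)), np.zeros(2), x, y)

# Random init: the two gradient rows differ, so the units can specialize.
rng = np.random.default_rng(0)
gr = hidden_grads(rng.standard_normal((2, 2)), rng.standard_normal(2), x, y)
```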
Parameter adjustment
Batch size: choose a power of 2 so that it matches computer memory.
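As an illustrative helper (our own sketch, not from the article), rounding a candidate batch size up to the nearest power of 2 can be done with bit arithmetic:

```python
def next_pow2(n: int) -> int:
    """Smallest power of 2 that is >= n (n must be a positive integer)."""
    return 1 << (n - 1).bit_length()

# e.g. a candidate batch size of 100 would be rounded up to 128
```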
Hyperparameter tuning methods
Trial and error, grid search, random search, Bayesian optimization, Gaussian processes
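Of the methods above, random search is the simplest to sketch. The following is a hedged illustration (the objective, parameter ranges, and all names are made up for this example): sample hyperparameters uniformly from given ranges and keep the trial with the lowest score, standing in for a validation loss.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimize `objective` by sampling each parameter uniformly from `space`."""
    rng = random.Random(seed)
    best_score, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Toy objective standing in for a validation loss (minimum near log_lr = -1):
space = {"log_lr": (-5.0, 0.0), "dropout": (0.0, 0.5)}
score, params = random_search(
    lambda p: (p["log_lr"] + 1.0) ** 2 + p["dropout"], space
)
```

Grid search would instead evaluate a fixed lattice of combinations; for the same budget, random search tends to cover each individual dimension more densely, which helps when only a few hyperparameters actually matter.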