Deep Learning Theory - Initialization, Parameter Adjustment
2022-08-04 06:18:00 【Learning adventure】
Initialization
The essence of training a deep learning model is updating the parameters w, and every parameter therefore needs a corresponding initial value.
Why initialization?
A neural network optimizes a highly complex nonlinear model for which there is essentially no global optimal solution, so initialization plays a very important role:
□ The choice of the initial point can determine whether the algorithm converges at all;
□ When it does converge, the initial point determines how fast learning converges and whether it converges to a point of high or low cost;
□ An initialization that is too large leads to exploding gradients, while one that is too small leads to vanishing gradients (both effects are sketched below).
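A quick way to see both failure modes is to push a batch of activations through a deep stack of layers whose weights are drawn at different scales. Below is a minimal NumPy sketch; the depth, width, and scale values are illustrative assumptions, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50

for scale in (0.01, 1.0):  # too small vs. too large a weight scale
    h = rng.standard_normal((1, width))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale
        h = h @ W  # purely linear layers, to isolate the scaling effect
    # scale=0.01: each layer shrinks the signal, so activations (and the
    # gradients flowing back through them) vanish toward 0;
    # scale=1.0: each layer amplifies the signal, so activations explode
    print(f"scale={scale}: activation std after {depth} layers = {h.std():.2e}")
```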
What is a good initialization?
A good initialization should meet the following two conditions:
□ The activations of the neurons in each layer should not saturate;
□ The activations of each layer should not all be 0 (a scheme that meets both conditions is sketched below).
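One standard scheme that satisfies both conditions is Xavier (Glorot) initialization, which scales the weights by a layer's fan-in and fan-out so that the activation variance stays roughly constant across layers. A minimal NumPy sketch follows; the tanh network and layer sizes are illustrative assumptions:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # Glorot & Bengio (2010): sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which keeps the variance of
    # activations and gradients roughly equal from layer to layer
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
h = rng.standard_normal((32, 256))  # a batch of 32 activations
for _ in range(50):
    h = np.tanh(h @ xavier_uniform(256, 256, rng))
# the std stays O(1): activations are neither saturated nor all 0
print(f"activation std after 50 layers: {h.std():.3f}")
```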
All-zero initialization: every parameter is initialized to 0.
Disadvantage: neurons in the same layer all learn the same features, because the symmetry between different neurons is never broken.
If the weights are initialized to 0, all neurons produce the same output; apart from the output layer, every node in the hidden layers takes the value 0. Because the network structure is then symmetric, the first pass of error backpropagation produces identical updates for all parameters. With identical parameters, subsequent updates stay identical as well, and the network cannot learn to extract useful features. For this reason, deep learning models never initialize all parameters to 0.
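The symmetry argument is easy to verify numerically: with all-zero weights, the hidden activations are zero, and one backpropagation step produces identical (here, exactly zero) gradients for every neuron, so the neurons never become distinguishable. Below is a minimal sketch of one such step for a tiny two-layer network; the shapes and data are made up for illustration:

```python
import numpy as np

x = np.array([[1.0, 2.0]])   # one sample with 2 features
y = np.array([[1.0]])        # its target
W1 = np.zeros((2, 3))        # hidden layer, all-zero initialization
W2 = np.zeros((3, 1))        # output layer, all-zero initialization

h = np.tanh(x @ W1)          # forward pass: h is all zeros
out = h @ W2                 # output is zero as well

d_out = out - y                       # gradient of squared-error loss
dW2 = h.T @ d_out                     # zero, because h is zero
d_h = (d_out @ W2.T) * (1 - h ** 2)   # zero, because W2 is zero
dW1 = x.T @ d_h                       # zero as well

# every hidden neuron receives exactly the same (zero) gradient, so a
# gradient step leaves them identical and symmetry is never broken
print(dW1, dW2, sep="\n")
```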
Parameter adjustment
Batch size: choose a power of 2, so that it matches the computer's memory.

Hyperparameter tuning methods
Trial and error, grid search, random search, Bayesian optimization, Gaussian processes.
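Of these, random search is a simple and strong baseline. The sketch below draws the learning rate on a log scale and the batch size as a power of 2 (echoing the note above) and keeps the best trial; `train_and_score` is a hypothetical stand-in for an actual training run:

```python
import random

def train_and_score(lr, batch_size):
    # hypothetical placeholder: train the model with these
    # hyperparameters and return a validation score
    return -abs(lr - 3e-4) - abs(batch_size - 64) / 1000.0

random.seed(0)
best = None
for trial in range(20):
    lr = 10 ** random.uniform(-5, -1)        # learning rate on a log scale
    batch_size = 2 ** random.randint(4, 9)   # powers of 2: 16 .. 512
    score = train_and_score(lr, batch_size)
    if best is None or score > best[0]:
        best = (score, lr, batch_size)

print(f"best score={best[0]:.4f}, lr={best[1]:.2e}, batch_size={best[2]}")
```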