Deep Learning Theory - Initialization, Parameter Adjustment
2022-08-04 06:18:00 【Learning adventure】
Initialization
The essence of training a deep learning model is updating the parameters w, and every parameter therefore needs a corresponding initial value.
Why initialization?
A neural network optimizes a highly complex nonlinear model for which there is essentially no global optimal solution, so initialization plays a very important role:
□ The choice of the initial point can determine whether the algorithm converges at all;
□ When it does converge, the initial point determines how fast learning converges and whether it converges to a point of high or low cost;
□ An initialization that is too large leads to exploding gradients, while one that is too small leads to vanishing gradients (both effects are sketched below).
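A quick way to see both failure modes is to push a batch of activations through a deep stack of layers whose weights are drawn at different scales. Below is a minimal NumPy sketch; the depth, width, and scale values are illustrative assumptions, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50

for scale in (0.01, 1.0):  # too small vs. too large a weight scale
    h = rng.standard_normal((1, width))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale
        h = h @ W  # purely linear layers, to isolate the scaling effect
    # scale=0.01: each layer shrinks the signal, so activations (and the
    # gradients flowing back through them) vanish toward 0;
    # scale=1.0: each layer amplifies the signal, so activations explode
    print(f"scale={scale}: activation std after {depth} layers = {h.std():.2e}")
```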
What is a good initialization?
A good initialization should meet the following two conditions:
□ The activations of the neurons in each layer should not saturate;
□ The activations of each layer should not all be 0 (a scheme that meets both conditions is sketched below).
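One standard scheme that satisfies both conditions is Xavier (Glorot) initialization, which scales the weights by a layer's fan-in and fan-out so that the activation variance stays roughly constant across layers. A minimal NumPy sketch follows; the tanh network and layer sizes are illustrative assumptions:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # Glorot & Bengio (2010): sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which keeps the variance of
    # activations and gradients roughly equal from layer to layer
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
h = rng.standard_normal((32, 256))  # a batch of 32 activations
for _ in range(50):
    h = np.tanh(h @ xavier_uniform(256, 256, rng))
# the std stays O(1): activations are neither saturated nor all 0
print(f"activation std after 50 layers: {h.std():.3f}")
```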
All-zero initialization: every parameter is initialized to 0.
Disadvantage: neurons in the same layer all learn the same features, because the symmetry between different neurons is never broken.
If the weights are initialized to 0, all neurons produce the same output; apart from the output layer, every node in the hidden layers takes the value 0. Because the network structure is then symmetric, the first pass of error backpropagation produces identical updates for all parameters. With identical parameters, subsequent updates stay identical as well, and the network cannot learn to extract useful features. For this reason, deep learning models never initialize all parameters to 0.
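The symmetry argument is easy to verify numerically: with all-zero weights, the hidden activations are zero, and one backpropagation step produces identical (here, exactly zero) gradients for every neuron, so the neurons never become distinguishable. Below is a minimal sketch of one such step for a tiny two-layer network; the shapes and data are made up for illustration:

```python
import numpy as np

x = np.array([[1.0, 2.0]])   # one sample with 2 features
y = np.array([[1.0]])        # its target
W1 = np.zeros((2, 3))        # hidden layer, all-zero initialization
W2 = np.zeros((3, 1))        # output layer, all-zero initialization

h = np.tanh(x @ W1)          # forward pass: h is all zeros
out = h @ W2                 # output is zero as well

d_out = out - y                       # gradient of squared-error loss
dW2 = h.T @ d_out                     # zero, because h is zero
d_h = (d_out @ W2.T) * (1 - h ** 2)   # zero, because W2 is zero
dW1 = x.T @ d_h                       # zero as well

# every hidden neuron receives exactly the same (zero) gradient, so a
# gradient step leaves them identical and symmetry is never broken
print(dW1, dW2, sep="\n")
```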
Parameter adjustment
Batch size: choose a power of 2, so that it matches the computer's memory.

Hyperparameter tuning methods
Trial and error, grid search, random search, Bayesian optimization, Gaussian processes.
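Of these, random search is a simple and strong baseline. The sketch below draws the learning rate on a log scale and the batch size as a power of 2 (echoing the note above) and keeps the best trial; `train_and_score` is a hypothetical stand-in for an actual training run:

```python
import random

def train_and_score(lr, batch_size):
    # hypothetical placeholder: train the model with these
    # hyperparameters and return a validation score
    return -abs(lr - 3e-4) - abs(batch_size - 64) / 1000.0

random.seed(0)
best = None
for trial in range(20):
    lr = 10 ** random.uniform(-5, -1)        # learning rate on a log scale
    batch_size = 2 ** random.randint(4, 9)   # powers of 2: 16 .. 512
    score = train_and_score(lr, batch_size)
    if best is None or score > best[0]:
        best = (score, lr, batch_size)

print(f"best score={best[0]:.4f}, lr={best[1]:.2e}, batch_size={best[2]}")
```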