Activation function - relu vs sigmoid

2022-07-02 20:22:00 【Zi Yan Ruoshui】

After a signal passes through a sigmoid, it is significantly attenuated.

Suppose a weight $w$ in an early layer undergoes a large change $\Delta w$. After passing through a sigmoid, this becomes a small change, because the derivative of the sigmoid never exceeds 1/4. The change is attenuated again at every subsequent layer until it reaches the loss as $\Delta l$. As a result, $\partial l/\partial w$ for the early layers is noticeably smaller than $\partial l/\partial w$ for the later layers.
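The attenuation can be checked numerically. The following is a minimal sketch, not a real network: it assumes a chain of five single-neuron sigmoid layers with every weight set to 1.0, no biases, and the final activation treated as the quantity being differentiated. The function name `sigmoid_chain_gradients` and all values are chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_chain_gradients(x, weights):
    """Return d(output)/d(w_k) for a chain a_k = sigmoid(w_k * a_{k-1})."""
    # Forward pass: remember the input and the activation of every layer.
    inputs, activations = [], []
    a = x
    for w in weights:
        inputs.append(a)
        a = sigmoid(w * a)
        activations.append(a)

    # Backward pass: `upstream` holds d(output)/d(a_k).
    grads = [0.0] * len(weights)
    upstream = 1.0
    for k in reversed(range(len(weights))):
        # Derivative of the sigmoid at layer k; it is never larger than 0.25.
        local = activations[k] * (1.0 - activations[k])
        grads[k] = upstream * local * inputs[k]
        upstream = upstream * local * weights[k]
    return grads

# Illustrative values (assumptions): input 1.0, five layers, all weights 1.0.
grads = sigmoid_chain_gradients(x=1.0, weights=[1.0] * 5)
for k, g in enumerate(grads, start=1):
    print(f"layer {k}: d(output)/dw = {g:.2e}")
```

In this setup the gradient shrinks at every step from the last layer toward the first, since each backward step multiplies by another sigmoid derivative of at most 0.25.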

With gradient descent, the parameters of the later layers therefore update faster than those of the earlier layers, and so converge faster. The result is that the later parameters are almost fully trained while the earlier parameters are still close to their random initial values, which give poor results.
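To spell out why a smaller gradient means slower training, recall the standard gradient descent update. It is written here with a learning rate $\eta$, a symbol that does not appear in the original text and is assumed to be the same for every layer:

$$
w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial l}{\partial w}, \qquad \left| w_{\text{new}} - w_{\text{old}} \right| = \eta \left| \frac{\partial l}{\partial w} \right|
$$

With the same $\eta$ everywhere, the size of each step is proportional to the gradient magnitude, so an early-layer weight whose gradient is, say, 100 times smaller also moves 100 times less per iteration.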

For this reason, the ML community looked for activation functions to replace sigmoid, such as ReLU.

For inputs greater than 0, the gradient of ReLU is constant; for inputs less than 0, its derivative is 0. So once the input to a neuron's ReLU falls into the negative half, the gradient is 0; in other words, this neuron receives no training. Only when the input falls into the positive half is there a gradient, and only then does the neuron receive one (reinforcing) training update.
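A small sketch of this behavior, assuming a single neuron `a = relu(w * x)` with no bias and a hypothetical helper `sgd_step` that applies one gradient descent update. The learning rate, weights, and inputs are illustrative values, not taken from the original text.

```python
def relu(z):
    return z if z > 0 else 0.0

def relu_grad(z):
    # Derivative is 1 for z > 0 and 0 for z < 0.
    # At z == 0 it is undefined; 0 is used here by convention.
    return 1.0 if z > 0 else 0.0

def sgd_step(w, x, upstream_grad, lr=0.1):
    """One gradient descent step for a single neuron a = relu(w * x)."""
    z = w * x
    grad_w = upstream_grad * relu_grad(z) * x
    return w - lr * grad_w

# Positive half: z = 0.5 * 2.0 = 1.0, so the weight is updated.
active = sgd_step(w=0.5, x=2.0, upstream_grad=1.0)
# Negative half: z = -0.5 * 2.0 = -1.0, so the gradient is 0 and the weight stays put.
inactive = sgd_step(w=-0.5, x=2.0, upstream_grad=1.0)
print(active, inactive)
```

The neuron in the positive half moves its weight, while the one in the negative half is left unchanged for this sample, which is the "no training" case described above.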

This property of ReLU closely resembles the activation of biological neurons.