2.4 activation function
2022-07-01 09:08:00 【Enzo tried to smash the computer】
Contents
- One. Activation functions
- Two. Common activation functions
- 1. Sigmoid function
- 2. Tanh / hyperbolic tangent activation function
- 3. ReLU Activation function
- 4. Leaky ReLU
- 5. Parametric ReLU Activation function
- 6. ELU Activation function
- 7. SELU Activation function
- 8. Softmax Activation function
- 9. Swish Activation function
- 10. Maxout Activation function
- 11. Softplus Activation function
One. Activation functions
Activation functions serve several purposes in a neural network; their main role is to give the network nonlinear modeling capacity. Without activation functions, a multilayer neural network can only solve linearly separable problems.
Neural networks therefore use activation functions to introduce nonlinearity and improve the expressive power of the model.
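To make this concrete, here is a tiny NumPy illustration (my own sketch, not from the original post): without a nonlinearity between them, two linear layers collapse into a single linear map, so depth alone adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first "layer" weights
W2 = rng.standard_normal((2, 4))   # second "layer" weights
x = rng.standard_normal(3)

# Two stacked linear layers with no activation in between...
y = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weights W2 @ W1.
print(np.allclose(y, (W2 @ W1) @ x))  # True: still just a linear model
```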
Two. Common activation functions
1. Sigmoid function
The Sigmoid function, also known as the Logistic function, is used for hidden-layer neuron outputs. It maps any real number into the range (0, 1), so it can be used for binary classification. It works well when the features interact in complex ways or when the differences between feature values are not particularly large. Sigmoid is a very common activation function, and its expression is as follows:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Its graph is an S-shaped curve.
When is the Sigmoid activation function a suitable choice?
- The output range of the Sigmoid function is 0 to 1. Because the outputs are bounded between 0 and 1, each neuron's output is effectively normalized;
- It suits models that predict a probability as the output. Since probabilities also lie between 0 and 1, the Sigmoid function is a natural fit;
- The gradient is smooth, avoiding "jumping" output values;
- The function is differentiable, meaning the slope of the sigmoid curve can be found between any two points;
- It gives clear predictions, i.e. values very close to 1 or 0.
Shortcomings of the Sigmoid activation function:
- Vanishing gradient: the Sigmoid function flattens out as it approaches 0 and 1, which means its gradient there is close to 0. During backpropagation in a network with Sigmoid activations, neurons whose output is close to 0 or 1 receive gradients near 0; these are called saturated neurons. Their weights are not updated, and the weights of the neurons connected to them are updated very slowly as well. This is the vanishing gradient problem: if a large network contains many saturated Sigmoid neurons, it can hardly propagate gradients backward at all.
- Not zero-centered: the Sigmoid output is not zero-centered; it is always greater than 0. A non-zero-centered output biases the inputs of the next layer of neurons (bias shift), which in turn slows down the convergence of gradient descent.
- Computationally expensive: compared with other nonlinear activation functions, exp() is costly to compute, so it runs more slowly.
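A minimal NumPy sketch (my illustration, not from the original post) of the sigmoid and its derivative; the near-zero gradients at the tails are exactly the saturation described above.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x}); maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # f'(x) = f(x) * (1 - f(x)); its maximum value is 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # values squashed into (0, 1)
print(sigmoid_grad(x))  # gradients near 0 at the tails -> saturated neurons
```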
2. Tanh / hyperbolic tangent activation function
The Tanh activation function is also called the hyperbolic tangent activation function. Like Sigmoid, Tanh takes real-valued inputs, but it squashes them into the range -1 to 1. Unlike Sigmoid, the output of Tanh is zero-centered, because its range lies between -1 and 1.
Its expression is:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1$$
Tanh can be seen as a scaled and shifted Logistic function, with range (−1, 1). The relationship between Tanh and sigmoid is:
$$\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$$
The graph of tanh is also S-shaped; as a hyperbolic tangent, its curve closely resembles that of the sigmoid function, but it has some advantages over sigmoid.
You can think of the Tanh function as two Sigmoid functions put together. In practice, Tanh is preferred over Sigmoid: negative inputs are mapped to negative values, inputs near zero are mapped close to zero, and positive inputs are mapped to positive values:
- When the input is very large or very small, the output is nearly flat and the gradient is small, which is unfavorable for weight updates. The difference between the two functions lies in the output range: tanh outputs values in (−1, 1) and the whole function is centered at 0, which is better than the sigmoid function;
- In the tanh graph, strongly negative inputs are mapped to strongly negative outputs, while zero inputs are mapped to values near zero.
Shortcomings of tanh:
- Like sigmoid, the Tanh function also suffers from the vanishing gradient problem, so it "kills" the gradient when saturated (when x is very large or very small).
Note: in a typical binary classification problem, tanh is used for the hidden layers and sigmoid for the output layer, but this is not a fixed rule and should be adjusted for the specific problem.
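As a quick sanity check (my own sketch, not from the original), the identity tanh(x) = 2·sigmoid(2x) − 1 can be verified numerically:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)
# tanh(x) equals 2 * sigmoid(2x) - 1, up to floating-point error
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```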
3. ReLU Activation function
The ReLU function, the Rectified Linear Unit, is a piecewise linear function. It alleviates the vanishing gradient problem of the sigmoid and tanh functions and is widely used in today's deep neural networks. ReLU is essentially a ramp function:

$$f(x) = \max(0, x)$$
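A one-line NumPy sketch of ReLU (illustrative, not from the original post):

```python
import numpy as np

def relu(x):
    # ReLU passes positive inputs through unchanged and zeroes out negatives
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))  # negatives become 0.0, 2.0 passes through
```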
4. Leaky ReLU
5. Parametric ReLU Activation function
6. ELU Activation function
7. SELU Activation function
8. Softmax Activation function
Softmax is an activation function for multi-class classification problems, where class membership must be assigned over more than two class labels. For any real vector of length K, Softmax compresses it into a real vector of length K whose values lie in (0, 1) and whose elements sum to 1.
Its expression is as follows:
$$S_i = \frac{e^{i}}{\sum_{j \in group} e^{j}}$$

Softmax differs from the ordinary max function: max outputs only the largest value, whereas Softmax ensures that smaller values receive smaller probabilities rather than being discarded outright. You can think of it as a probabilistic or "soft" version of the argmax function.
The denominator of the Softmax function combines all components of the original output, which means the probabilities produced by Softmax are related to one another.
Shortcomings of the Softmax activation function:
- It is non-differentiable at zero;
- The gradient of negative inputs is zero, which means that for activations in that region the weights are not updated during backpropagation, producing dead neurons that never activate.
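A minimal NumPy sketch (my illustration, not from the post) of softmax; subtracting the maximum before exponentiating is a standard trick for numerical stability and does not change the result:

```python
import numpy as np

def softmax(z):
    # Softmax is invariant to adding a constant to every input,
    # so subtracting max(z) avoids overflow without changing the output.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # values in (0, 1); larger inputs get larger probabilities
print(p.sum())  # sums to 1 (up to floating-point error)
```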
9. Swish Activation function
10. Maxout Activation function
11. Softplus Activation function