2.4 Activation Functions
Catalog
- I. Activation functions
- II. Common activation functions
- 1. Sigmoid function
- 2. Tanh / hyperbolic tangent activation function
- 3. ReLU Activation function
- 4. Leaky ReLU
- 5. Parametric ReLU Activation function
- 6. ELU Activation function
- 7. SeLU Activation function
- 8. Softmax Activation function
- 9. Swish Activation function
- 10. Maxout Activation function
- 11. Softplus Activation function
I. Activation functions
Activation functions serve several purposes in a neural network, but their main role is to give the network nonlinear modeling capability. Without activation functions, a multilayer neural network can only handle linearly separable problems.
Neural networks therefore use activation functions to introduce nonlinearity and improve the expressive power of the model.
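As a quick illustration (a minimal NumPy sketch; the matrices and sizes are arbitrary and not from the original article), stacking two linear layers without an activation collapses into a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # first "layer" weights (sizes are arbitrary)
W2 = rng.normal(size=(2, 3))   # second "layer" weights
x = rng.normal(size=4)

# Two stacked linear layers with no activation in between...
y = W2 @ (W1 @ x)
# ...are equivalent to a single linear layer with weights W2 @ W1.
print(np.allclose(y, (W2 @ W1) @ x))  # True
```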
II. Common activation functions
1. Sigmoid function
The Sigmoid function, also called the Logistic function, is used for hidden-layer neuron outputs. Its value range is (0, 1): it maps any real number into the interval (0, 1), so it can be used for binary classification. It works well when the features are complex or their differences are not particularly large. Sigmoid is a very common activation function, defined as follows:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Its graph is an S-shaped curve.
Under what circumstances is the Sigmoid activation function a good choice?
- The output range of the Sigmoid function is 0 to 1. Because the output is bounded between 0 and 1, it normalizes the output of each neuron;
- It suits models that predict a probability as output. Since probabilities lie between 0 and 1, the Sigmoid function is a natural fit;
- The gradient is smooth, avoiding "jumping" output values;
- The function is differentiable, which means the slope of the sigmoid curve can be found between any two points;
- It gives clear predictions, i.e., values very close to 1 or 0.
Shortcomings of the Sigmoid activation function:
- Vanishing gradient: note that the Sigmoid function flattens out as it approaches 0 and 1, that is, its gradient approaches 0. When a network with Sigmoid activations performs backpropagation, neurons whose output is close to 0 or 1 have gradients close to 0; these are called saturated neurons. Their weights therefore stop updating, and the weights of neurons connected to them also update very slowly. This problem is called the vanishing gradient. So imagine a large neural network containing many saturated Sigmoid neurons: it cannot effectively perform backpropagation.
- Not zero-centered: the Sigmoid output is not zero-centered; it is always greater than 0. A non-zero-centered output biases the inputs of the next layer of neurons (bias shift), which further slows the convergence of gradient descent.
- Computationally expensive: compared with other nonlinear activation functions, exp() is expensive to compute, so it runs more slowly.
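For illustration, here is a minimal NumPy sketch of the sigmoid and its derivative (the function and variable names are just for this example); the derivative shrinking toward 0 at large |x| is the saturation behind the vanishing-gradient problem:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # f'(x) = f(x) * (1 - f(x)); it peaks at 0.25 (x = 0) and
    # approaches 0 for large |x| -- the saturated-neuron case.
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # values squashed into (0, 1)
print(sigmoid_grad(x))  # gradients near 0 at the extremes
```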
2. Tanh / hyperbolic tangent activation function
The Tanh activation function is also called the hyperbolic tangent activation function. Like the Sigmoid function, Tanh takes real-valued input, but it compresses the output into the range -1 to 1. Unlike Sigmoid, the output of the Tanh function is zero-centered, since its interval lies between -1 and 1.
The function is defined as:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1$$
We can see that the Tanh function is a scaled and shifted version of the Logistic function, with value range (−1, 1). The relationship between Tanh and sigmoid is:
$$\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$$
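A quick numerical check of this identity (a minimal NumPy sketch, using a sigmoid helper defined as above; not code from the original post):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```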
The graph of the tanh activation function is also S-shaped; as a hyperbolic tangent, its curve is quite similar to that of the sigmoid function, but it has a few advantages over sigmoid.
You can think of the Tanh function as two Sigmoid functions put together. In practice, Tanh is preferred over Sigmoid. Negative inputs are mapped to negative values, inputs near zero are mapped close to zero, and positive inputs are mapped to positive values:
- When the input is very large or very small, the output is almost flat and the gradient is small, which is not conducive to weight updates. The difference between the two lies in the output range: tanh's output is zero-centered on the interval (−1, 1), which is better than the sigmoid function;
- In the tanh graph, negative inputs are strongly mapped to negative values, and zero inputs are mapped to values near zero.
Shortcomings of tanh:
- Like sigmoid, the Tanh function also suffers from the vanishing gradient problem, so when it saturates (x very large or very small) it "kills" the gradient.
Note: in a typical binary classification problem, the tanh function is used for the hidden layers and the sigmoid function for the output layer, but this is not a fixed rule; it should be adjusted to the specific problem.
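As an illustration of that pattern, here is a minimal PyTorch sketch of a binary classifier with a tanh hidden layer and a sigmoid output; the layer sizes (4 inputs, 8 hidden units) are arbitrary assumptions, not values from the original post:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4 input features, 8 hidden units, 1 output probability.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),       # zero-centered activation in the hidden layer
    nn.Linear(8, 1),
    nn.Sigmoid(),    # squashes the output into (0, 1) as a probability
)

x = torch.randn(2, 4)   # a batch of 2 samples
print(model(x))         # two values in (0, 1)
```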
3. ReLU Activation function
The ReLU function, short for Rectified Linear Unit, is a piecewise linear function. It alleviates the vanishing gradient problem of the sigmoid and tanh functions and is widely used in today's deep neural networks. The ReLU function is essentially a ramp function:

$$f(x) = \max(0, x)$$
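A minimal NumPy sketch of ReLU for illustration (not code from the original post):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): negatives become 0, positives pass through unchanged
    return np.maximum(0.0, x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))  # negatives zeroed out, positives unchanged
```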
4. Leaky ReLU
5. Parametric ReLU Activation function
6. ELU Activation function
7. SeLU Activation function
8. Softmax Activation function
Softmax is an activation function used for multi-class classification problems, where class membership must be assigned over more than two class labels. For any real vector of length K, Softmax compresses it into a real vector of length K whose values lie in the range (0, 1) and whose elements sum to 1.
The function is defined as:
$$S_i = \frac{e^{i}}{\sum_{j \in \text{group}} e^{j}}$$

Softmax differs from the ordinary max function: max outputs only the largest value, while ==Softmax ensures that smaller values receive smaller probabilities rather than simply being discarded==. We can think of it as a probabilistic or "soft" version of the argmax function.
The denominator of the Softmax function combines all terms of the original output values, which means that the probabilities obtained from the Softmax function are related to one another.
Shortcomings of the Softmax activation function:
- It is non-differentiable at zero;
- The gradient for negative inputs is zero, which means that for activations in that region the weights are not updated during backpropagation, producing dead neurons that never activate.
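For illustration, a minimal NumPy sketch of Softmax (not from the original post); subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the output is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # each value in (0, 1); larger inputs get larger probabilities
print(p.sum())  # sums to 1 (up to floating-point rounding)
```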
9. Swish Activation function
10. Maxout Activation function
11. Softplus Activation function