2.4 Activation functions
2022-07-01 09:08:00 【Enzo tried to smash the computer】
Contents
- I. Activation functions
- II. Common activation functions
- 1. Sigmoid function
- 2. Tanh/ Hyperbolic tangent activation function
- 3. ReLU Activation function
- 4. Leaky ReLU
- 5. Parametric ReLU Activation function
- 6. ELU Activation function
- 7. SeLU Activation function
- 8. Softmax Activation function
- 9. Swish Activation function
- 10. Maxout Activation function
- 11. Softplus Activation function
I. Activation functions
Activation functions serve several purposes in a neural network; their main role is to give the network nonlinear modeling capability. Without activation functions, a multilayer neural network can only handle linearly separable problems.
Therefore, neural networks use activation functions to introduce nonlinearity and improve the expressive power of the model.
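To make this concrete, here is a minimal sketch (assuming NumPy; the layer sizes and random weights are arbitrary choices of my own) showing that two stacked linear layers without an activation collapse into a single linear map, while inserting a nonlinearity such as ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "hidden layers" with purely linear transforms (hypothetical sizes).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Stacking linear layers without an activation ...
h = W1 @ x + b1
y_linear = W2 @ h + b2

# ... is exactly one linear layer with merged weights: no extra expressive power.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
assert np.allclose(y_linear, W_merged @ x + b_merged)

# Inserting a nonlinearity (ReLU here) breaks this collapse.
y_nonlinear = W2 @ np.maximum(h, 0) + b2
print(y_linear, y_nonlinear)
```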
II. Common activation functions
1. Sigmoid function
The Sigmoid function, also called the Logistic function, is used for hidden-layer neuron outputs. Its range is (0, 1): it maps any real number into the interval (0, 1), and it can be used for binary classification. It works well when the feature relationships are complex or the differences between features are not especially large. Sigmoid is a very common activation function; its expression is:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Its graph looks like an S-shaped curve.
Under what circumstances is the Sigmoid activation function a good choice?
- The output range of the Sigmoid function is 0 to 1. Since the output values are bounded between 0 and 1, it normalizes the output of each neuron;
- It suits models that predict a probability as the output, because probabilities also lie between 0 and 1, so Sigmoid is a natural fit;
- The gradient is smooth, avoiding "jumping" output values;
- The function is differentiable, meaning the slope of the sigmoid curve can be found at any two points;
- It gives clear predictions, i.e. values very close to 1 or 0.
Shortcomings of the Sigmoid activation function:
- Vanishing gradient: note that the Sigmoid function flattens out as it approaches 0 and 1, that is, its gradient there is close to 0. When a network with Sigmoid activations performs backpropagation, neurons whose output is close to 0 or 1 receive gradients near 0; these are called saturated neurons. Their weights are therefore not updated, and the weights of neurons connected to them are updated very slowly as well. This problem is called the vanishing gradient. So if a large neural network contains many saturated Sigmoid neurons, it cannot effectively perform backpropagation (see the sketch after this list).
- Not zero-centered: the Sigmoid output is not zero-centered; it is always greater than 0. A non-zero-centered output biases the inputs of the next layer of neurons (bias shift), which further slows the convergence of gradient descent.
- Computationally expensive: compared with other nonlinear activation functions, exp() is costly to compute, so it runs more slowly.
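As a minimal sketch of the points above (NumPy; the function names are my own), the snippet below implements sigmoid and its derivative and shows how the gradient collapses toward 0 for large positive or negative inputs, i.e. the saturation behind the vanishing-gradient problem.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: sigma(x) * (1 - sigma(x)), at most 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # outputs squashed into (0, 1), always positive
print(sigmoid_grad(x))  # gradients near 0 at the saturated ends (x = -10, 10)
```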
2. Tanh / Hyperbolic tangent activation function
The Tanh activation function is also called the hyperbolic tangent activation function. Like the Sigmoid function, it takes real values as input, but Tanh squashes them into the range -1 to 1. Unlike Sigmoid, the output of Tanh is zero-centered, because its interval lies between -1 and 1.
The function expression is:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1$$
We can see that the Tanh function is a scaled and shifted Logistic function, with range (-1, 1). The relationship between Tanh and sigmoid is:
$$\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$$
The graph of the tanh activation function is also S-shaped. As a hyperbolic tangent function, its curve is quite similar to that of sigmoid, but it has a few advantages over the sigmoid function.
You can think of the Tanh function as two Sigmoid functions put together. In practice, Tanh is preferred over Sigmoid. Negative inputs are mapped to negative outputs, inputs near zero are mapped close to zero, and positive inputs are mapped to positive outputs:
- When the input is very large or very small, the output is almost flat and the gradient is small, which is not good for weight updates; this holds for both functions. The difference lies in the output range: tanh's output interval is (-1, 1) and the whole function is zero-centered, which makes it better than sigmoid;
- In the tanh graph, strongly negative inputs are mapped to strongly negative outputs, and zero inputs are mapped to values near zero.
Shortcomings of tanh:
- Like sigmoid, the Tanh function also suffers from the vanishing-gradient problem, so it likewise "kills" gradients when saturated (x very large or very small).
Note: in a typical binary classification problem, the tanh function is used for the hidden layers and the sigmoid function for the output layer, but this is not a fixed rule; it should be adapted to the specific problem.
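To check the relationship above numerically, here is a small NumPy sketch (function naming is my own) that verifies tanh(x) = 2 * sigmoid(2x) - 1 and compares mean outputs to show that tanh is zero-centered while sigmoid is not.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 201)

# tanh is a scaled and shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)

# tanh outputs are centered around 0, sigmoid outputs are always positive.
print(np.tanh(x).mean())    # ~0.0  (zero-centered)
print(sigmoid(x).mean())    # ~0.5  (not zero-centered)
```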
3. ReLU Activation function
The ReLU function, also known as the Rectified Linear Unit, is a piecewise-linear function. It alleviates the vanishing-gradient problem of the sigmoid and tanh functions and is widely used in today's deep neural networks. The ReLU function is essentially a ramp function; its formula is:
$$f(x) = \max(0, x)$$
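A minimal NumPy sketch of ReLU and its derivative (the function names are my own); note that the gradient is exactly 1 for positive inputs and 0 for negative inputs, so the positive side does not saturate the way sigmoid and tanh do.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 for negative inputs (0 at x = 0 here)."""
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```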
4. Leaky ReLU
5. Parametric ReLU Activation function
6. ELU Activation function
7. SeLU Activation function
8. Softmax Activation function
Softmax is an activation function for multi-class classification problems, in which class membership must be assigned over more than two class labels. For any real vector of length K, Softmax compresses it into a real vector of length K whose values lie in (0, 1) and whose elements sum to 1.
The function expression is as follows:
$$S_i = \frac{e^{i}}{\sum_{j \in \text{group}} e^{j}}$$
Softmax differs from the ordinary max function: max outputs only the largest value, whereas Softmax ensures that smaller values receive smaller probabilities rather than being discarded outright. We can think of it as a probabilistic or "soft" version of the argmax function.
The denominator of the Softmax function combines all factors of the original output values, which means that the probabilities produced by Softmax are related to each other.
Shortcomings of the Softmax activation function:
- Because it exponentiates its inputs, it can overflow numerically when the inputs are large; implementations usually subtract the maximum input before exponentiating (as in the sketch below);
- It is normally used only in the output layer of a multi-class classifier, not as a hidden-layer activation.
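Below is a minimal sketch of a numerically stable softmax in NumPy (the function name and test values are my own): subtracting the maximum input before exponentiating leaves the result unchanged but prevents overflow, and the outputs sum to 1, as described above.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    # Subtracting the max shifts the inputs but leaves the output ratios unchanged.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)         # approx. [0.659 0.242 0.099]
print(probs.sum())   # 1.0: a valid probability distribution over the classes
```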
9. Swish Activation function
10. Maxout Activation function
11. Softplus Activation function