Deep learning activation functions: the rectified linear unit (ReLU) and its variants
2022-07-03 02:31:00 【Python's path to immortality】
The rectified linear unit (ReLU)
The rectified linear unit (ReLU), also called the modified linear unit, is an activation function commonly used in artificial neural networks; the term usually refers to the ramp function and its nonlinear variants.
Mathematical expression:

f(x) = max(0, x)

or, written piecewise,

f(x) = x for x > 0, and f(x) = 0 for x ≤ 0
In the formula above, x is the output of a neuron after a linear transformation, and ReLU turns that result into a nonlinear value. The idea is inspired by biological neurons: when the input is negative it is set to zero, and when the input is positive it passes through unchanged. This property is called unilateral inhibition, and in the hidden layers it gives the activations a certain sparsity. At the same time, because positive inputs pass through unchanged, the gradient there is 1:

f'(x) = 1 for x > 0, and f'(x) = 0 for x ≤ 0
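As a concrete illustration, here is a minimal NumPy sketch of ReLU and its gradient; the function names relu and relu_grad are chosen for this example only and do not come from any particular framework.

```python
# A minimal NumPy sketch of ReLU and its gradient (illustrative only).
import numpy as np

def relu(x):
    # Negative inputs are zeroed (unilateral inhibition); positive inputs pass through.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where the input is positive and 0 elsewhere.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]  -> sparse output
print(relu_grad(x))  # [0.  0.  0.  1.  1. ]
```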
- Advantage one: simple and efficient to compute. Compared with other activation functions such as sigmoid and tanh, its derivative is much easier to compute; since back propagation repeatedly updates the parameters, ReLU's simple form keeps that process cheap.
- Advantage two: it suppresses the vanishing-gradient problem. In deep networks, activation functions such as sigmoid and tanh easily make gradients vanish during back propagation (near sigmoid's saturation region the output changes too slowly and the derivative tends to 0, so information is lost). This phenomenon is called saturation, and it prevents deep networks from being trained. ReLU, in contrast, does not saturate (this holds only on the positive side; the derivative on the negative side is zero, so gradients can still vanish for inputs that fall there), so very small gradients do not arise for positive inputs (see the sketch after this list).
- Advantage three: it eases overfitting. ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence among parameters, and alleviates overfitting.
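To make advantage two concrete, the following small sketch (an illustration added here, not from the original article) compares sigmoid's gradient in its saturation region with ReLU's gradient on the same inputs.

```python
# Sigmoid gradients shrink toward 0 in the saturation region, while ReLU's
# gradient stays 1 for positive inputs.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print(sigmoid_grad(x))        # ~[4.5e-05 6.6e-03 2.5e-01 6.6e-03 4.5e-05]
print((x > 0).astype(float))  # ReLU gradient: [0. 0. 0. 1. 1.]
```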
The drawbacks are the flip side of these advantages. ReLU's one-sided inhibition is rather crude and simple, and in some cases it causes a neuron to "die": the freedom from vanishing gradients stressed in advantage two applies only on the positive side, while on the negative side the derivative is 0, so during back propagation the corresponding gradient stays 0 and the weights can never be updated effectively. To avoid this, several ReLU variants are also widely used.
Leaky ReLU
LeakyReLU differs from ReLU, which completely suppresses negative inputs: when the input is negative, it lets a certain amount of information through. Specifically, for a negative input x the output is αx. The mathematical expression is:

f(x) = x for x > 0, and f(x) = αx for x ≤ 0

where α is a hyperparameter greater than zero, usually 0.01 or 0.2. This avoids the "dying" neurons that can occur with ReLU. The gradient of LeakyReLU is:

f'(x) = 1 for x > 0, and f'(x) = α for x ≤ 0
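Here is a minimal sketch of LeakyReLU and its gradient; the default slope alpha=0.01 is one of the common values mentioned above and is an assumed default for this example.

```python
# A minimal sketch of LeakyReLU and its gradient (illustrative only).
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by alpha instead of being zeroed.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 for positive inputs and alpha for non-positive inputs.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02  -0.005  0.5    2.   ]
print(leaky_relu_grad(x))  # [0.01   0.01   1.     1.   ]
```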
ELU: the best of both
The ideal activation function should satisfy two conditions:
- Its outputs have a zero-mean distribution, which speeds up training.
- It saturates on one side, which helps it converge better.
LeakyReLU comes close to satisfying condition 1 but not condition 2, while ReLU satisfies condition 2 but not condition 1. An activation function that satisfies both is the ELU (Exponential Linear Unit), whose mathematical expression is:

f(x) = x for x > 0, and f(x) = α(e^x - 1) for x ≤ 0

For inputs greater than 0 the gradient is 1; for inputs less than 0 the output asymptotically approaches -α.
ELU combines characteristics of sigmoid and ReLU: it is softly saturated on the left and unsaturated on the right. The exponential nonlinearity on the left, however, costs some computation speed.
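Here is a minimal sketch of ELU and its gradient; alpha is the hyperparameter from the formula above, and alpha=1.0 is an assumed default for this example.

```python
# A minimal sketch of ELU and its gradient (illustrative only).
import numpy as np

def elu(x, alpha=1.0):
    # Left side saturates softly toward -alpha; right side is the identity.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    # Gradient is 1 for positive inputs and alpha * exp(x) for non-positive inputs.
    return np.where(x > 0, 1.0, alpha * np.exp(x))

x = np.array([-5.0, -1.0, 0.5, 2.0])
print(elu(x))       # [-0.99326 -0.63212  0.5      2.     ]
print(elu_grad(x))  # [ 0.00674  0.36788  1.       1.     ]
```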
Reference: Understanding activation functions (Sigmoid/ReLU/LeakyReLU/PReLU/ELU) - Zhihu