Activation functions
2022-07-28 07:20:00 【Sauerkraut】
Why does tanh converge faster than Sigmoid?
1. Sigmoid:
Sigmoid is a commonly used nonlinear activation function. Its mathematical form is:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
It "squashes" any continuous real-valued input into the range between 0 and 1.
In particular, a very large negative input gives an output close to 0, and a very large positive input gives an output close to 1.
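A minimal Python sketch of the Sigmoid function, using only the standard library. The two-branch form is an implementation choice added here (not from the original article): it only ever calls `math.exp` on a non-positive argument, so very large negative inputs cannot overflow.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic Sigmoid 1 / (1 + e^(-x)); the result lies in (0, 1)."""
    if x >= 0:
        # For x >= 0, e^(-x) <= 1, so this branch cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form e^x / (1 + e^x),
    # which evaluates exp() on a negative number instead.
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for x in (-50.0, -2.0, 0.0, 2.0, 50.0):
        print(f"sigmoid({x:+6.1f}) = {sigmoid(x):.6f}")
```

Very negative inputs map to values near 0, very positive inputs to values near 1, and `sigmoid(0.0)` is exactly 0.5, matching the description above.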
Sigmoid was once widely used, but fewer and fewer people use it nowadays, mainly because of the following drawbacks:
(1) Sigmoids saturate and kill gradients. This is the vanishing-gradient problem that is often mentioned. Sigmoid has a serious flaw: when the input is very large or very small (saturation), the gradient of the neuron is close to 0. If the initial weights are very large, most neurons may saturate and kill the gradient, which makes the network hard to train.
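To make saturation concrete, the sketch below evaluates the Sigmoid derivative, using the standard identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), at increasingly large inputs. It is an illustrative sketch, not code from the original article.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the Sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    # The gradient peaks at x = 0 and collapses toward 0 as |x| grows.
    for x in (0.0, 2.0, 5.0, 10.0, 20.0):
        print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.3e}")
```

Already at x = 10 the local gradient is below 1e-4, so almost no error signal flows back through a saturated neuron.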
(2) Sigmoid's output is not zero-centered. This is undesirable because the neurons in the next layer then receive non-zero-mean signals from the previous layer as their input.
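A commonly cited consequence, not spelled out in the original article: for a neuron a = sum_i w_i * x_i + b, the chain rule gives dL/dw_i = (dL/da) * x_i. If every input x_i is positive, as Sigmoid outputs always are, all weight gradients share the sign of dL/da, so one update can only move all the weights up together or all down together. The sketch checks this with hypothetical numbers (`pre` and `delta` are made up for illustration).

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid, always in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weight_grads(inputs: list[float], upstream_grad: float) -> list[float]:
    """Gradients dL/dw_i for a neuron a = sum_i w_i * x_i + b.

    By the chain rule dL/dw_i = (dL/da) * x_i, so each entry is the
    upstream gradient scaled by the corresponding input.
    """
    return [upstream_grad * x for x in inputs]


if __name__ == "__main__":
    pre = [-2.0, -0.5, 0.3, 1.5]  # hypothetical previous-layer pre-activations
    delta = -0.7                  # hypothetical upstream gradient dL/da

    sig_inputs = [sigmoid(v) for v in pre]     # all strictly positive
    tanh_inputs = [math.tanh(v) for v in pre]  # same signs as `pre`

    print("inputs from Sigmoid:", [round(g, 3) for g in weight_grads(sig_inputs, delta)])
    print("inputs from tanh:   ", [round(g, 3) for g in weight_grads(tanh_inputs, delta)])
```

With Sigmoid inputs every weight gradient has the sign of `delta` (all negative here); with tanh inputs the signs are mixed, so individual weights can move in different directions.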
2. tanh:
tanh is a transformed (rescaled) version of Sigmoid:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1$$
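The relationship between the two functions can be written as tanh(x) = 2 * sigmoid(2x) - 1: Sigmoid is compressed horizontally by a factor of 2, stretched vertically by 2 and shifted down by 1, which moves its range from (0, 1) to (-1, 1). A small Python sketch that checks this identity against the standard library's `math.tanh`:

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed through Sigmoid: tanh(x) = 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(f"x={x:+.1f}  math.tanh={math.tanh(x):+.6f}  "
              f"2*sigmoid(2x)-1={tanh_via_sigmoid(x):+.6f}")
```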
Unlike Sigmoid, tanh is zero-centered. Therefore, in practice, tanh generally works better than Sigmoid.
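To see what "zero-centered" means in practice, this sketch feeds inputs that are symmetric around 0 (a hypothetical, idealized input distribution) through both functions and averages the outputs. Because sigmoid(-x) = 1 - sigmoid(x) while tanh(-x) = -tanh(x), the Sigmoid mean lands at 0.5 and the tanh mean lands at 0.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mean_output(f: Callable[[float], float], xs: list[float]) -> float:
    """Average activation value over a list of inputs."""
    return sum(f(x) for x in xs) / len(xs)


if __name__ == "__main__":
    # Inputs symmetric around 0: -5.0, -4.9, ..., 4.9, 5.0
    xs = [i / 10.0 for i in range(-50, 51)]
    print(f"mean Sigmoid output: {mean_output(sigmoid, xs):.4f}")
    print(f"mean tanh output:    {mean_output(math.tanh, xs):.4f}")
```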
The corresponding derivatives are:
$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$
$$\tanh'(x) = 1 - \tanh^{2}(x)$$
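The derivatives sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) and tanh'(x) = 1 - tanh(x)^2 can be checked against a central finite difference, (f(x + h) - f(x - h)) / 2h. This is a minimal Python sketch; the step size h = 1e-5 is an arbitrary choice.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Analytic derivative: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


def central_diff(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Numerical derivative with truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(f"x={x:+.1f}  sigmoid': {sigmoid_grad(x):.8f} vs {central_diff(sigmoid, x):.8f}  "
              f"tanh': {tanh_grad(x):.8f} vs {central_diff(math.tanh, x):.8f}")
```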
It follows that the derivative of tanh takes values in (0, 1], while the derivative of Sigmoid takes values in (0, 1/4]; both maxima are reached at x = 0.
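A quick numerical check of these derivative ranges, sketched in Python: it scans a hypothetical grid of 2001 evenly spaced points on [-10, 10] (which includes x = 0) and reports the largest value of each derivative.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t


def max_on_grid(f: Callable[[float], float], lo: float = -10.0,
                hi: float = 10.0, steps: int = 2001) -> float:
    """Largest value of f over `steps` evenly spaced points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(f(x) for x in xs)


if __name__ == "__main__":
    print("max sigmoid'(x):", max_on_grid(sigmoid_grad))
    print("max tanh'(x):   ", max_on_grid(tanh_grad))
```

Both derivatives stay strictly positive on the grid; the maxima are 0.25 and 1.0 respectively.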
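In summary, because the maximum derivative of tanh is 1 versus 1/4 for Sigmoid, gradients shrink less as they are backpropagated through tanh layers. tanh(x) therefore suffers less from the vanishing-gradient problem than Sigmoid, so it should converge faster.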
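A rough, hypothetical illustration of the summary above: during backpropagation the gradient reaching the first layer of a deep network contains one activation-derivative factor per layer. The sketch multiplies only those factors, assuming the same pre-activation value at every layer and ignoring the weight matrices entirely, so it shows the trend rather than real gradient magnitudes.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Numerically stable logistic Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t


def activation_factor(grad_fn: Callable[[float], float],
                      pre_activation: float, depth: int) -> float:
    """Product of `depth` identical local activation derivatives (weights ignored)."""
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(pre_activation)
    return factor


if __name__ == "__main__":
    x = 0.5  # hypothetical pre-activation, assumed equal in every layer
    for depth in (1, 5, 10, 20):
        print(f"depth {depth:2d}:  Sigmoid {activation_factor(sigmoid_grad, x, depth):.2e}"
              f"   tanh {activation_factor(tanh_grad, x, depth):.2e}")
```

Even in the best case (x = 0) the Sigmoid factor is 0.25 per layer, so after 10 layers it is at most 0.25^10, while the tanh factor can stay at 1.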
————————————————
Copyright notice: This is an original article by CSDN blogger 「Peanut_ Fan」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this statement.
Original article: https://blog.csdn.net/u013841196/article/details/80473654