Activation function - relu vs sigmoid
2022-07-02 20:22:00 【Zi Yan Ruoshui】
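A signal is significantly attenuated each time it passes through a sigmoid.
Suppose a weight w in an early layer changes by a large amount. After the sigmoid, this becomes only a small change in the layer's output, and the change keeps attenuating as it propagates through the following layers until it reaches the output. Seen from the other direction, the gradient of the loss with respect to the early-layer weights is noticeably smaller than the gradient with respect to the later-layer weights.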
With gradient descent, the later-layer parameters therefore receive larger updates than the earlier-layer parameters and converge faster. The result is that the later parameters are almost fully trained while the earlier parameters are still close to their random initial values and give poor results.
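The attenuation can be checked numerically. The following is a minimal sketch, not taken from the original post: it assumes a chain of single-unit sigmoid layers, and the names `backprop_factor`, `pre_activation`, and `weight` are hypothetical. It relies only on the fact that the sigmoid derivative s(x)(1 - s(x)) never exceeds 0.25.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25 (reached at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_factor(num_layers, pre_activation=0.0, weight=1.0):
    """Factor by which a gradient is scaled after passing back through
    `num_layers` single-unit sigmoid layers (illustrative assumption:
    every layer has the same weight and the same pre-activation)."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= weight * sigmoid_grad(pre_activation)
    return factor


if __name__ == "__main__":
    # Even in the best case (pre-activation 0), the factor shrinks geometrically.
    for depth in (1, 2, 5, 10):
        print(depth, backprop_factor(depth))
```

With weights of magnitude around 1, each extra sigmoid layer multiplies the gradient by at most 0.25, which is why the earliest layers learn the slowest.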

For this reason, the ML community looked for activation functions to replace sigmoid, such as ReLU.

For inputs greater than 0, the gradient of ReLU is constant; for inputs less than 0, its derivative is 0. So once the value entering a neuron's activation falls in the negative half, the gradient is 0 and the neuron receives no training update. Only when that value lies in the positive half is there a gradient, and the neuron is then trained (reinforced) on that step.
This behavior closely resembles the activation of biological neurons.
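The two regimes can be written down directly. This is a minimal sketch under stated assumptions: the function names are hypothetical, and setting the derivative to 0 at exactly x = 0 (where it is undefined) is a common convention rather than something the post specifies.

```python
def relu(x):
    """ReLU: passes positive inputs through, clamps the rest to 0."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: 1 in the positive half, 0 in the negative half.
    The value at exactly 0 is a convention (assumed to be 0 here)."""
    return 1.0 if x > 0 else 0.0


def relu_backprop_factor(pre_activations):
    """Product of ReLU derivatives along a chain of layers
    (weights are left out to isolate the effect of the activation)."""
    factor = 1.0
    for z in pre_activations:
        factor *= relu_grad(z)
    return factor


if __name__ == "__main__":
    print(relu_backprop_factor([0.5, 2.0, 1.3]))   # all units active
    print(relu_backprop_factor([0.5, -2.0, 1.3]))  # one unit inactive
```

When every unit along the path is active, the activation contributes no attenuation at all; a single inactive unit blocks the gradient along that path entirely.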

To sum up, the characteristics of ReLU as an activation function are:
1) It is fast to compute;
2) It mimics the activation behavior of biological nervous systems;
3) A superposition of several ReLUs with different biases can approximate a sigmoid-shaped curve;
4) It mitigates the vanishing gradient problem, because the gradient does not shrink in the positive region.
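One way to read point 3 is as a piecewise-linear approximation. The sketch below is an illustrative assumption, not the author's construction: it combines two ReLUs with biases +2 and -2 (one of them with a negative coefficient) so that the result matches the sigmoid's value and slope at x = 0. The name `relu_sigmoid_approx` is hypothetical.

```python
import math


def relu(x):
    return x if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu_sigmoid_approx(x):
    """Two shifted ReLUs combined into a sigmoid-shaped ramp:
    0 for x <= -2, (x + 2) / 4 in between, 1 for x >= 2."""
    return (relu(x + 2.0) - relu(x - 2.0)) / 4.0


if __name__ == "__main__":
    # Compare the ramp with the true sigmoid on a few points.
    for x in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
        print(x, relu_sigmoid_approx(x), sigmoid(x))
```

With only two ReLUs the largest gap to the true sigmoid is about 0.12, at the corners x = -2 and x = 2; adding more shifted ReLUs would smooth those corners.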