Activation function - ReLU vs sigmoid
2022-07-02 20:22:00 【Zi Yan Ruoshui】
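A signal that flows through a sigmoid is significantly attenuated.
Suppose a weight w in an early layer changes by a large amount. After passing through a sigmoid, this becomes only a small change, because the derivative of the sigmoid, sigmoid(z) * (1 - sigmoid(z)), is never larger than 0.25. The attenuation is repeated at every sigmoid layer the change passes through, all the way to the output. Seen from the backward pass, the gradient with respect to the weights of the earlier layers is therefore noticeably smaller than the gradient with respect to the weights of the later layers.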
With gradient descent, the later parameters therefore receive larger updates than the earlier ones and converge faster. As a result, the later layers are almost fully trained while the earlier layers are still close to their random initial values, which gives poor training results.
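The attenuation described above can be illustrated with a minimal sketch. It assumes a chain of sigmoid layers in which every weight is 1, so that only the sigmoid derivatives scale the back-propagated gradient; the helper name `backprop_factors` and the choice of five layers are illustrative and do not come from the original post.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_factors(pre_activations):
    """Cumulative product of sigmoid derivatives, from the last layer back to the first.

    Assumption: all weights are 1, so the result is the factor by which the
    output gradient is scaled when it reaches each layer.
    """
    factors = []
    running = 1.0
    for z in reversed(pre_activations):
        running *= sigmoid_grad(z)
        factors.append(running)
    return list(reversed(factors))  # factors[0] belongs to the first layer


# Best case for the sigmoid: every pre-activation is 0, so each derivative is 0.25.
factors = backprop_factors([0.0] * 5)
for layer, factor in enumerate(factors, start=1):
    print(f"layer {layer}: gradient scaled by {factor}")
```

Even in this best case the first layer sees the output gradient multiplied by 0.25 five times, while the last layer sees it multiplied by 0.25 only once.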
This is why the ML community looked for alternatives to the sigmoid activation function, such as ReLU.
For inputs greater than 0 the gradient of ReLU is constant (equal to 1), and for inputs less than 0 its derivative is 0. So once the input to a neuron's activation falls into the negative half, the gradient is 0 and the neuron receives no training from that sample. Only when the input is in the positive half is there a gradient, and the neuron then receives one (reinforcing) training update.
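A minimal sketch of this behaviour for a single neuron with one input follows. The function `sgd_step`, the learning rate, and the convention that the derivative at exactly 0 is taken as 0 are assumptions made for illustration.

```python
def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def relu_grad(z: float) -> float:
    """Derivative of ReLU: 1 for z > 0, otherwise 0 (the value at z == 0 is a convention)."""
    return 1.0 if z > 0 else 0.0


def sgd_step(w: float, b: float, x: float, upstream_grad: float, lr: float = 0.1):
    """One gradient-descent update for a neuron computing relu(w * x + b).

    upstream_grad is the gradient of the loss with respect to the neuron's output.
    """
    z = w * x + b
    grad_z = upstream_grad * relu_grad(z)  # zero whenever z is in the negative half
    return w - lr * grad_z * x, b - lr * grad_z


# Negative pre-activation (z = -2): the parameters are left unchanged.
inactive = sgd_step(w=-1.0, b=0.0, x=2.0, upstream_grad=1.0)

# Positive pre-activation (z = 2): the parameters are updated.
active = sgd_step(w=1.0, b=0.0, x=2.0, upstream_grad=1.0)
```

In the first call the neuron is skipped entirely, which is the "no training" case described above; in the second call the full upstream gradient passes through unattenuated.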
In this respect ReLU behaves much like the activation of biological neurons.
To sum up, the characteristics of ReLU as an activation function are:
1) It is fast to compute;
2) It mimics the activation characteristics of biological nervous systems;
3) A sum of ReLUs with different biases can approximate a sigmoid;
4) It mitigates the vanishing gradient problem.
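Point 3 can be sketched as follows: a weighted sum of ReLUs, each shifted by a different bias, forms a piecewise-linear curve that passes through the sigmoid at chosen knots. The knot positions (integers from -6 to 6) and the helper name `build_relu_sum` are assumptions chosen for illustration; the result is an approximation, not an exact sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu(z: float) -> float:
    return max(0.0, z)


def build_relu_sum(knots):
    """Return f(x) = offset + sum_i c_i * relu(x - k_i) that interpolates sigmoid at the knots.

    Each term relu(x - k_i) is a ReLU with bias -k_i; the coefficient c_i is
    the change in slope of the piecewise-linear curve at knot k_i.
    """
    ys = [sigmoid(k) for k in knots]
    slopes = [
        (ys[i + 1] - ys[i]) / (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    ]
    # The curve is flat before the first knot and after the last one.
    slopes = [0.0] + slopes + [0.0]
    coeffs = [slopes[i + 1] - slopes[i] for i in range(len(knots))]
    offset = ys[0]

    def approx(x: float) -> float:
        return offset + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

    return approx


knots = [float(k) for k in range(-6, 7)]  # 13 ReLUs with biases 6, 5, ..., -6
approx = build_relu_sum(knots)

# Largest deviation from the true sigmoid on a grid over [-10, 10].
max_err = max(abs(approx(x / 10) - sigmoid(x / 10)) for x in range(-100, 101))
print(f"max abs error with {len(knots)} ReLUs: {max_err:.4f}")
```

Adding more knots makes the piecewise-linear curve follow the sigmoid more closely, at the cost of more ReLU units.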