Mish: the new successor to deep learning's ReLU activation function
2022-07-02 11:51:00 | A ship that wants to learn
Research on activation functions has never stopped. ReLU is still the activation function that dominates deep learning, but that may be about to change with Mish.
Diganta Misra's paper, "Mish: A Self Regularized Non-Monotonic Neural Activation Function", introduces a new deep learning activation function that improves accuracy over both Swish (+0.494%) and ReLU (+1.671%).
Their small FastAI team used Mish in place of ReLU and broke several previous accuracy records on the FastAI global leaderboard. Combining the Ranger optimizer, Mish activation, flat + cosine annealing, and a self-attention layer, they were able to set 12 new leaderboard records!
Of the 12 leaderboard records, we hold 6, and every record used Mish in place of ReLU. (Highlighted in blue: the 400-epoch accuracy of 94.6 is only slightly above our 20-epoch accuracy of 93.8 :)
As part of their own testing, on a 5-epoch ImageWoof run, they report:
Mish beat ReLU (P < 0.0001). (FastAI forums, @Seb)
Mish has already been tested on more than 70 benchmarks, spanning image classification, segmentation, and generation, and compared against 15 other activation functions.
What is Mish
The simplest way to understand Mish is to look at the code. In summary: Mish = x * tanh(ln(1 + e^x)).
For comparison with other activation functions: ReLU is max(0, x), and Swish is x * sigmoid(x).
The PyTorch implementation of Mish:
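The reference implementation lives in the repositories linked at the end of this article; a minimal sketch of a PyTorch Mish module, assuming only standard torch APIs, looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    def forward(self, x):
        # F.softplus computes ln(1 + e^x) in a numerically stable way
        return x * torch.tanh(F.softplus(x))
```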
The Mish function in TensorFlow is the same one line:
x = x * tf.math.tanh(tf.math.softplus(x))
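As a self-contained sketch, assuming standard TensorFlow 2.x math ops:

```python
import tensorflow as tf

def mish(x):
    # Mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
    return x * tf.math.tanh(tf.math.softplus(x))

# quick sanity check on a few values
print(mish(tf.constant([-1.0, 0.0, 1.0])))
```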
How does Mish compare with other activation functions?
The image below shows Mish's test results against several other activation functions, drawn from as many as 73 tests across different architectures and different tasks:
Why does Mish perform so well?
It is unbounded above (a positive value can reach any height), which avoids the saturation caused by capping. The slight allowance for negative values is, in theory, better for gradient flow than the hard zero boundary of ReLU.
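As a quick illustration of that gradient-flow point (a hypothetical check using the Mish formula above, not code from the paper): for negative inputs, ReLU's gradient is exactly zero, while Mish still passes a small gradient through:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 2.0], requires_grad=True)
(x * torch.tanh(F.softplus(x))).sum().backward()
print(x.grad)   # small but nonzero gradients at the negative inputs

x2 = x.detach().clone().requires_grad_(True)
F.relu(x2).sum().backward()
print(x2.grad)  # zero gradient everywhere x <= 0
```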
Finally, and perhaps most importantly, current thinking is that a smooth activation function lets information penetrate deeper into the neural network, yielding better accuracy and generalization.
That said, I have tested many activation functions that also satisfy many of these ideas, and most of them fail to deliver. The main difference here is probably Mish's smoothness at almost every point on the curve.
Mish's ability to push information through via this smooth activation curve is shown in the figure below. In a simple test for this article, more and more layers were added to a test neural network, with no unifying function. As layer depth increased, ReLU's accuracy dropped rapidly, followed by Swish. By contrast, Mish maintained accuracy far better, presumably because it propagates information more effectively:
Smoother activations let information flow deeper... note how rapidly ReLU falls off as the number of layers increases.
How do you put Mish into your own network?
The PyTorch and FastAI source code for Mish can be found in two places on GitHub:
1. Official Mish GitHub: https://github.com/digantamisra98/Mish
2. Unofficial Mish with inline speedups: https://github.com/lessw2020/mish
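For illustration only, here is a hypothetical model showing how the swap looks in practice, using the Mish module from the PyTorch sketch above (either repo's version drops in the same way):

```python
import torch.nn as nn

# Replacing each nn.ReLU() with Mish() is the entire change; the rest
# of the model and the training loop stay exactly as they were.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    Mish(),  # was: nn.ReLU()
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    Mish(),  # was: nn.ReLU()
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)
```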
Summary
ReLU has some known weaknesses, but they are usually mild, and ReLU is computationally very cheap. Mish has a strong theoretical pedigree, and in testing, its average performance beats ReLU in both training stability and accuracy.
The added complexity is small (Mish on a V100 GPU adds roughly 1 second per epoch relative to ReLU), and given the improvements in training stability and final accuracy, the little extra time seems well worth it.
Finally, after a year of testing a large number of new activation functions, Mish leads the pack, and many suspect it is likely to become the new ReLU of AI's future.
Original English article: https://medium.com/@lessw/meet-mish-new-state-of-the-art-ai-activation-function-the-successor-to-relu-846a6d93471f