Classification function showdown: when to use Sigmoid and when to use Softmax?
2020-11-08 16:17:00 【Spiritual】
When designing a model for a classification task (for example, classifying diseases on a chest X-ray or classifying handwritten digits), sometimes several answers must be selectable at once (such as pneumonia and abscess together), and sometimes only one answer is allowed (such as the digit "8"). This article discusses how to apply the Sigmoid function or the Softmax function to a classifier's raw output values.
There are many kinds of classification algorithms, but this article is limited to neural network classifiers. Classification problems can be solved by different kinds of neural networks, such as feedforward neural networks and convolutional neural networks. Before the Sigmoid or Softmax function is applied, the final result of a neural network classifier is a vector of "raw output values", such as [-0.5, 1.2, -0.1, 2.4], where the four outputs correspond to pneumonia, cardiomegaly, tumor, and abscess on a chest X-ray. But what do these raw output values mean? They are easier to interpret once converted to probabilities: "the probability of diabetes is 91%" is easier for a patient to understand than an arbitrary-looking "2.4". The Sigmoid function and the Softmax function both map a classifier's raw output values to probabilities. The following figure shows the raw outputs of a feedforward neural network (blue) being mapped to probabilities (red) by the Sigmoid function:
The same process, repeated with the Softmax function:
As the figures show, the Sigmoid function and the Softmax function give different results. The reason is that the Sigmoid function processes each raw output value separately, so its results are independent of one another and the probabilities do not necessarily sum to 1; in the figure, 0.37 + 0.77 + 0.48 + 0.91 = 2.53. In contrast, the outputs of the Softmax function are interrelated and the probabilities always sum to 1; in the figure, 0.04 + 0.21 + 0.05 + 0.70 = 1.00. Therefore, with the Softmax function, increasing the probability of one class necessarily decreases the probability of the others.
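The comparison above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard textbook definitions of both functions; the helper names `sigmoid` and `softmax` are illustrative and not taken from the article.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a single raw output value."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax over a list of raw output values."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Raw outputs from the example: pneumonia, cardiomegaly, tumor, abscess.
raw_outputs = [-0.5, 1.2, -0.1, 2.4]

# Sigmoid treats every value on its own; softmax normalizes across all values.
sigmoid_probs = [sigmoid(z) for z in raw_outputs]
softmax_probs = softmax(raw_outputs)

print("sigmoid:", [round(p, 2) for p in sigmoid_probs],
      "sum =", round(sum(sigmoid_probs), 2))
print("softmax:", [round(p, 2) for p in softmax_probs],
      "sum =", round(sum(softmax_probs), 2))
```

The printed values can differ from the figures in the last digit, since the figures appear to truncate rather than round. The point that matters is the sums: well above 1 for Sigmoid, exactly 1 (up to floating-point error) for Softmax.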
Sigmoid function applications: chest X-rays and hospital admission
Chest X-rays: A single chest X-ray can show several diseases at once, so a chest X-ray classifier must be able to indicate several findings at once. Below is a chest X-ray showing both pneumonia and abscess; the label column on the right contains two "1"s:
Hospital admission: The goal is to determine, from a patient's health record, how likely the patient is to be admitted in the future. The classification problem can therefore be framed as: classify the patient's existing health record according to the diagnoses (if any) that may lead to a future admission. Several diseases may lead to admission, so there may be more than one answer.
Diagrams: The two feedforward neural networks below correspond to the two problems above. In the final step, the Sigmoid function processes the raw output values into probabilities and allows several possibilities to coexist, because a chest X-ray may show several abnormalities and an admission may have more than one cause.
Softmax function applications: handwritten digits and Iris
Handwritten digits: When distinguishing handwritten digits (MNIST data set: https://en.wikipedia.org/wiki/MNIST_database), the classifier should use the Softmax function to decide which digit an image shows. After all, a digit 8 is only a digit 8; it cannot be a digit 7 at the same time.
Iris: The Iris data set was introduced in 1936 (https://en.wikipedia.org/wiki/Iris_flower_data_set). It contains 150 samples divided into 3 classes (Iris setosa, Iris versicolor, and Iris virginica) with 50 samples each, and every sample has 4 attributes: sepal length, sepal width, petal length, and petal width. The following 9 examples are taken from the Iris data set:
The data set contains no images, but here is an Iris versicolor (https://en.wikipedia.org/wiki/Iris_flower_data_set#/media/File:Iris_versicolor_3.jpg) for you to enjoy:
A neural network classifier for the Iris data set should use the Softmax function to process the raw output values, because an iris belongs to exactly one species; assigning it to several species would make no sense.
About "e"
To understand the Sigmoid and Softmax functions, we first need to introduce "e". For this article, it is enough to know that e is a mathematical constant approximately equal to 2.71828. Some other facts about e:
• The decimal expansion of e goes on forever, and its digits appear completely random, similar to pi.
• e is often used in the study of compound interest, gambling, and certain probability distributions.
• Here is one formula for e:
But there is more than one formula for e, and there are many ways to calculate it. For example: https://www.intmath.com/exponential-logarithmic-functions/calculating-e.php
• In 2004, Google's IPO filing sought to raise 2,718,281,828 dollars, that is, "e billion dollars".
• Wikipedia charts how the number of known decimal digits of e has grown over time (https://en.wikipedia.org/wiki/E_%28mathematical_constant%29#Bernoulli_trials), starting from 1 digit in 1690 and reaching 116,000 digits in 1978:
Sigmoid function and Softmax function
Sigmoid = multi-label classification problem = more than one correct answer = non-exclusive outputs (for example, chest X-rays, hospital admission)
• When building a classifier for a problem with more than one correct answer, use the Sigmoid function to process each raw output value separately.
• The Sigmoid function is shown below (note the e):
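Written in the σ and Zj notation that the next paragraph explains, the standard definition of the Sigmoid function is the following. This is the common textbook form, given on the assumption that it matches the formula the text refers to; the two expressions are algebraically equal.

```latex
\sigma(z_j) = \frac{e^{z_j}}{1 + e^{z_j}} = \frac{1}{1 + e^{-z_j}}
```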
In this formula, σ denotes the Sigmoid function, and σ(Zj) means the Sigmoid function applied to the number Zj. "Zj" is a single raw output value, such as -0.5, and j indexes the output value currently being processed. With four raw output values, j = 1, 2, 3, or 4. In the earlier example the raw output values are [-0.5, 1.2, -0.1, 2.4], so Z1 = -0.5, Z2 = 1.2, Z3 = -0.1, Z4 = 2.4. Therefore, for Z1: σ(Z1) = e^(-0.5) / (1 + e^(-0.5)) ≈ 0.378, which appears as 0.37 in the earlier figure.
Z2, Z3, and Z4 are calculated in the same way. Because the Sigmoid function is applied to each raw output value separately, the possible outcomes include: every class has a low probability (such as "this chest X-ray shows nothing abnormal"); one class has a high probability while the rest are low (such as "the chest X-ray shows only pneumonia"); or several or all classes have a high probability (such as "the chest X-ray shows pneumonia and abscess"). The following figure shows the Sigmoid function curve:
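To get a feel for the shape of that curve numerically, the sketch below evaluates the Sigmoid function at a few inputs. The sample points are hypothetical and chosen only to trace the S-shape; the definition is the standard logistic form.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sample points chosen only to trace the S-shaped curve.
sample_points = [-6, -4, -2, 0, 2, 4, 6]
curve = [(z, sigmoid(z)) for z in sample_points]

for z, p in curve:
    print(f"z = {z:+d}  sigmoid(z) = {p:.4f}")
```

Strongly negative raw outputs map close to 0, a raw output of 0 maps to exactly 0.5, and strongly positive raw outputs map close to 1, which is why each class can be read as its own independent yes-or-no probability.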
Softmax = multi-class classification problem = only one correct answer = mutually exclusive outputs (for example, handwritten digits, Iris)
• When building a classifier for a problem with only one correct answer, use the Softmax function to process the raw output values.
• The denominator of the Softmax function combines all of the raw output values, which means that the probabilities produced by the Softmax function are related to one another.
• The Softmax function is expressed as follows:
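In the same Zj notation used for the Sigmoid function, the standard Softmax definition can be written as below. The symbol K for the number of classes and the summation index k are assumptions introduced here for illustration.

```latex
\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K
```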
Apart from the denominator, which sums e^(thing) over all of the raw output values, the Softmax function is not very different from the Sigmoid function. In other words, when computing the Softmax of a single raw output value (for example Z1), Z1 alone is not enough: Z1, Z2, Z3, and Z4 must all appear in the denominator, as shown below:
The advantage of the Softmax function is that all of the output probabilities sum to 1:
When distinguishing handwritten digits with the Softmax function processing the raw output values, increasing the probability that an example is classified as "8" requires decreasing the probability that it is classified as the other digits (0, 1, 2, 3, 4, 5, 6, 7, and/or 9).
Other examples of Sigmoid and Softmax
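For the example raw output values [-0.5, 1.2, -0.1, 2.4], the calculation for Z1 works out as follows. This is a worked sketch using the standard Softmax definition, with intermediate values rounded to four decimal places.

```latex
\mathrm{softmax}(z_1)
  = \frac{e^{-0.5}}{e^{-0.5} + e^{1.2} + e^{-0.1} + e^{2.4}}
  \approx \frac{0.6065}{0.6065 + 3.3201 + 0.9048 + 11.0232}
  = \frac{0.6065}{15.8546}
  \approx 0.038
```

Rounded to two decimal places this is the 0.04 reported for the first class earlier in the article.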
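As a further sketch of the mutually exclusive case for handwritten digits, the code below raises the raw output for the digit "8" and compares the Softmax probabilities before and after. The ten raw output values and the size of the increase are hypothetical and not taken from the article; the `softmax` helper subtracts the maximum before exponentiating, which leaves the result unchanged but avoids overflow for large inputs.

```python
import math

def softmax(zs):
    """Softmax over a list of raw output values (max-shifted for stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the ten digit classes 0-9 (not from the article).
raw_outputs = [0.1, -1.0, 0.3, 0.8, -0.4, 0.2, 0.5, 1.1, 1.5, 0.9]
before = softmax(raw_outputs)

# Raise only the raw output for the digit "8"; leave the others untouched.
boosted = list(raw_outputs)
boosted[8] += 2.0
after = softmax(boosted)

for digit in range(10):
    print(f"digit {digit}: {before[digit]:.3f} -> {after[digit]:.3f}")
```

Only one raw output changed, yet every other digit's probability drops, because all ten probabilities share the same denominator.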
Summary:
• If the model's output classes are not mutually exclusive and several classes can be selected at once, use the Sigmoid function on the network's raw output values.
• If the model's output classes are mutually exclusive and only one class can be selected, use the Softmax function on the network's raw output values.
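The two rules can be sketched as a single decision helper. The function name `predict`, the flag `mutually_exclusive`, and the 0.5 cut-off for the Sigmoid case are all assumptions made for illustration; the article does not specify a threshold.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a single raw output value."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax over a list of raw output values (max-shifted for stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def predict(raw_outputs, labels, mutually_exclusive, threshold=0.5):
    """Return the predicted label(s) for one example.

    `predict`, `mutually_exclusive`, and `threshold` are hypothetical names;
    the 0.5 cut-off is an assumption, not something the article specifies.
    """
    if mutually_exclusive:
        # Multi-class: softmax, then keep the single most probable class.
        probs = softmax(raw_outputs)
        best = max(range(len(probs)), key=probs.__getitem__)
        return [labels[best]]
    # Multi-label: sigmoid per class, keep every class above the threshold.
    probs = [sigmoid(z) for z in raw_outputs]
    return [label for label, p in zip(labels, probs) if p >= threshold]

findings = ["pneumonia", "cardiomegaly", "tumor", "abscess"]
raw = [-0.5, 1.2, -0.1, 2.4]
multi_label = predict(raw, findings, mutually_exclusive=False)
single_label = predict(raw, findings, mutually_exclusive=True)
print(multi_label)
print(single_label)
```

With the example raw outputs, the Sigmoid branch can return several findings at once, while the Softmax branch always returns exactly one. Running the chest X-ray labels through the Softmax branch is done here only to contrast the two behaviours on the same numbers.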
Copyright notice
This article was written by [Spiritual]. Please include a link to the original when reposting. Thank you.