Classification function showdown: when to use Sigmoid and when to use Softmax?
2020-11-08 16:17:00 【Spiritual】

When designing a model for a classification task (such as classifying diseases in a chest X-ray or classifying handwritten digits), sometimes several answers must be selected at once (for example, both pneumonia and abscess), and sometimes only one answer may be selected (for example, the digit "8"). This article discusses how to apply the Sigmoid function or the Softmax function to a classifier's raw output values.
There are many kinds of classification algorithms, but this article is limited to neural network classifiers. A classification problem can be solved by different neural networks, such as feedforward neural networks and convolutional neural networks. Before the Sigmoid or Softmax function is applied, the final result of a neural network classifier is a vector of "raw output values", such as [-0.5, 1.2, -0.1, 2.4]. In a chest X-ray classifier, these four outputs might correspond to pneumonia, cardiomegaly, tumor, and abscess.
But what do these raw output values mean? They are easier to interpret once converted to probabilities. Compared with a seemingly arbitrary "2.4", the statement "the probability of an abscess is 91%" is much easier for a patient to understand. Both the Sigmoid function and the Softmax function map a classifier's raw output values to probabilities.
[Figure: the raw outputs of a feedforward neural network (blue) mapped to probabilities (red) by the Sigmoid function]

[Figure: the same raw outputs mapped to probabilities by the Softmax function]
As the figures show, the Sigmoid and Softmax functions give different results. The reason is that the Sigmoid function processes each raw output value separately, so its results are independent of one another and the probabilities do not necessarily sum to 1; in the figure, 0.37 + 0.77 + 0.48 + 0.91 = 2.53. In contrast, the outputs of the Softmax function are related to one another and the probabilities always sum to 1; in the figure, 0.04 + 0.21 + 0.05 + 0.70 = 1.00. With the Softmax function, therefore, increasing the probability of one class requires the probabilities of the other classes to decrease accordingly.
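The comparison above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `sigmoid` and `softmax` are chosen here for illustration and are not taken from the article.

```python
import math

def sigmoid(z):
    """Map one raw output value to a probability, independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Map all raw output values jointly to probabilities that sum to 1."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Raw output values from the example: pneumonia, cardiomegaly, tumor, abscess.
raw_outputs = [-0.5, 1.2, -0.1, 2.4]

sigmoid_probs = [sigmoid(z) for z in raw_outputs]
softmax_probs = softmax(raw_outputs)

print("sigmoid:", [round(p, 4) for p in sigmoid_probs], "sum:", round(sum(sigmoid_probs), 4))
print("softmax:", [round(p, 4) for p in softmax_probs], "sum:", round(sum(softmax_probs), 4))
```

The computed values may differ in the last digit from the two-decimal numbers quoted from the figures, but the pattern is the same: the Sigmoid probabilities sum to roughly 2.54, while the Softmax probabilities sum to 1.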
Sigmoid function applications: chest X-rays and hospital admission
Chest X-rays: A single chest X-ray can show several diseases at the same time, so a chest X-ray classifier must also be able to report several findings at once.
[Figure: a chest X-ray showing both pneumonia and an abscess; the label column on the right contains two "1"s]
Hospital admission: The goal is to determine, from a patient's health records, how likely the patient is to be admitted to hospital in the future. The classification problem can therefore be framed as: classify the patient's existing health records by the diagnoses (if any) that may lead to a future admission. Several diseases may lead to admission, so there may be more than one correct answer.
[Figure: two feedforward neural networks, one for each of the two problems above]
In the final step of both networks, the Sigmoid function processes the raw output values to produce the corresponding probabilities and allows several possibilities to coexist, because a chest X-ray may show several abnormalities and an admission may have more than one cause.
Softmax function applications: handwritten digits and Iris
Handwritten digits: When distinguishing handwritten digits (MNIST dataset: https://en.wikipedia.org/wiki/MNIST_database), the classifier should use the Softmax function to decide which digit an image shows. After all, a digit 8 is only a digit 8; it cannot be a digit 7 at the same time.
Iris: The Iris dataset was introduced in 1936 (https://en.wikipedia.org/wiki/Iris_flower_data_set). It contains 150 samples divided into 3 classes, Iris setosa, Iris versicolor, and Iris virginica, with 50 samples per class. Each sample has 4 attributes: sepal length, sepal width, petal length, and petal width.
[Table: 9 examples taken from the Iris dataset]
The dataset contains no images, but here is a picture of Iris versicolor for you to enjoy: https://en.wikipedia.org/wiki/Iris_flower_data_set#/media/File:Iris_versicolor_3.jpg
A neural network classifier for the Iris dataset should use the Softmax function to process the raw output values, because an iris can belong to only one species; there is no point in assigning it to several.
About "e"
To understand the Sigmoid and Softmax functions, we first need to introduce "e". For this article, it is enough to know that e is a mathematical constant approximately equal to 2.71828. Some other facts about e:
• The decimal expansion of e goes on forever, and its digits appear completely random, much like pi.
• e is often used in the study of compound interest, gambling, and some probability distributions.
• e can be written as a formula, but there is more than one such formula, and there are many ways to calculate it. For example: https://www.intmath.com/exponential-logarithmic-functions/calculating-e.php
• In 2004, Google's IPO filing was for $2,718,281,828, that is, "e billion dollars".
• Wikipedia lists the history of the number of known decimal digits of e (https://en.wikipedia.org/wiki/E_%28mathematical_constant%29#Bernoulli_trials), starting with 1 digit in 1690 and continuing to 116,000 digits in 1978.
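As a small illustration that e can be calculated in more than one way, the sketch below approximates it with two standard textbook formulas: the series 1/0! + 1/1! + 1/2! + ... and the limit of (1 + 1/n)^n. These are well-known formulas chosen here as examples and are not necessarily the ones the original article displayed; the function names are hypothetical.

```python
import math

def e_from_series(terms):
    """Approximate e as the sum of 1/n! for n = 0 .. terms - 1."""
    total = 0.0
    term = 1.0  # starts as 1/0!
    for n in range(terms):
        total += term
        term /= (n + 1)  # turn 1/n! into 1/(n+1)!
    return total

def e_from_limit(n):
    """Approximate e as (1 + 1/n)^n, which approaches e as n grows."""
    return (1.0 + 1.0 / n) ** n

series_estimate = e_from_series(15)
limit_estimate = e_from_limit(1_000_000)

print("series:", series_estimate)
print("limit: ", limit_estimate)
print("math.e:", math.e)
```

The series converges much faster than the limit: a handful of terms already agrees with `math.e` to many decimal places, while the limit form needs a very large n for only a few correct digits.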
Sigmoid function and Softmax function
Sigmoid = multi-label classification problem = more than one correct answer = non-exclusive outputs (for example chest X-rays, hospital admission)
• When building a classifier for a problem that has more than one correct answer, use the Sigmoid function to process each raw output value separately.
• The Sigmoid function is defined as follows (note the e):
$$\sigma(z_j) = \frac{e^{z_j}}{1 + e^{z_j}}$$
In this formula, σ denotes the Sigmoid function, and σ(z_j) means that the Sigmoid function is applied to the number z_j. Here z_j is a single raw output value, such as -0.5, and j indicates which output value is being processed. With four raw output values, j = 1, 2, 3, or 4. In the earlier example the raw output values are [-0.5, 1.2, -0.1, 2.4], so z_1 = -0.5, z_2 = 1.2, z_3 = -0.1, and z_4 = 2.4. Therefore,
$$\sigma(z_1) = \sigma(-0.5) = \frac{e^{-0.5}}{1 + e^{-0.5}} \approx 0.378$$
which is the 0.37 quoted earlier when truncated to two decimals.
The calculations for z_2, z_3, and z_4 follow the same steps. Because the Sigmoid function is applied to each raw output value separately, the possible outcomes include: all classes have low probabilities (such as "there is nothing wrong in this chest X-ray"), one class has a high probability while the others are low (such as "the chest X-ray shows only pneumonia"), or several or all classes have high probabilities (such as "the chest X-ray shows pneumonia and an abscess").
[Figure: the Sigmoid function curve]
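The three outcomes described above can be sketched in code. The example below is illustrative only: the raw output values are made up, and the 0.5 cut-off used to turn a probability into a yes/no finding is an assumption of this sketch, not something the article prescribes.

```python
import math

LABELS = ["pneumonia", "cardiomegaly", "tumor", "abscess"]

def sigmoid(z):
    """Map one raw output value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(raw_outputs, threshold=0.5):
    """Return every label whose independent Sigmoid probability reaches the threshold.

    The threshold of 0.5 is an assumed cut-off for illustration.
    """
    return [label for label, z in zip(LABELS, raw_outputs) if sigmoid(z) >= threshold]

# Hypothetical raw output values for three different chest X-rays.
nothing_wrong = predict_labels([-3.0, -2.5, -4.0, -3.5])
only_pneumonia = predict_labels([2.0, -3.0, -2.5, -4.0])
pneumonia_and_abscess = predict_labels([2.2, -3.0, -2.0, 1.8])

print(nothing_wrong)
print(only_pneumonia)
print(pneumonia_and_abscess)
```

Because each label is judged on its own, the result can contain no labels, one label, or several labels.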

Softmax = multi-class classification problem = only one correct answer = mutually exclusive outputs (for example handwritten digits, Iris)
• When building a classifier for a problem that has only one correct answer, use the Softmax function to process the raw output values.
• The denominator of the Softmax function combines all of the raw output values, which means that the probabilities the Softmax function produces are related to one another.
• The Softmax function is defined as follows, where K is the number of raw output values:
$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
Apart from the denominator, which adds up e raised to each of the raw output values so that all of them are taken into account, the Softmax function is not very different from the Sigmoid function. In other words, when the Softmax function is computed for a single raw output value (for example z_1), z_1 alone is not enough: z_1, z_2, z_3, and z_4 all appear in the denominator:
$$\mathrm{softmax}(z_1) = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}} = \frac{e^{-0.5}}{e^{-0.5} + e^{1.2} + e^{-0.1} + e^{2.4}} \approx 0.04$$
The advantage of the Softmax function is that all of the output probabilities sum to 1:
$$\mathrm{softmax}(z_1) + \mathrm{softmax}(z_2) + \mathrm{softmax}(z_3) + \mathrm{softmax}(z_4) = 1$$
When distinguishing handwritten digits, the Softmax function processes the raw output values, so raising the probability that an example is classified as "8" lowers the probability that it is classified as any of the other digits (0, 1, 2, 3, 4, 5, 6, 7, and/or 9).
[Figure: other examples of Sigmoid and Softmax]
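The coupling between Softmax outputs described for handwritten digits can be checked numerically. In this sketch the ten raw output values are hypothetical, and the boost of 1.0 applied to the value for "8" is an arbitrary choice made for illustration.

```python
import math

def softmax(zs):
    """Map all raw output values jointly to probabilities that sum to 1."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw output values for the digits 0..9 (list index = digit).
raw_outputs = [0.1, -0.3, 0.2, 0.5, -1.0, 0.0, 0.4, 1.5, 2.0, 0.3]
before = softmax(raw_outputs)

# Raise only the raw output value for "8" and leave the others untouched.
boosted = list(raw_outputs)
boosted[8] += 1.0
after = softmax(boosted)

for digit in range(10):
    print(digit, round(before[digit], 3), "->", round(after[digit], 3))
```

Only one raw output value changed, yet every probability moved: the one for "8" went up and all the others went down, because they share the same denominator.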
Summary:
• If the model's output classes are not mutually exclusive and several classes can be selected at the same time, use the Sigmoid function to process the network's raw output values.
• If the model's output classes are mutually exclusive and only one class can be selected, use the Softmax function to process the network's raw output values.
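The two rules in the summary can be condensed into one small helper. This is a sketch under stated assumptions: the function names, the `mutually_exclusive` flag, the 0.5 threshold for the multi-label case, and the "pick the most probable class" rule for the multi-class case are all choices made here for illustration.

```python
import math

def to_probabilities(raw_outputs, mutually_exclusive):
    """Apply Softmax for mutually exclusive classes, Sigmoid otherwise."""
    if mutually_exclusive:
        exps = [math.exp(z) for z in raw_outputs]
        total = sum(exps)
        return [e / total for e in exps]
    return [1.0 / (1.0 + math.exp(-z)) for z in raw_outputs]

def decide(raw_outputs, mutually_exclusive, threshold=0.5):
    """Return the indices of the selected classes.

    Multi-class: exactly one index, the most probable class.
    Multi-label: every index whose probability reaches the assumed threshold.
    """
    probs = to_probabilities(raw_outputs, mutually_exclusive)
    if mutually_exclusive:
        return [max(range(len(probs)), key=probs.__getitem__)]
    return [i for i, p in enumerate(probs) if p >= threshold]

raw_outputs = [-0.5, 1.2, -0.1, 2.4]
multi_label_choice = decide(raw_outputs, mutually_exclusive=False)
multi_class_choice = decide(raw_outputs, mutually_exclusive=True)

print("multi-label:", multi_label_choice)
print("multi-class:", multi_class_choice)
```

With the same raw output values, the multi-label setting selects two classes while the multi-class setting selects exactly one.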
Copyright notice
This article was written by [Spiritual]. Please include a link to the original when reposting. Thank you.