Classification function showdown: when to use sigmoid and when to use softmax?

2020-11-08 16:17:00 Spiritual

When designing a model for a classification task (for example, classifying diseases on a chest X-ray or classifying handwritten digits), sometimes several answers must be selected at once (for example, both pneumonia and abscess), and sometimes only one answer is allowed (for example, the digit "8"). This article discusses how to apply the sigmoid function or the softmax function to a classifier's raw output values.

There are many kinds of classification algorithms, but this article is limited to neural network classifiers. Classification problems can be solved with different kinds of neural networks, such as feedforward neural networks and convolutional neural networks. Before the sigmoid or softmax function is applied, the final result of a neural network classifier is a vector of "raw output values", such as [-0.5, 1.2, -0.1, 2.4]. In the chest X-ray example, these four outputs correspond to pneumonia, cardiomegaly, tumor, and abscess. But what do these raw output values mean? They are easier to interpret once they are converted to probabilities. Compared with the seemingly arbitrary "2.4", a statement such as "the probability of diabetes is 91%" is easier for a patient to understand. Both the sigmoid function and the softmax function map a classifier's raw output values to probabilities. The following figure shows the raw outputs of a feedforward neural network (blue) being mapped to probabilities (red) by the sigmoid function:
[Figure: raw output values (blue) mapped to probabilities (red) by the sigmoid function]
The same process is then repeated with the softmax function:
[Figure: raw output values (blue) mapped to probabilities (red) by the softmax function]
As the figures show, the sigmoid and softmax functions give different results. The reason is that the sigmoid function processes each raw output value separately, so its results are independent of one another and the probabilities do not necessarily sum to 1; in the figure, 0.37 + 0.77 + 0.48 + 0.91 = 2.53. By contrast, the outputs of the softmax function are interrelated and the probabilities always sum to 1; in the figure, 0.04 + 0.21 + 0.05 + 0.70 = 1.00. Therefore, with the softmax function, increasing the probability of one class requires the probabilities of the other classes to decrease accordingly.
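The comparison above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names `sigmoid` and `softmax` are chosen here for illustration. The figures quoted in the text are given to two decimals, so the last digit may differ slightly from what the code prints.

```python
import math

def sigmoid(z):
    """Map one raw output value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Map a list of raw output values to probabilities that sum to 1."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

raw_outputs = [-0.5, 1.2, -0.1, 2.4]  # example vector from the text

# Sigmoid treats every value on its own; softmax normalizes across all values.
sigmoid_probs = [sigmoid(z) for z in raw_outputs]
softmax_probs = softmax(raw_outputs)

print("sigmoid:", [round(p, 4) for p in sigmoid_probs], "sum =", round(sum(sigmoid_probs), 4))
print("softmax:", [round(p, 4) for p in softmax_probs], "sum =", round(sum(softmax_probs), 4))
```

The sigmoid probabilities add up to more than 2, while the softmax probabilities add up to 1.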

Sigmoid function applications: chest X-rays and hospital admission
Chest X-ray: a single chest X-ray can show several diseases at once, so a chest X-ray classifier must also be able to report several findings at once. Below is a chest X-ray showing both pneumonia and abscess; the label column on the right contains two "1"s:
[Figure: chest X-ray with a label column marking pneumonia and abscess]
Hospital admission: the goal is to determine, from a patient's health record, how likely the patient is to be admitted to hospital in the future. The classification problem can therefore be framed as follows: classify the patient's existing health record according to the diagnoses (if any) that may lead to a future admission. Several diseases may lead to admission, so there may be more than one answer.
Diagrams: the following two feedforward neural networks correspond to the two problems above. In the final step, the sigmoid function processes the raw output values to obtain the corresponding probabilities, which allows several possibilities to coexist, because a chest X-ray may show several abnormalities and an admission may have more than one cause.
[Figure: two feedforward neural networks with sigmoid outputs, one for chest X-rays and one for hospital admission]
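As a rough illustration of how several answers can coexist, the sketch below turns sigmoid probabilities into a set of predicted labels. The cutoff `THRESHOLD = 0.5` is an assumption made for this example; the article does not specify one, and "cardiomegaly" is used here as the usual term for heart enlargement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Class order follows the chest X-ray example in the text.
labels = ["pneumonia", "cardiomegaly", "tumor", "abscess"]
raw_outputs = [-0.5, 1.2, -0.1, 2.4]

THRESHOLD = 0.5  # assumption: the cutoff is not given in the article

probs = [sigmoid(z) for z in raw_outputs]

# Every label is judged independently, so zero, one, or many can be selected.
predicted = [label for label, p in zip(labels, probs) if p >= THRESHOLD]
print(predicted)
```

With these raw outputs two labels pass the cutoff at the same time, which a softmax-based decision could not express.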
Softmax function applications: handwritten digits and Iris
Handwritten digits: when recognizing handwritten digits (MNIST dataset: https://en.wikipedia.org/wiki/MNIST_database), the classifier should use the softmax function to decide which digit an image shows. After all, a digit 8 is just a digit 8; it cannot be a digit 7 at the same time.
[Figure: handwritten digits]
Iris: the Iris dataset was introduced in 1936 (https://en.wikipedia.org/wiki/Iris_flower_data_set). It contains 150 samples divided into 3 classes (Iris setosa, Iris versicolor, and Iris virginica) with 50 samples each. Each sample has 4 attributes: sepal length, sepal width, petal length, and petal width. The following 9 examples are taken from the Iris dataset:
[Figure: 9 example rows from the Iris dataset]
The dataset contains no images, but here is a picture of Iris versicolor (https://en.wikipedia.org/wiki/Iris_flower_data_set#/media/File:Iris_versicolor_3.jpg) for you to enjoy:
[Figure: Iris versicolor]
A neural network classifier for the Iris dataset should use the softmax function to process the raw output values, because an iris can belong to only one species; assigning it to several species makes no sense.
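To make the single-answer case concrete, here is a small sketch that picks exactly one species from softmax probabilities. The raw output values are hypothetical numbers invented for this example, not the output of a trained Iris classifier.

```python
import math

def softmax(zs):
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

species = ["setosa", "versicolor", "virginica"]
raw_outputs = [0.3, 2.1, 1.0]  # hypothetical raw outputs for one flower

probs = softmax(raw_outputs)

# Exactly one class is chosen: the one with the highest probability.
best = max(range(len(probs)), key=lambda i: probs[i])
print(species[best], round(probs[best], 3))
```

Because the probabilities sum to 1, taking the largest one is a natural way to commit to a single species.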
About "e"
To understand the sigmoid and softmax functions, we first need to introduce "e". For this article, it is enough to know that e is a mathematical constant approximately equal to 2.71828. Some other facts about e:
• The decimal expansion of e goes on forever, and the digits appear completely random, similar to pi.
• e is often used in the study of compound interest, gambling, and some probability distributions.
• Here is one formula for e:
[Figure: a formula for e]
However, there is more than one formula for e, and there are many ways to calculate it. For examples, see https://www.intmath.com/exponential-logarithmic-functions/calculating-e.php
• In 2004, Google's IPO filing set out to raise 2,718,281,828 dollars, that is, "e billion dollars".
• Wikipedia traces the number of known decimal digits of e (https://en.wikipedia.org/wiki/E_%28mathematical_constant%29#Bernoulli_trials), starting from 1 digit in 1690 and reaching 116,000 digits in 1978:
[Figure: table of the number of known decimal digits of e by year]
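Which formula the missing figure showed cannot be recovered, so the sketch below simply illustrates two well-known ways to approximate e: the limit of (1 + 1/n)^n and the series of reciprocal factorials. The function names are made up for this example.

```python
import math

def e_from_limit(n):
    """Approximate e as (1 + 1/n)**n, which approaches e as n grows."""
    return (1.0 + 1.0 / n) ** n

def e_from_series(terms):
    """Approximate e as the partial sum 1/0! + 1/1! + ... + 1/(terms-1)!."""
    return sum(1.0 / math.factorial(k) for k in range(terms))

limit_estimate = e_from_limit(1_000_000)
series_estimate = e_from_series(15)

print("limit :", limit_estimate)
print("series:", series_estimate)
print("math.e:", math.e)
```

The series converges much faster than the limit: 15 terms already agree with `math.e` to many decimal places, while the limit with n = 1,000,000 is only accurate to about five.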
Sigmoid and softmax functions
Sigmoid = multi-label classification problem = more than one correct answer = non-exclusive outputs (for example, chest X-rays and hospital admission)
• When building a classifier for a problem with more than one correct answer, use the sigmoid function to process each raw output value separately.
• The sigmoid function is shown below (note the e):
$$\sigma(z_j) = \frac{e^{z_j}}{1 + e^{z_j}}$$
In this formula, σ denotes the sigmoid function, and σ(z_j) means applying the sigmoid function to the number z_j. "z_j" represents a single raw output value, such as -0.5, and j indicates which output value is currently being processed. With four raw output values, j = 1, 2, 3, or 4. In the earlier example the raw output values are [-0.5, 1.2, -0.1, 2.4], so z_1 = -0.5, z_2 = 1.2, z_3 = -0.1, and z_4 = 2.4. Therefore,
$$\sigma(z_1) = \sigma(-0.5) = \frac{e^{-0.5}}{1 + e^{-0.5}} \approx 0.3775$$
The calculations for z_2, z_3, and z_4 follow the same procedure. Because the sigmoid function is applied to each raw output value separately, the possible outcomes include: all classes have very low probabilities (such as "this chest X-ray shows nothing abnormal"), one class has a high probability while the others are very low (such as "the chest X-ray shows only pneumonia"), or several or all classes have high probabilities (such as "the chest X-ray shows pneumonia and abscess"). The following figure shows the curve of the sigmoid function:
[Figure: sigmoid function curve]
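The shape of the sigmoid curve can also be read off a handful of sample points. The sketch below tabulates the function and checks it against the algebraically equivalent form 1 / (1 + e^(-z)); the name `sigmoid_alt` and the chosen sample points are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """One common way to write the sigmoid: e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

def sigmoid_alt(z):
    """Algebraically equivalent form: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# The curve rises from near 0, passes through 0.5 at z = 0, and levels off near 1.
for z in [-6, -2, 0, 2, 6]:
    print(f"z = {z:>2}  sigmoid = {sigmoid(z):.4f}  alt = {sigmoid_alt(z):.4f}")
```

Strongly negative raw outputs map close to 0, strongly positive ones close to 1, and the curve is symmetric in the sense that sigmoid(-z) = 1 - sigmoid(z).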

Softmax = multi-class classification problem = only one correct answer = mutually exclusive outputs (for example, handwritten digits and Iris)
• When building a classifier for a problem with only one correct answer, use the softmax function to process the raw output values.
• The denominator of the softmax function combines all of the raw output values, which means that the probabilities produced by the softmax function are related to one another.
• The softmax function is expressed as follows:
$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
Here K is the number of raw output values.
Apart from the denominator, which sums e^(raw output value) over all of the raw output values, the softmax function is not very different from the sigmoid function. In other words, when computing the softmax of a single raw output value (for example z_1), z_1 alone is not enough; z_1, z_2, z_3, and z_4 must all enter the denominator, as shown below:
$$\mathrm{softmax}(z_1) = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}}$$
The advantage of the softmax function is that all of the output probabilities sum to 1:
$$\sum_{j=1}^{4} \mathrm{softmax}(z_j) = \frac{e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}}{e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}} = 1$$
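The sum-to-1 property can be checked numerically. The sketch below also subtracts the largest raw output before exponentiating, a common numerical-stability trick that the article does not mention and that is added here as an assumption; it leaves the result unchanged mathematically because the same factor cancels in numerator and denominator.

```python
import math

def softmax(zs):
    """Softmax with the maximum subtracted first.

    Subtracting max(zs) does not change the result, but it keeps
    math.exp from overflowing when the raw outputs are large.
    """
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([-0.5, 1.2, -0.1, 2.4])  # example vector from the text
print([round(p, 4) for p in probs], "sum =", sum(probs))

# A naive math.exp(1000.0) would raise OverflowError; the shifted version works.
large = softmax([1000.0, 1001.0, 1002.0])
print([round(p, 4) for p in large], "sum =", sum(large))
```

In both cases the probabilities add up to 1 within floating-point precision.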
When recognizing handwritten digits, the softmax function processes the raw output values, so increasing the probability that an example is classified as "8" necessarily decreases the probability that it is classified as one of the other digits (0, 1, 2, 3, 4, 5, 6, 7, and/or 9).
Other examples of sigmoid and softmax
[Figure: other examples of sigmoid and softmax]
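The trade-off between the digit "8" and the other digits can be demonstrated directly. In this sketch the ten raw output values are hypothetical numbers invented for illustration; only the raw output for "8" is raised, and every other probability drops as a result.

```python
import math

def softmax(zs):
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the digits 0-9 (index = digit).
raw_outputs = [0.1, -0.3, 0.4, 1.0, -1.2, 0.0, 0.7, 0.2, 2.0, 0.9]
before = softmax(raw_outputs)

boosted = list(raw_outputs)
boosted[8] += 1.5  # raise only the raw output for the digit 8
after = softmax(boosted)

for digit in range(10):
    print(digit, round(before[digit], 3), "->", round(after[digit], 3))
```

The other raw outputs were not touched, yet their probabilities fall, because the shared denominator grew.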
Summary:
• If the model's output classes are not mutually exclusive and several classes can be selected at once, use the sigmoid function to process the network's raw output values.
• If the model's output classes are mutually exclusive and only one class can be selected, use the softmax function to process the network's raw output values.
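The rule in the summary can be written as a small helper. This is only a sketch: the function name `to_probabilities` and its `mutually_exclusive` flag are hypothetical and not part of any library.

```python
import math

def to_probabilities(raw_outputs, mutually_exclusive):
    """Apply softmax if only one class can be correct, sigmoid otherwise."""
    if mutually_exclusive:
        # Softmax: all values share one denominator, so the results sum to 1.
        exps = [math.exp(z) for z in raw_outputs]
        total = sum(exps)
        return [e / total for e in exps]
    # Sigmoid: every value is converted on its own.
    return [1.0 / (1.0 + math.exp(-z)) for z in raw_outputs]

raw_outputs = [-0.5, 1.2, -0.1, 2.4]
multi_label = to_probabilities(raw_outputs, mutually_exclusive=False)
multi_class = to_probabilities(raw_outputs, mutually_exclusive=True)

print("multi-label (sigmoid):", [round(p, 4) for p in multi_label])
print("multi-class (softmax):", [round(p, 4) for p in multi_class])
```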

Copyright notice
This article was written by [Spiritual]. Please include a link to the original when reposting. Thank you.