
Entropy, Information Entropy, and Cross Entropy

2022-07-06 23:28:00 TranSad

In information theory, entropy is often used to describe the degree of disorder and uncertainty in information. The greater the entropy, the more uncertain the information.

The formula for entropy is as follows:

H(X) = -Σ p(x) * log(p(x))

(Note: log is taken to base 2 by default.)

Taking this formula apart, it is actually very simple: a minus sign, a p(x), and a log(p(x)). We know the probability of an event lies between 0 and 1, and feeding such a probability into the log function always gives a value less than 0, so putting a minus sign in front yields the familiar non-negative entropy.

When the probability tends to 0 or 1 (that is, the certainty is very strong), p(x) or log(p(x)) tends to 0 and the entropy is small; when the probability tends to 1/2 (that is, the uncertainty is very strong), neither p(x) nor log(p(x)) tends to 0 and the entropy is large.
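To make this behavior concrete, here is a minimal Python sketch (the helper name binary_entropy is my own) that evaluates the two-outcome entropy -p*log(p) - (1-p)*log(1-p) at a few probabilities:

    import math

    def binary_entropy(p):
        # Entropy (base 2) of an event with probabilities p and 1 - p.
        if p in (0.0, 1.0):  # by convention, 0 * log(0) is treated as 0
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(p, round(binary_entropy(p), 3))
    # Entropy peaks at p = 0.5 and approaches 0 as p approaches 0 or 1.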

For example, suppose there are 4 balls.

1. Suppose all of them are black (minimum uncertainty). The entropy is: -1*log(1) = 0

2. Suppose 2 black balls and 2 white balls (some uncertainty). The entropy is: -0.5*log(0.5) - 0.5*log(0.5) = 1

3. Suppose 1 black ball, 1 white ball, 1 yellow ball, and 1 blue ball (high uncertainty). The entropy is: -0.25*log(0.25) - 0.25*log(0.25) - 0.25*log(0.25) - 0.25*log(0.25) = 2
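These numbers can be double-checked by computing the entropy directly from the probabilities. A minimal sketch, assuming base-2 logs as stated above (the helper name entropy is my own):

    import math

    def entropy(probs):
        # Shannon entropy in bits: -sum(p * log2(p)), skipping zero probabilities.
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(entropy([1.0]))                     # all black balls      -> 0.0
    print(entropy([0.5, 0.5]))                # 2 black, 2 white     -> 1.0
    print(entropy([0.25, 0.25, 0.25, 0.25]))  # 4 different colors   -> 2.0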

Evaluating classification results

We know that entropy can measure the uncertainty of information, so we can also use it to evaluate how well a classification task has been done. For example, suppose we have four pictures: two pictures of cats (Cat 1, Cat 2) and two pictures of dogs (Dog 1, Dog 2). We feed them into a classifier; this is obviously a binary classification problem. Now suppose two models produce the following groupings:

Model 1: (Cat 1, Cat 2) / (Dog 1, Dog 2).

Model 2: (Cat 1, Dog 2) / (Cat 2, Dog 1).

Obviously, plugging Model 1's groupings into the entropy formula gives an entropy of 0, i.e. zero uncertainty, which means every group contains a single class. For Model 2's groupings, the entropy turns out to be large, which shows that the model's classification is poor.
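As a quick check of this example, here is a minimal sketch, assuming each predicted group is scored by the entropy of the class proportions inside it (the function names are my own, not from the original post):

    import math

    def entropy(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    def group_entropy(group):
        # Entropy of the class-label proportions within one predicted group.
        counts = {}
        for label in group:
            counts[label] = counts.get(label, 0) + 1
        return entropy([c / len(group) for c in counts.values()])

    model_1 = [["cat", "cat"], ["dog", "dog"]]
    model_2 = [["cat", "dog"], ["cat", "dog"]]

    for name, groups in (("Model 1", model_1), ("Model 2", model_2)):
        print(name, [group_entropy(g) for g in groups])
    # Model 1 -> [0.0, 0.0]: no uncertainty in either group
    # Model 2 -> [1.0, 1.0]: each group is a 50/50 mix, the worst case for two classes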

Of course, the above is just a simple example of using entropy to evaluate classification results, and it only applies to unsupervised (unlabeled) tasks.

Cross entropy

In most cases, however, the classification task is labeled. For supervised learning we use cross entropy for evaluation. The idea behind cross entropy is different from the above: it starts from each individual sample and computes the distance between the predicted output and the expected output of that sample (this is what we call the cross entropy). The formula is:

H(p, q) = -Σ p(x) * log(q(x))

where p is the expected output (a probability distribution), q is the actual output, and H(p, q) is the cross entropy.

Take the cat-and-dog classification task: suppose a cat is encoded as 10 and a dog as 01. If the model classifies a cat as a dog, then p = (1, 0), q = (0, 1), and H(p, q) = -(1*log(0) + 0*log(1)) = ∞. It is not surprising to get infinity here: the prediction is exactly the opposite of the label, so the computed "distance" is extremely large. More often we get something like q = (0.1, 0.9), and the resulting cross entropy is a finite but still large value.
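To see these values concretely, here is a minimal sketch of the per-sample cross entropy above; a small epsilon stands in for log(0) so the "opposite" case prints a huge finite number instead of raising an error (the names and the epsilon choice are my own):

    import math

    def cross_entropy(p, q, eps=1e-12):
        # H(p, q) = -sum(p_i * log2(q_i)); eps keeps log(0) from blowing up.
        return -sum(pi * math.log2(max(qi, eps)) for pi, qi in zip(p, q))

    cat = (1.0, 0.0)  # expected output p for a cat image (encoded as 10)
    print(cross_entropy(cat, (0.0, 1.0)))  # exactly opposite prediction -> ~39.9 (infinite without eps)
    print(cross_entropy(cat, (0.1, 0.9)))  # mostly wrong prediction     -> ~3.32
    print(cross_entropy(cat, (0.9, 0.1)))  # mostly right prediction     -> ~0.15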

Using information entropy to solve the weighing problem

Information entropy is very useful. For example, we often encounter this classic problem: there are n balls, and exactly one of them differs in weight from the rest (it is heavier than the others). Using a balance scale, at least how many weighings are needed to find this ball?

Without information theory, the first method that comes to mind is bisection: "split the balls into two halves, keep the heavier half, split it again..." and so on. But by computing information entropy, we can set aside the actual weighing procedure and obtain the final answer directly from a "God's-eye view". This is the wonder of applying information entropy.

How do we solve it? There are n balls, and each ball is the heavier one with probability 1/n, so the total amount of information is:

H(x) = n * (-(1/n) * log(1/n)) = log(n)

Here the result is described in terms of the "amount of information". Information content is another quantity in information theory closely related to entropy; looking at the formula, the way it is computed is easy to understand and closely tied to the entropy formula. (The more direct formula for the information content of a single outcome is I = log2(1/p).)

Each weighing has three possible outcomes: the left side is heavier, the right side is heavier, or the two sides balance. So the amount of information a single weighing can provide is:

H(y) = 3 * (-(1/3) * log(1/3)) = log(3)

Therefore, the minimum number of weighings required is H(x)/H(y) = log(n)/log(3), rounded up to the next integer.

The above is just a very simple example. Sometimes we do not know whether the unusual ball is heavier or lighter. In that case our overall uncertainty increases, i.e. the total amount of information H(x) changes. The reasoning for the addition is: finding the different ball requires log(n) of information, and determining whether it is heavy or light requires another log(2) of information, so the total information H(x) is log(2) + log(n) = log(2n).

In this case, the minimum number of weighings required is H(x)/H(y) = log(2n)/log(3), again rounded up.
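A minimal sketch of the two counts derived above, rounding up because the number of weighings must be an integer (the function name is my own):

    import math

    def min_weighings(n, heavy_known=True):
        # Information-theoretic lower bound on balance weighings, as derived above.
        total_information = math.log2(n) if heavy_known else math.log2(2 * n)
        per_weighing = math.log2(3)  # each weighing has 3 outcomes: left / right / balance
        return math.ceil(total_information / per_weighing)

    print(min_weighings(10))                     # heavier ball among 10 -> 3 (3^2 = 9 is not enough)
    print(min_weighings(12, heavy_known=False))  # 12 balls, heavy or light unknown -> 3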

This article mainly went over the concepts and uses of information entropy and cross entropy, and finally extended the discussion to using information theory to solve the balance-weighing problem. I originally wanted to write about the Monty Hall problem as well (it can also be viewed through the lens of information entropy), but that felt like drifting off topic... so that's it.
