Do you really understand entropy? (including cross entropy)
2022-06-09 06:20:00 【GodGump】
Thank you for reading
1. Entropy
1.1 Entropy has many interpretations
Different people and different fields have interpreted entropy in different ways: degree of disorder, uncertainty, degree of surprise, unpredictability, amount of information, and so on.
1.2 Where the concept was first proposed
The concept of entropy in information theory was first proposed by Shannon. The goal was to find an efficient, lossless way to encode information: efficiency is measured by the average length of the encoded data (the shorter the average length, the more efficient), while the "lossless" requirement means no original information may be lost after encoding. From this, Shannon arrived at the definition of entropy: the minimum average coding length needed to losslessly encode the information of an event.
1.3 Calculating entropy directly
We know that 1 binary (base-2) digit, a single 0 or 1, can represent two values, for example male = 0 and female = 1. Likewise, a sequence of N binary digits can represent 2 to the power N values. Now suppose an information source has 8 possible states, all equally likely, so each state has probability 12.5% = 1/8. How many bits do we need to encode these 8 values? 1 bit can encode 2 values (0 or 1), 2 bits can encode 2×2 = 4 values (00, 01, 10, 11), so 8 values require 3 bits, since 2×2×2 = 8 (000, 001, 010, 011, 100, 101, 110, 111).
We cannot drop even 1 bit, because that would cause ambiguity, and likewise we never need more than 3 bits to encode 8 possible values. To sum up, for information with N equally likely states, each state has probability P = 1/N, and the minimum encoding length required is:
$$\sum_{i=1}^{N}\frac{1}{N}\log_2 N = \log_2 N = -\log_2 P$$
For the 8-state example above, this gives log₂8 = 3 bits.
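As a quick check of the 8-state example, here is a minimal Python sketch (mine, not from the original post) that lists the 3-bit codes and computes the minimum code length directly:

```python
import math

# Each of the 8 equally likely states can be assigned a distinct 3-bit code
codes = [format(i, "03b") for i in range(8)]
print(codes)                 # ['000', '001', '010', '011', '100', '101', '110', '111']

# Minimum code length for N equally likely states: log2(N) = -log2(P) with P = 1/N
N = 8
print(math.log2(N))          # 3.0
print(-math.log2(1 / N))     # 3.0
```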
Extending this reasoning to unequal probabilities gives the general entropy formula:
$$H(P) = -\sum_{x} P(x)\log_2 P(x)$$
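To make the formula concrete, here is a minimal sketch (the entropy helper is my own, not from the original post) that computes H(P) for both an equal and an unequal distribution:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(P) = -sum(P(x) * log2(P(x))), skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))                   # 3.0 bits, matching the equal-probability case above
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits: unequal probabilities need fewer bits on average
```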
2. Cross Entropy
2.1 The origin of cross entropy
To repeat the key point: entropy is the theoretical minimum average coding length for events that follow a certain probability distribution. As long as we know the probability distribution of an event, we can compute its entropy. But what if we do not know the probability distribution and still want to compute the entropy? Then we have to estimate it, and the process of estimating entropy naturally leads to cross entropy.
2.2 How the estimate of entropy is affected
The estimated probability distribution, call it Q, differs from the true probability distribution P.
The coding length computed from the estimate is -log Q, which differs from the true minimum coding length -log P.
In other words, for the expectation we use the true probability distribution P, but for the coding length we use the assumed probability distribution Q, because we encode the information based on the estimate: H(P,Q) = −Σ P(x)·log₂Q(x). Because entropy is the theoretical minimum average coding length, cross entropy can only be greater than or equal to entropy. Put another way, if our estimate is perfect, i.e. Q = P, then H(P,Q) = H(P); otherwise H(P,Q) > H(P).
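To make the roles of P and Q concrete, here is a minimal sketch (the helper names and example distributions are my own, not from the original post) that verifies H(P,Q) ≥ H(P):

```python
import math

def entropy(p):
    """H(P) = -sum(P(x) * log2(P(x)))."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P,Q) = -sum(P(x) * log2(Q(x))): the expectation uses the true P, the code lengths use the estimate Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]       # true distribution
q = [1/3, 1/3, 1/3]         # estimated distribution

print(entropy(p))           # 1.5 bits
print(cross_entropy(p, q))  # ~1.585 bits, greater than H(P)
print(cross_entropy(p, p))  # 1.5 bits: equal to H(P) when Q = P
```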
2.3 Cross entropy loss function
Anyone familiar with machine learning knows that cross entropy is used as the loss function in classification models, and anyone who has watched the cat classifier in Andrew Ng's machine learning videos will remember the binary cross entropy used there. But the lectures usually pass over it quickly. Do you really understand it?
Suppose an animal photo dataset contains 5 kinds of animals, each picture contains exactly one animal, and each photo's label is one-hot encoded. The first picture is a dog with probability 100% and any other animal with probability 0; the second picture is a fox with probability 100% and any other animal with probability 0; and so on for the rest. We can therefore calculate that the entropy of each picture's label is 0. In other words, a one-hot label carries no uncertainty at all (it is 100% certain), unlike a description in terms of probabilities such as: 90% dog, 10% cat.
The formula for the cross entropy loss of a single sample over M classes is
$$L = -\sum_{c=1}^{M} y_c \log(\hat{y}_c)$$
where y_c is the one-hot label and ŷ_c is the predicted probability for class c; with M = 2 this reduces to the binary cross entropy L = −[y·log(ŷ) + (1−y)·log(1−ŷ)].
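Here is a minimal sketch of this loss for the 5-animal example (the class order, helper name, and predicted probabilities are hypothetical, not from the original post):

```python
import math

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Categorical cross entropy for one sample, using the natural log as most ML frameworks do.
    With a one-hot y_true, only the true class term contributes to the sum."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

# Photo 1 is a dog; hypothetical class order [dog, fox, cat, horse, bird]
y_true = [1, 0, 0, 0, 0]                   # one-hot label: entropy 0, no uncertainty
y_pred = [0.7, 0.1, 0.1, 0.05, 0.05]       # hypothetical model output

print(cross_entropy_loss(y_true, y_true))  # -0.0: a perfect prediction gives zero loss
print(cross_entropy_loss(y_true, y_pred))  # ~0.357, i.e. -ln(0.7)
```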
