Entropy - conditional entropy - joint entropy - mutual information - cross entropy
2022-06-30 19:15:00 【Ancient road】
0. Introduction
These are basic concepts from information theory.
1. Information Entropy
How to understand information entropy? This video explains it really well!
The amount of information: $x = \log_2 N$, where $N$ is the number of equally likely events. For example, if the amount of information is 3, then the number of equally likely events was $2^3 = 8$.
The amount of information is a special case of information entropy: the case where all events occur with equal probability.
- Suppose a coin: the probability of heads is 0.8 and the probability of tails is 0.2.
- Convert each outcome into an equally probable event ($N = 1/p$):
- Heads → imagine it as one occurrence among $1/0.8 = 1.25$ equally probable events.
- Tails → imagine it as one occurrence among $1/0.2 = 5$ equally probable events.
- The intuitive amount of information would then be $\log 1.25 + \log 5$. But because the two outcomes occur with different probabilities, the real amount of information weights each term by its probability: $0.8\log 1.25 + 0.2\log 5 = 0.8\log\frac{1}{0.8} + 0.2\log\frac{1}{0.2}$.
- This leads to the famous information entropy formula: $\sum_i p_i\log\frac{1}{p_i} = -\sum_i p_i\log p_i$

$$H(X)=-\sum_{i=1}^{n} p\left(x_{i}\right) \log p\left(x_{i}\right)$$
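As a quick sanity check of this formula, here is a minimal Python sketch (my own, not from the original article; the function name `entropy` and the base-2 logarithm are my choices) that computes the entropy of the biased coin above:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum(p * log p) of a discrete distribution."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Biased coin from the example: P(heads) = 0.8, P(tails) = 0.2
print(entropy([0.8, 0.2]))   # ~0.7219 bits, less than the 1 bit of a fair coin
```

With base-2 logarithms the unit is bits; the numerical examples later in the article use base-10 logarithms instead.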
This article gives the definition:
Definition: entropy measures the uncertainty of information.
Explanation: the greater the entropy, the greater the uncertainty and the more information the variable carries; the smaller the uncertainty, the smaller the entropy. For example, the sentence "Tomorrow the sun will rise in the east" has entropy 0, because it carries no information: it describes a certain event.
Its example is also very intuitive:
Example: suppose a random variable $X$ represents tomorrow's weather. $X$ has three possible states: 1) sunny, 2) rainy, 3) overcast, each occurring with probability $P(i) = 1/3$. By the entropy formula $H(X)=-\sum_{i=1}^{n} p(x_i)\log p(x_i)$ we get (using base-10 logarithms):
$H(X) = -\frac{1}{3}\log\frac{1}{3} - \frac{1}{3}\log\frac{1}{3} - \frac{1}{3}\log\frac{1}{3} = \log 3 \approx 0.47712$
If instead the three states have probabilities $(0.1, 0.1, 0.8)$: $H(X) = -0.1\log 0.1 \times 2 - 0.8\log 0.8 \approx 0.277528$
With the former distribution, $X$ is highly uncertain (high entropy): every state is equally likely. With the latter distribution, $X$ is far less uncertain (low entropy): the third state occurs with high probability.
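The two numbers above come out as stated only with base-10 logarithms; a small sketch (mine, not the article's) reproduces them:

```python
import math

def entropy10(probs):
    """Entropy computed with base-10 logarithms, matching the article's figures."""
    return -sum(p * math.log10(p) for p in probs if p > 0)

print(entropy10([1/3, 1/3, 1/3]))   # ~0.47712 (= log10(3))
print(entropy10([0.1, 0.1, 0.8]))   # ~0.277528
```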
2. Conditional Entropy
Definition: the uncertainty that remains in a random variable under a given condition.
- The joint distribution of two random variables $X, Y$ defines the joint entropy (Joint Entropy), written $H(X, Y)$: $H(X, Y) = -\sum_{x,y} p(x, y)\log p(x, y)$
- $H(X|Y) = H(X, Y) - H(Y)$: the entropy of $(X, Y)$ occurring together, minus the entropy of $Y$ occurring alone, i.e. the new uncertainty that $X$ brings given that $Y$ has occurred.

$$
\begin{aligned}
H(X \mid Y) &= H(X, Y)-H(Y) \\
&=-\sum_{x, y} p(x, y) \log p(x, y)+\sum_{y} p(y) \log p(y) \\
&=-\sum_{x, y} p(x, y) \log p(x, y)+\sum_{y}\left(\sum_{x} p(x, y)\right) \log p(y) \\
&=-\sum_{x, y} p(x, y) \log p(x, y)+\sum_{x, y} p(x, y) \log p(y) \\
&=-\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(y)} \\
&=-\sum_{x, y} p(x, y) \log p(x \mid y)
\end{aligned}
$$
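To make the chain rule concrete, here is a small Python sketch (my own illustration, with a made-up 2x2 joint distribution) that checks $H(X|Y) = H(X,Y) - H(Y) = -\sum_{x,y} p(x,y)\log p(x|y)$:

```python
import math

# Hypothetical joint distribution p(x, y).
p_xy = {('x0', 'y0'): 0.3, ('x0', 'y1'): 0.2,
        ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4}

def H(dist):
    """Entropy (base 2) of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p(y), obtained by summing the joint distribution over x.
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# Chain rule: H(X|Y) = H(X,Y) - H(Y)
h_chain = H(p_xy) - H(p_y)

# Direct definition: H(X|Y) = -sum p(x,y) * log p(x|y), with p(x|y) = p(x,y)/p(y)
h_direct = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())

print(round(h_chain, 6), round(h_direct, 6))   # both give the same value
```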
3. Joint Entropy
The joint entropy (in bits) of two discrete random variables $X, Y$ is defined as:

$$H(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}P(x,y)\log_{2}P(x,y)$$

For more than two random variables $X_{1},\ldots,X_{n}$ this expands to:

$$H(X_{1},\ldots,X_{n})=-\sum_{x_{1}\in\mathcal{X}_{1}}\cdots\sum_{x_{n}\in\mathcal{X}_{n}}P(x_{1},\ldots,x_{n})\log_{2}P(x_{1},\ldots,x_{n})$$
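The same n-fold sum can be evaluated directly over an n-dimensional probability array. The sketch below is my own (it assumes NumPy and made-up joint distributions); it treats each array cell as $P(x_1,\ldots,x_n)$:

```python
import numpy as np

def joint_entropy(p, base=2):
    """Joint entropy of an n-dimensional joint probability array (bits for base=2)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                          # skip zero cells (0 * log 0 := 0)
    return float(-(nz * (np.log(nz) / np.log(base))).sum())

# Two variables: the same 2x2 joint distribution as before.
p2 = [[0.3, 0.2],
      [0.1, 0.4]]
print(joint_entropy(p2))

# Three variables: a 2x2x2 joint distribution (three independent fair bits).
p3 = np.full((2, 2, 2), 1 / 8)
print(joint_entropy(p3))                   # exactly 3.0 bits, as expected
```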
4. Mutual Information
Definition: a measure of the degree of dependence between two random variables.
Understanding: it is the amount by which the uncertainty of one random variable $Y$ decreases once the value of the other random variable $X$ is known. Its minimum value is 0, meaning that knowing one variable tells us nothing about the other; its maximum value is the entropy of the variable itself, meaning that knowing one variable completely removes the uncertainty of the other. It is the counterpart of conditional entropy.
$$
\begin{aligned}
\operatorname{I}(X;Y) &\equiv H(X)-H(X\mid Y)\\
&\equiv H(Y)-H(Y\mid X)\\
&\equiv H(X)+H(Y)-H(X,Y)\\
&\equiv H(X,Y)-H(X\mid Y)-H(Y\mid X)
\end{aligned}
$$

$$
\begin{aligned}
\operatorname{I}(X;Y) &= \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{(X,Y)}(x,y)\log\frac{p_{(X,Y)}(x,y)}{p_{X}(x)\,p_{Y}(y)}\\
&= \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{(X,Y)}(x,y)\log\frac{p_{(X,Y)}(x,y)}{p_{X}(x)}-\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{(X,Y)}(x,y)\log p_{Y}(y)\\
&= \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{X}(x)\,p_{Y\mid X=x}(y)\log p_{Y\mid X=x}(y)-\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{(X,Y)}(x,y)\log p_{Y}(y)\\
&= \sum_{x\in\mathcal{X}} p_{X}(x)\left(\sum_{y\in\mathcal{Y}} p_{Y\mid X=x}(y)\log p_{Y\mid X=x}(y)\right)-\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} p_{(X,Y)}(x,y)\right)\log p_{Y}(y)\\
&= -\sum_{x\in\mathcal{X}} p_{X}(x)\,H(Y\mid X=x)-\sum_{y\in\mathcal{Y}} p_{Y}(y)\log p_{Y}(y)\\
&= -H(Y\mid X)+H(Y)\\
&= H(Y)-H(Y\mid X).
\end{aligned}
$$
- The mutual information of two random variables $X, Y$ is defined as the relative entropy between their joint distribution and the product of their marginal distributions.
- $I(X;Y)=D\big(P(X,Y)\,\|\,P(X)P(Y)\big)$, i.e. $I(X;Y)=\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$
Mutual information and information gain are in fact the same quantity: information gain = entropy - conditional entropy, $g(D,A)=H(D)-H(D\mid A)$.
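A short sketch (my own, with a made-up 2x2 joint distribution) computes $I(X;Y)$ both from the identity $H(X)+H(Y)-H(X,Y)$ and from the KL form above, and confirms they agree:

```python
import math

# Hypothetical joint distribution p(x, y) as a 2x2 table.
p_xy = [[0.3, 0.2],
        [0.1, 0.4]]

p_x = [sum(row) for row in p_xy]            # marginal of X (sum over y)
p_y = [sum(col) for col in zip(*p_xy)]      # marginal of Y (sum over x)

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_xy = -sum(p * math.log2(p) for row in p_xy for p in row if p > 0)

# Identity: I(X;Y) = H(X) + H(Y) - H(X,Y)
mi_identity = H(p_x) + H(p_y) - h_xy

# KL form: I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
mi_kl = sum(p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
            for i in range(2) for j in range(2) if p_xy[i][j] > 0)

print(round(mi_identity, 6), round(mi_kl, 6))   # identical values
```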
5. Relative Entropy
Relative entropy is also known as the Kullback-Leibler divergence (KL divergence), information divergence, or discrimination information. It is sometimes loosely called cross entropy, but strictly speaking the cross entropy of $p$ relative to $q$ equals the entropy of $p$ plus the relative entropy $D_{\text{KL}}(p\,\|\,q)$.
Let $p(x)$ and $q(x)$ be two probability distributions over the values of $X$. Then the relative entropy of $p$ with respect to $q$ is

$$D_{\text{KL}}(P\parallel Q)=\sum_{x\in\mathcal{X}}P(x)\log\left(\frac{P(x)}{Q(x)}\right)=-\sum_{x\in\mathcal{X}}P(x)\log\left(\frac{Q(x)}{P(x)}\right)$$
Explanation:
- Relative entropy measures a kind of "distance" between two probability distributions (it is not a true metric, because it is not symmetric).
- In general, $D(p\|q) \neq D(q\|p)$, as the sketch below illustrates.
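The following sketch (my own; the two distributions are made up) computes both directions and shows they differ:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log( p(x) / q(x) ), assuming q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.1, 0.8]
q = [1/3, 1/3, 1/3]

print(kl_divergence(p, q))   # D(p || q)
print(kl_divergence(q, p))   # D(q || p): a different value, so KL divergence is not symmetric
```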