Entropy - conditional entropy - joint entropy - mutual information - cross entropy
2022-06-30 19:15:00 【Ancient road】
0. Introduction
These are all basic concepts from information theory.
1. Information entropy
How to understand information entropy: the video referenced in the original post explains it really well.
The amount of information: $x = \log_2 N$, where $N$ is the number of equally probable outcomes. For example, if the amount of information is 3 bits, then the number of equally probable outcomes is $2^3 = 8$.



The amount of information is a special case of information entropy: the case where all events are equally probable.
- Suppose a biased coin: the probability of heads is 0.8 and the probability of tails is 0.2.
- Convert each outcome into an equivalent equally probable event ($N = 1/p$):
- Heads → imagine it as one outcome out of $1/0.8 = 1.25$ equally probable events.
- Tails → imagine it as one outcome out of $1/0.2 = 5$ equally probable events.
- The naive total amount of information would be $\log 1.25 + \log 5$, but because the two outcomes occur with different probabilities, the true amount of information is the probability-weighted average: $0.8\log 1.25 + 0.2\log 5 = 0.8\log\frac{1}{0.8} + 0.2\log\frac{1}{0.2}$.
- This leads to the famous information entropy formula: $\sum_i p_i\log\frac{1}{p_i} = -\sum_i p_i\log p_i$
- $H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$
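As a numeric check of the coin example, here is a minimal Python sketch (not part of the original article) that computes the entropy of the 0.8/0.2 coin; it uses base-2 logarithms, so the result is in bits:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum(p * log(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Biased coin: P(heads) = 0.8, P(tails) = 0.2
coin = [0.8, 0.2]
print(entropy(coin))          # ≈ 0.7219 bits, i.e. 0.8*log2(1/0.8) + 0.2*log2(1/0.2)
print(entropy([0.5, 0.5]))    # 1.0 bit for a fair coin (the maximum for two outcomes)
```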
The referenced article gives this definition:
Definition: entropy measures the uncertainty of information.
Explanation: the greater the uncertainty, the greater the entropy and the more information is needed to resolve it; the less uncertainty, the smaller the entropy. For example, the sentence "Tomorrow the sun will rise in the East" has entropy 0: it carries no information, because it describes a certain event.
Its example is also very intuitive:
Example: suppose a random variable X represents tomorrow's weather. X has three possible states: (1) sunny, (2) rainy, (3) overcast, and each state occurs with probability P(i) = 1/3. According to the entropy formula $H(X)=-\sum_{i=1}^{n} p(x_i)\log p(x_i)$:
H(X) = −1/3·log(1/3) − 1/3·log(1/3) − 1/3·log(1/3) = log 3 ≈ 0.47712 (using base-10 logarithms).
If instead the three states have probabilities (0.1, 0.1, 0.8): H(X) = −2 × 0.1·log(0.1) − 0.8·log(0.8) ≈ 0.277528.
Under the first distribution X is highly uncertain (high entropy): every state is equally likely. Under the second distribution X is much less uncertain (low entropy): the third state occurs with high probability.
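The two weather distributions can be checked the same way. The sketch below is an illustration added here (not from the original article); it uses base-10 logarithms so the output matches the figures 0.47712 and 0.277528 quoted above:

```python
import math

def entropy10(probs):
    """Entropy with base-10 logarithm, matching the figures in the text."""
    return -sum(p * math.log10(p) for p in probs if p > 0)

uniform = [1/3, 1/3, 1/3]
skewed  = [0.1, 0.1, 0.8]
print(entropy10(uniform))   # ≈ 0.47712 (= log10(3)) -> high uncertainty
print(entropy10(skewed))    # ≈ 0.277528 -> much lower uncertainty
```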
2. Conditional entropy
Definition: the uncertainty that remains in a random variable once another random variable is known.
- The joint distribution of two random variables X, Y defines the joint entropy (Joint Entropy), written H(X, Y): $H(X, Y) = -\sum_{x, y} p(x, y)\log p(x, y)$
- $H(X|Y) = H(X, Y) - H(Y)$: the entropy of (X, Y) occurring together, minus the entropy of Y occurring alone, i.e. the additional uncertainty contributed by X once Y is known.
The same identity can be derived directly (written here for $H(Y|X) = H(X, Y) - H(X)$; the case $H(X|Y) = H(X, Y) - H(Y)$ is symmetric):

$$
\begin{aligned}
H(Y|X) &= H(X, Y) - H(X) \\
&= -\sum_{x, y} p(x, y) \log p(x, y) + \sum_{x} p(x) \log p(x) \\
&= -\sum_{x, y} p(x, y) \log p(x, y) + \sum_{x}\Big(\sum_{y} p(x, y)\Big) \log p(x) \\
&= -\sum_{x, y} p(x, y) \log p(x, y) + \sum_{x, y} p(x, y) \log p(x) \\
&= -\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)} \\
&= -\sum_{x, y} p(x, y) \log p(y \mid x)
\end{aligned}
$$
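As a numeric illustration of this identity, the following sketch computes $H(X,Y)$, $H(X)$, and $H(Y|X)$ from a small made-up joint distribution (the table is hypothetical, not from the article) and checks that $H(Y|X) = H(X,Y) - H(X)$:

```python
import math

# Hypothetical joint distribution p(x, y) over X in {0, 1} and Y in {0, 1, 2}
joint = {
    (0, 0): 0.20, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.10, (1, 1): 0.25, (1, 2): 0.25,
}

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal distribution p(x)
px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p

H_xy = H(joint.values())   # joint entropy H(X, Y)
H_x = H(px.values())       # marginal entropy H(X)

# Conditional entropy from the definition: -sum_{x,y} p(x, y) * log2 p(y|x)
H_y_given_x = -sum(p * math.log2(p / px[x]) for (x, y), p in joint.items())

print(round(H_xy - H_x, 4), round(H_y_given_x, 4))  # both ≈ 1.4523 bits
print(math.isclose(H_xy - H_x, H_y_given_x))        # True: H(Y|X) = H(X,Y) - H(X)
```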
3. Joint entropy
The joint entropy of two discrete random variables X, Y (in bits) is defined as:

$$H(X, Y) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} P(x, y)\log_2 P(x, y)$$

For more than two random variables $X_1, \dots, X_n$, this extends to:

$$H(X_1, \dots, X_n) = -\sum_{x_1\in\mathcal{X}_1}\cdots\sum_{x_n\in\mathcal{X}_n} P(x_1, \dots, x_n)\log_2 P(x_1, \dots, x_n)$$
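As an illustration of the multi-variable formula, the sketch below (with made-up joint distributions, not from the article) computes the joint entropy of three binary variables directly from an n-dimensional probability array:

```python
import numpy as np

def joint_entropy(p):
    """Joint entropy in bits of an n-dimensional joint probability array."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # skip zero-probability cells
    return float(-(nz * np.log2(nz)).sum())

# Hypothetical joint distribution over three binary variables X1, X2, X3
p = np.full((2, 2, 2), 1 / 8)          # independent fair bits
print(joint_entropy(p))                # 3.0 bits = H(X1) + H(X2) + H(X3)

p2 = np.zeros((2, 2, 2))
p2[0, 0, 0] = p2[1, 1, 1] = 0.5        # perfectly correlated bits
print(joint_entropy(p2))               # 1.0 bit: knowing one bit determines the rest
```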
4. Mutual information
Definition: mutual information measures the degree of dependence between two random variables.
Understanding: it is the amount by which the uncertainty of one random variable Y decreases once the value of the other random variable X is determined. Its minimum is 0, meaning that knowing one variable says nothing about the other; its maximum is the entropy of the variable itself, meaning that knowing one variable completely removes the uncertainty of the other. This concept is the counterpart of conditional entropy.
$$
\begin{aligned}
I(X;Y) &\equiv H(X) - H(X\mid Y) \\
&\equiv H(Y) - H(Y\mid X) \\
&\equiv H(X) + H(Y) - H(X, Y) \\
&\equiv H(X, Y) - H(X\mid Y) - H(Y\mid X)
\end{aligned}
$$
$$
\begin{aligned}
I(X;Y) &= \sum_{x\in\mathcal{X}, y\in\mathcal{Y}} p_{(X,Y)}(x, y)\log\frac{p_{(X,Y)}(x, y)}{p_X(x)\,p_Y(y)} \\
&= \sum_{x\in\mathcal{X}, y\in\mathcal{Y}} p_{(X,Y)}(x, y)\log\frac{p_{(X,Y)}(x, y)}{p_X(x)} - \sum_{x\in\mathcal{X}, y\in\mathcal{Y}} p_{(X,Y)}(x, y)\log p_Y(y) \\
&= \sum_{x\in\mathcal{X}, y\in\mathcal{Y}} p_X(x)\,p_{Y\mid X=x}(y)\log p_{Y\mid X=x}(y) - \sum_{x\in\mathcal{X}, y\in\mathcal{Y}} p_{(X,Y)}(x, y)\log p_Y(y) \\
&= \sum_{x\in\mathcal{X}} p_X(x)\left(\sum_{y\in\mathcal{Y}} p_{Y\mid X=x}(y)\log p_{Y\mid X=x}(y)\right) - \sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} p_{(X,Y)}(x, y)\right)\log p_Y(y) \\
&= -\sum_{x\in\mathcal{X}} p_X(x)\, H(Y\mid X=x) - \sum_{y\in\mathcal{Y}} p_Y(y)\log p_Y(y) \\
&= -H(Y\mid X) + H(Y) \\
&= H(Y) - H(Y\mid X).
\end{aligned}
$$
- The mutual information of two random variables $X, Y$ can also be defined as the relative entropy between their joint distribution and the product of their marginal (independent) distributions:
- $I(X, Y) = D(P(X, Y) \,\|\, P(X)P(Y))$, i.e. $I(X, Y)=\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$
Mutual information and information gain (as used in decision trees) are in fact the same quantity: information gain = entropy − conditional entropy, $g(D, A) = H(D) - H(D|A)$.
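The following sketch (using a hypothetical joint distribution, not taken from the article) computes $I(X;Y)$ both from the identity $H(X)+H(Y)-H(X,Y)$ and from the KL form $\sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$, and checks that the two agree:

```python
import math

# Hypothetical joint distribution p(x, y)
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal distributions p(x) and p(y)
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# I(X;Y) via the entropy identity
mi_entropy = H(px.values()) + H(py.values()) - H(joint.values())

# I(X;Y) as the KL divergence between p(x, y) and p(x)p(y)
mi_kl = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

print(round(mi_entropy, 4), round(mi_kl, 4))   # both ≈ 0.1245 bits
print(mi_entropy >= 0)                          # mutual information is non-negative
```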

5. Relative entropy
Relative entropy is also known as discrimination information, Kullback entropy, or Kullback-Leibler (KL) divergence. It is closely related to, but not the same as, cross entropy: cross entropy satisfies $H(P, Q) = H(P) + D_{\text{KL}}(P \parallel Q)$.
Let $p(x)$ and $q(x)$ be two probability distributions over the values of $X$. Then the relative entropy of $p$ with respect to $q$ is:
$$D_{\text{KL}}(P \parallel Q) = \sum_{x\in\mathcal{X}} P(x)\log\left(\frac{P(x)}{Q(x)}\right) = -\sum_{x\in\mathcal{X}} P(x)\log\left(\frac{Q(x)}{P(x)}\right)$$
Explanation:
- Relative entropy can be used to measure the "distance" between two probability distributions (although it is not a true metric).
- In general, $D(p \parallel q) \neq D(q \parallel p)$: relative entropy is not symmetric.
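A minimal sketch illustrating both points, with distributions chosen arbitrarily for illustration:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log2(P(x) / Q(x)), in bits.
    Assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.4, 0.5]
q = [0.8, 0.1, 0.1]

print(kl_divergence(p, q))   # ≈ 1.661 bits
print(kl_divergence(q, p))   # ≈ 1.968 bits -- a different value: KL is not symmetric
print(kl_divergence(p, p))   # 0.0: the "distance" from a distribution to itself is zero
```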