Basics of information entropy
2022-07-03 01:10:00 【The code family】
Information entropy
The concept of information entropy was introduced by Claude Shannon in 1948. Shannon was a famous American mathematician and the founder of information theory; his notion of "information entropy" laid the foundation for information theory and digital communication.
Understand information entropy
Information entropy is a measure of uncertainty, that is, of the probabilities of discrete random events. Simply put, the more chaotic a situation is, the greater its information entropy; the more ordered it is, the smaller its entropy.
The great mathematician Shannon gave the formula of information entropy, shown below:

H(X) = -\sum_{k=1}^{N} p_k \log_2 p_k

Here p denotes a probability and X denotes the set over which the information entropy is calculated. In the decision-tree classification algorithm, this can be understood as computing the proportion of each category (the higher the proportion, the higher the purity of that category): N is the number of categories, and p_k is the proportion of category k in the subset. With this in mind, the calculation of information entropy is very simple; it consists of three steps, namely multiply, sum, and negate.
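For concreteness, here is a minimal Python sketch (added for illustration; it is not code from the original article) of the three-step calculation, multiply, sum, and negate, applied to a list of category proportions:

```python
import math

def entropy(proportions):
    """H(X) = -sum(p_k * log2(p_k)) over the category proportions p_k.

    A proportion of 0 contributes nothing (0 * log2(0) is taken as 0 by convention).
    """
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -- an even split is maximally "chaotic"
print(entropy([0.9, 0.1]))  # ~0.469 -- a purer subset has lower entropy
```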
Information entropy formula calculation
Let's apply the above formula to a simple example. In a binary classification problem, if all current samples belong to category k, then the proportion of that category in the subset node reaches 100% (the other category accounts for 0), i.e. p_k = 1. The information entropy is then:

H(X) = -(1 \cdot \log_2 1 + 0 \cdot \log_2 0) = 0

(by convention, the term 0 \cdot \log_2 0 is taken to be 0).
We will not review the logarithm function here; it suffices to note that the logarithm of 1 to base 2 is 0, so the summed information entropy of the two categories is 0. An information entropy of 0 means that the categories within the subset are consistent, i.e. "ordered". It also follows that the information entropy reaches its maximum when p_k = 0.5. Based on the above, we plot information entropy as a function of p_k, as shown below:

[Figure: information entropy of a binary split as a function of p_k; it is 0 at p_k = 0 and p_k = 1 and reaches its maximum of 1 at p_k = 0.5]
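The same curve can be traced numerically. Below is a minimal sketch (an illustration added here, not part of the original article), assuming nothing beyond the Python standard library:

```python
import math

def binary_entropy(p):
    """Entropy of a two-category split with proportions p and 1 - p."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if 0.0 < q < 1.0)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  ->  H = {binary_entropy(p):.3f}")
# H is 0 at p = 0 and p = 1 (pure subsets) and peaks at 1.0 when p = 0.5.
```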
ID3 Algorithm — Information gain
From the knowledge covered so far, we know that the decision-tree algorithm takes the collection of all categories as its object of calculation and, through conditional discrimination, selects splits of high purity. So how do we use information entropy to select decision conditions from the feature set? Let's illustrate this with the ID3 algorithm.
ID3 (Iterative Dichotomiser 3) is one of the decision-tree algorithms. It is based on the principle of Occam's razor, whose core idea is that the simplest is often best: do more with as few things as possible.
Applying this principle to decision trees yields the core idea of the ID3 algorithm: a smaller decision tree is better than a larger one, that is, use as few decision criteria as possible. ID3 uses information gain to select the discrimination conditions; drawing on Shannon's information theory, the algorithm chooses the feature dimension with the largest information gain as the if-else split.
1) Understand information gain
In short, information gain is defined with respect to a specific feature: the degree to which the presence or absence of that feature affects the whole system (the set) is what "information gain" describes. We know that after one if-else discrimination, the original category set is split into two sets, and our goal is to make the "purity" of a certain category in a resulting set as high as possible. If the purity of the split subsets is higher than that of the original set, the if-else division is effective. The dividing condition that, by comparison, produces the highest purity is the "most suitable" discrimination condition for that feature dimension, which is exactly what we are looking for.
2) Information gain formula
So how is the information gain value calculated? Here we can use information entropy. We compare the information entropy of the set before and after the partition, that is, we subtract: from the information entropy of the set before the division, subtract the weighted information entropy of the subsets obtained by dividing on a feature attribute. The result is the information gain. The formula is as follows:

G(S, a) = H(S) - \sum_{k=1}^{K} \frac{|S_k|}{|S|} H(S_k)
G(S, a) denotes the information gain obtained when set S is partitioned into subsets by the feature attribute a, and H(·) denotes the information entropy of a set. The "subtraction" above looks a little complicated, so let's focus on the meaning of the term being subtracted:
- The capital letter K: the number of subsets produced after dividing by feature attribute a. For example, if the division produces 5 subsets, then K = 5.
- The lowercase letter k: the index of one of those subsets; k = 1 means the summation starts from the first subset.
- |S| and |S_k|: the number of elements in set S and in subset S_k respectively. Here | | is not the absolute-value sign; |S_k| is the number of elements in one subset after the division.
- |S_k| / |S|: the ratio of the number of elements in a subset to the total number of elements in the original set. It is the weight of that subset's information entropy: the larger the ratio, the higher the weight.
Finally, we compare the information gain of different feature attributes: the higher the gain value, the higher the purity of the resulting subsets and the better the classification. We select the feature attribute with the highest information gain as the best if-else discrimination condition.
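To make the selection step concrete, here is a minimal Python sketch (an illustration added here, with made-up toy data; it is not code from the original article) that computes the information gain of candidate feature splits and picks the better one:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_k * log2(p_k)) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """G(S, a) = H(S) - sum(|S_k|/|S| * H(S_k)) for the split induced by one feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(subset) / total * entropy(subset) for subset in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data (invented for this sketch): play outside or not,
# given the weather and whether it is windy.
labels  = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
windy   = ["no", "yes", "no", "yes", "yes", "no"]

for name, feature in (("weather", weather), ("windy", windy)):
    print(name, round(information_gain(labels, feature), 3))
# ID3 picks the feature with the larger gain ("weather" here) as the split condition.
```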