Basics of information entropy
2022-07-03 01:10:00 【The code family】
Information entropy
The concept of information entropy was defined by Claude Shannon in 1948. Shannon was a famous American mathematician and the founder of information theory; the concept of "information entropy" he proposed laid the foundation for information theory and digital communication.
Understanding information entropy
Information entropy is a measure of uncertainty, that is, of the probability distribution of a discrete random variable. Simply put, the more disordered a collection is, the greater its information entropy; the more ordered it is, the smaller.
Shannon gave the following formula for information entropy:
$$H(X) = -\sum_{k=1}^{N} p_k \log_2 p_k$$
Here p denotes probability and X denotes the set for which the information entropy is calculated. In a decision tree classification algorithm, this can be understood as computing the proportion of each category (the higher the proportion, the higher the purity of that category), where N is the number of categories and p_k is the proportion of category k in the subset. With this in mind, the calculation of information entropy is very simple; it comes down to three operations: multiplication, summation, and negation.
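The following minimal Python sketch (an illustration, not code from the original article) turns the formula into code: it counts the class labels in a collection, computes each proportion p_k, and applies the multiply, sum, and negate steps described above.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy H(X) = -sum_k p_k * log2(p_k) of a label collection."""
    total = len(labels)
    result = 0.0
    for count in Counter(labels).values():  # number of samples in each category
        p_k = count / total                 # proportion of category k
        result += p_k * log2(p_k)           # multiply and sum ...
    return -result                          # ... then negate

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 -- two equally likely classes
print(entropy(["yes", "yes", "yes"]))       # 0.0 -- a perfectly "ordered" subset
```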
Calculating information entropy with the formula
Let's apply the formula to a simple example. In a binary classification problem, if all of the current samples belong to category k, then that category's proportion in the subset node reaches 100% (the other category's proportion is 0), i.e. p_k = 1. The information entropy is then calculated as follows:
$$H(X) = -\left(1 \cdot \log_2 1 + 0 \cdot \log_2 0\right) = 0$$
We will not dwell on logarithms here: the base-2 logarithm of 1 is 0 (and the 0 · log2 0 term is taken to be 0 by convention), so the summed information entropy of the two categories is 0. An entropy of 0 means that the categories within the subset are consistent, i.e. "ordered". It also follows that the information entropy reaches its maximum when p_k = 0.5. Based on this, we can plot information entropy as a function of the proportion, as shown below:
(Figure: the binary entropy curve, which equals 0 at p = 0 and p = 1 and reaches its maximum at p = 0.5.)
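As a quick numerical check of the curve described above, the short sketch below (an illustration, not from the original post) evaluates the two-class entropy at a few proportions, using the convention that a 0 · log2 0 term contributes 0.

```python
from math import log2

def binary_entropy(p):
    """Entropy of a two-class split where one class has proportion p."""
    return -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0)  # skip zero terms

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f} -> H = {binary_entropy(p):.4f}")
# H is 0 at p = 0 and p = 1, and peaks at 1 bit when p = 0.5.
```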
ID3 algorithm: information gain
From the material above we know that a decision tree algorithm takes the collection of all categories as its object of computation and, through conditional tests, selects splits of high purity. So how do we use information entropy to choose decision conditions from the feature set? Let's illustrate this with the ID3 algorithm.
The ID3 (Iterative Dichotomiser 3) algorithm is one of the decision tree algorithms. It is guided by Occam's razor, whose core idea is that the simplest explanation tends to be the best: accomplish as much as possible with as little as possible.
Applying this principle to decision trees gives the core idea of ID3: a smaller decision tree is better than a larger one, i.e. use as few discrimination conditions as possible. ID3 uses information gain to select those conditions; following Shannon's information theory, the algorithm picks the feature dimension with the largest information gain for each if-else split.
1) Understanding information gain
In short, information gain is defined with respect to a specific feature: how much the presence or absence of that feature affects the whole set can be described as its "information gain". We know that after an if-else test, the original category set is split into two subsets, and our goal is to make the "purity" of some category in a subset as high as possible. If the purity of the split subsets is higher than that of the original set, the if-else split is effective. The split condition that yields the highest purity, found by comparison, is the "most suitable" discrimination condition on that feature dimension.
2) Information gain formula
So how do we compute the information gain? We can use information entropy. We compare the entropy of the set before and after the partition by subtraction: from the information entropy of the set before the split, subtract the (weighted) information entropy after splitting on the feature attribute. The result is the information gain. The formula is as follows:
$$G(S, t) = H(S) - \sum_{k=1}^{K} \frac{|S_k|}{|S|} H(S_k)$$
G(S, t) denotes the information gain obtained by splitting set S on feature attribute t, and H(·) denotes the information entropy of a set. The subtracted term looks a little complicated, so let's go through its parts:
- Uppercase K: the number of subsets produced by splitting on feature dimension t; for example, if the split produces 5 subsets, then K = 5.
- Lowercase k: the index of one of those subsets; k = 1 means the summation starts from the first subset.
- |S| and |S_k|: the number of elements in set S and in subset S_k respectively. Here | | is not the absolute-value sign; it denotes the size of a set.
- |S_k| / |S|: the ratio of the number of elements in a subset to the number of elements in the original set. It is the weight of that subset's information entropy: the larger the proportion, the higher the weight.
Finally, we compare the information gain of the different feature attributes: the higher the gain, the purer the resulting subsets and the better the split. We select the feature attribute with the highest information gain as the best if-else discrimination condition.
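To make the selection rule concrete, here is a minimal sketch (an illustration under assumed toy data, not code from the original post) that computes G(S, t) for each candidate feature and picks the one with the largest gain, as the formula above prescribes.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """H(S) = -sum_k p_k * log2(p_k) over the class labels of a set."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """G(S, t): entropy of S minus the weighted entropy of the subsets S_k."""
    subsets = defaultdict(list)                 # S_k, grouped by the feature's value
    for row, label in zip(rows, labels):
        subsets[row[feature]].append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: two candidate features and a binary class label.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

gains = {f: information_gain(rows, labels, f) for f in ("outlook", "windy")}
best = max(gains, key=gains.get)                # feature with the largest gain
print(gains, "-> split on:", best)              # outlook separates the classes perfectly
```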