Basis of information entropy
2022-07-03 01:10:00 【The code family】
Information entropy
The concept of information entropy was defined by Claude Shannon in 1948. Shannon was a famous American mathematician and the founder of information theory; he proposed the concept of "information entropy", which laid the foundation for information theory and digital communication.
Understand information entropy
Information entropy is a measure of uncertainty, that is, of the probability of discrete random events. Simply put: "the more chaotic a situation becomes, the greater its information entropy, and vice versa."
The great mathematician Shannon gave the formula for information entropy, shown below:
$$H(X) = -\sum_{k=1}^{N} p_k \log_2 p_k$$
Here p denotes probability, and X denotes the set for which information entropy is calculated. In the decision tree classification algorithm, this can be understood through the proportion of each category (the higher the proportion, the higher the purity of that category): N indicates the number of categories, and p_k denotes the proportion of category k in the subset. Once this meaning is understood, the calculation of information entropy is very simple; it breaks down into three operations, namely multiply, sum, and negate.
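These steps can be sketched in a few lines of Python (a minimal sketch, not taken from the article; the example labels are made up purely for illustration):

```python
# Minimal sketch: information entropy of a label set, following the
# "multiply, sum, negate" steps described above.
from collections import Counter
from math import log2

def entropy(labels):
    """H(X) = -sum(p_k * log2(p_k)) over the categories found in `labels`."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0  -> maximally mixed two-class set
print(entropy(["yes", "yes", "yes", "yes"]))  # -0.0 -> pure, fully "ordered" subset
```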
Information entropy formula calculation
Let's apply the information entropy formula above to a simple example. In a binary classification problem, if all of the current samples belong to category k, then the proportion of that category in the subset node reaches 100% (the other category accounts for 0%), i.e. p_k = 1. The information entropy is then calculated as follows:
$$H(X) = -\left(1 \cdot \log_2 1 + 0 \cdot \log_2 0\right) = 0 \quad (\text{with the convention } 0 \cdot \log_2 0 = 0)$$
The details of the logarithm function will not be repeated here: the base-2 logarithm of 1 is 0, so the summed information entropy of the two categories is 0. An information entropy of 0 means the categories within the subset are consistent, i.e. "ordered". From this we can also see that the information entropy reaches its maximum when p_k = 0.5. Based on the above, we can draw the function curve of information entropy, as shown below:
(Figure: the information entropy curve for a binary variable, which peaks at p_k = 0.5.)
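The claim that the entropy peaks at 0.5 can also be checked numerically (a minimal sketch, not from the article):

```python
# Minimal sketch: binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p),
# treating 0*log2(0) as 0, evaluated at a few probabilities.
from math import log2

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}  ->  H = {binary_entropy(p):.3f}")
# The maximum, H = 1.000, occurs at p = 0.50.
```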
ID3 Algorithm — Information gain
From the previous sections we know that the decision tree algorithm takes the collection of all categories as its object of computation and, through conditional discrimination, selects the split whose categories have higher purity. So how do we use information entropy to select decision conditions from the feature set? Let's illustrate this with the ID3 algorithm.

ID3 (Iterative Dichotomiser 3) is one of the decision tree algorithms. It is based on Occam's razor principle, whose core idea is "the simplest explanation tends to be the best: accomplish as much as possible with as little as possible".

Applying this principle to decision trees gives the core idea of the ID3 algorithm: a smaller decision tree is better than a larger one, that is, use as few discrimination conditions as possible. ID3 uses information gain to select these conditions; following Shannon's information theory, the ID3 algorithm chooses the feature dimension with the largest information gain for the if-else split.
1) Understand information gain
In short, information gain is defined with respect to a specific feature: the degree to which the presence or absence of that feature affects the whole system (the set) is what "information gain" describes. We know that after one if-else discrimination, the original category set is split into two sets, and our goal is to make the "purity" of some category in one of those sets as high as possible. If the purity of the split subsets is higher than that of the original set, the if-else division is effective. The division condition that yields the highest "purity" by comparison is the "most suitable" discrimination condition on that feature dimension, which is exactly what we are looking for.
2) Information gain formula
So how is the information gain value calculated? Information entropy can be used here. We compare the information entropy of the set before and after the partition, that is, we subtract: from the information entropy of the set before the division, we subtract the weighted information entropy of the subsets obtained by dividing on the feature attribute, and the result is the information gain. The formula is as follows:
$$G(S, t) = H(S) - \sum_{k=1}^{K} \frac{|S_k|}{|S|} H(S_k)$$
G(S, t) denotes the information gain obtained for set S by splitting it into subsets on feature attribute t, and H(S) denotes the information entropy of a set. The "subtracted" term above looks a little complicated, so let's focus on its meaning:
- Capital K denotes how many subsets the split on feature dimension t produces; for example, if the division produces 5 subsets, then K = 5.
- Lowercase k indexes those subsets produced by splitting on feature dimension t; k = 1 means the summation starts from the first subset.
- |S| and |S_k| denote the number of elements in a set (here | | is not an absolute-value sign): |S| is the number of elements in set S, and |S_k| is the number of elements in subset S_k after the division.
- |S_k| / |S| denotes the ratio of the number of elements in a subset to the total number of elements in the original set; it is the weight of that subset's information entropy. The larger the proportion, the higher the weight.
Finally, we compare the information gain of different feature attributes: the higher the gain value, the higher the purity of the resulting subsets and the better the classification, so we select the feature attribute with the highest gain as the best if-else discrimination condition, as in the sketch below.
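Below is a minimal sketch of this selection step (not from the article; the toy weather/temperature data and its values are made up purely for illustration):

```python
# Minimal sketch: G(S, t) = H(S) - sum_k |S_k|/|S| * H(S_k), then pick the
# feature with the largest gain, which is what ID3 does at each node.
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Gain of splitting the set S = (rows, labels) on one feature attribute."""
    subsets = defaultdict(list)
    for row, label in zip(rows, labels):
        subsets[row[feature_index]].append(label)       # one subset S_k per attribute value
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy data: feature 0 separates the two classes perfectly, feature 1 does not.
rows   = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
gains  = [information_gain(rows, labels, i) for i in range(2)]
print(gains)                                            # [1.0, 0.0]
print("best feature index:", max(range(2), key=lambda i: gains[i]))  # 0
```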