Basics of Information Entropy
2022-07-03 01:10:00 【The code family】
Information entropy
The concept of information entropy was proposed by Claude Shannon in 1948. Shannon was a famous American mathematician and the founder of information theory; his concept of "information entropy" laid the foundation for information theory and digital communication.
Understanding information entropy
Information entropy is a measure of uncertainty, that is, of the probability distribution of a discrete random event. Simply put, the more disordered the situation, the greater the information entropy; the more ordered it is, the smaller.
The great mathematician Shannon gave the formula of information entropy, shown below:

H(X) = -\sum_{k=1}^{N} p_k \log_2 p_k

Here p denotes probability, and X denotes the set over which information entropy is calculated. In the decision tree classification algorithm, we can understand this through the proportion of each category (the higher the proportion, the higher the purity of that category): N is the number of categories, and p_k is the proportion of category k in the subset. With this meaning understood, the calculation of information entropy is very simple; it involves just three arithmetic steps, namely multiplication, summation, and negation.
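To make those three steps concrete, here is a minimal Python sketch of the calculation (the function name `entropy` and the example labels are illustrative and not from the original article):

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy H(X) = -sum(p_k * log2(p_k)) over the categories in `labels`."""
    total = len(labels)
    # For each category: multiply p_k by log2(p_k), sum the terms, and negate.
    return sum(-count / total * math.log2(count / total) for count in Counter(labels).values())

# A mixed subset has high entropy; a purer subset has lower entropy.
print(entropy(["cat", "dog", "cat", "dog"]))   # 1.0
print(entropy(["cat", "cat", "cat", "dog"]))   # ~0.811
print(entropy(["cat", "cat", "cat", "cat"]))   # 0.0
```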
Information entropy formula calculation
Let's take a simple example and apply the information entropy formula above. In a binary classification problem, if all the current samples belong to category k, then the proportion of that category in the subset node reaches 100% (the other category accounts for 0), i.e., p_k = 1. In this case the information entropy is calculated as follows:

H(X) = -(1 \cdot \log_2 1 + 0 \cdot \log_2 0) = 0

(by convention, the 0 \cdot \log_2 0 term is taken to be 0).
The rules of the logarithmic function will not be repeated here; the logarithm of 1 to base 2 is 0, so the summed information entropy of the two categories is 0. An information entropy of 0 means the categories within the subset are uniform, i.e., "ordered". From this we can also see that information entropy reaches its maximum when p_k = 0.5. Based on the above, we can plot information entropy as a function of p_k and see that it peaks at p_k = 0.5.
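A quick numerical sweep illustrates that curve (a minimal sketch; `binary_entropy` is an illustrative name and the plotting itself is omitted):

```python
import math

def binary_entropy(p):
    """Entropy of a two-category split where one category has proportion p and the other 1 - p."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.3f}")
# H rises from 0 at p = 0, peaks at 1.0 when p = 0.5, and falls back to 0 at p = 1.
```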
ID3 Algorithm — Information gain
Through the previous sections, we know that the decision tree algorithm takes the set containing all categories as its object of calculation and, through conditional discrimination, picks out subsets of high purity. So how do we use information entropy to select decision conditions from the feature set? Let's illustrate this with the ID3 algorithm.
ID3 (Iterative Dichotomiser 3) is one of the decision tree algorithms. It is built on Occam's razor principle, whose core idea is that the simplest explanation is usually the best: do as much as possible with as little as possible.
Applying this principle to decision trees yields the core idea of the ID3 algorithm: a smaller decision tree is better than a larger one, i.e., use as few discrimination conditions as possible. ID3 uses information gain to select the discrimination conditions; drawing on Shannon's information theory, the ID3 algorithm chooses the feature dimension with the largest information gain for the if-else split.
1) Understanding information gain
In short, information gain is defined with respect to a specific feature: the degree to which the presence or absence of that feature influences the whole system (set) is what "information gain" describes. We know that after an if-else discrimination, the original category set is split into two sets, and our goal is for one set's dominant category to reach as high a "purity" as possible. If the purity of the split subsets is higher than that of the original set, then this if-else division is effective. The division condition that yields the highest "purity" by comparison is the "most suitable" discrimination condition over the feature dimensions that we are looking for.
2) Information gain formula
So how do we calculate the information gain value? We can do it with information entropy: compare the information entropy of the set before and after the partition, i.e., subtract from the information entropy of the set before the division the weighted information entropy of the subsets produced by dividing on a feature dimension's attribute. The result is the information gain. The formula is as follows:

G(S, t) = H(S) - \sum_{k=1}^{K} \frac{|S_k|}{|S|} H(S_k)

G(S, t) represents the information gain obtained when set S is divided into subsets by feature attribute t, and H(\cdot) represents the information entropy of a set. The subtracted term above looks a little complicated, so let's focus on its meaning:
- Capital K: the number of subsets produced after dividing by feature dimension t. For example, if the division produces 5 subsets, then K = 5.
- Lowercase k: the index of one of those subsets produced by dividing on feature dimension t; k = 1 means the summation starts from the first subset.
- |S| and |S_k|: |S| is the number of elements in set S (the | | here is not an absolute value sign), and |S_k| is the number of elements in subset S_k after the division.
- |S_k| / |S|: the ratio of the number of elements in a subset to the total number of elements in the original set, i.e., the weight of that subset's information entropy; the larger the proportion, the higher the weight.
Finally, we compare the information gains of different feature attributes: the higher the gain, the higher the purity of the resulting subsets and the better the classification. We select the feature attribute with the highest gain as the best if-else discrimination condition.
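Here is a minimal, self-contained sketch of that comparison (the toy dataset and the names `entropy` / `information_gain` are made up for illustration and do not come from the original article):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over categories of p_k * log2(p_k)."""
    total = len(labels)
    return sum(-c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """G(S, t) = H(S) - sum over subsets S_k of |S_k| / |S| * H(S_k).

    feature_values[i] is the value of feature t for sample i; samples sharing
    a value fall into the same subset S_k.
    """
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(sub) / total * entropy(sub) for sub in subsets.values())
    return entropy(labels) - weighted

# Toy data: whether to play outside, judged against two candidate features.
play    = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]  # splits the classes perfectly
weekday = ["mon", "tue", "mon", "tue", "mon", "tue"]              # tells us very little

features = {"weather": weather, "weekday": weekday}
gains = {name: information_gain(play, values) for name, values in features.items()}
print(gains)                      # weather: 1.0, weekday: ~0.08
print(max(gains, key=gains.get))  # "weather" wins and becomes the if-else condition
```

In this sketch the "weather" feature produces pure subsets, so its information gain is highest and ID3 would place it at the current node.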