Machine Learning 6: Decision Trees
2022-06-28 23:19:00 【Just a】
1. Overview of Decision Trees
1.1 What is a decision tree
Input of decision tree learning: a labeled training set
Output of decision tree learning: classification rules (the decision tree itself)
1.2 Overview of decision tree algorithms

Common decision tree algorithms:
- ID3 decision tree
- C4.5 decision tree
- CART classification (and regression) tree


The variable with the maximum information gain is the most important variable and is placed at the top of the tree.
A variable may take many values, but only finitely many appear in the training set, so the candidate split points can be enumerated.

| Age | Candidate split point | Maximum information gain? |
|---|---|---|
| 12 | | |
| 18 | 15 | |
| 19 | 18.5 | |
| 22 | 20.5 | yes |
| 29 | 25.5 | |
| 40 | 34.5 | |

If the split at 20.5 yields the maximum information gain, then 20.5 is the best split point.
2. Constructing a Decision Tree
2.1 Constructing a decision tree: divide and conquer
A decision tree is a typical model whose local structure resembles its global structure: along any path, any internal node is the root of a "sub-decision-tree". For such a model, an efficient and feasible construction method is divide and conquer. The steps are as follows:
Input: a data set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ and its attribute set $A = \{a_1, a_2, \dots, a_d\}$
function TreeGenerate(D, A)
- Generate a node Node
- If all samples in D belong to the same class C, mark Node as a leaf node of class C and return
- If A is empty, or all samples in D take identical values on A, mark Node as a leaf node whose class is the majority class in D and return
- Select the optimal splitting attribute a
For each value $a^v$ of attribute a:
    Generate a branch from Node, and let $D_v$ be the subset of D whose value on a is $a^v$
    If $D_v$ is empty, mark the branch node as a leaf whose class is the majority class in D and return; else grow the branch with TreeGenerate($D_v$, A\{a})
End for
This is a typical recursive process. The recursion returns when:
- All samples at the current node belong to the same class
- The current attribute set is empty
- All samples take identical values on every remaining attribute
- The sample set at the current node is empty
Output of a leaf node:
A leaf node outputs the class with the largest proportion among its samples, i.e. the class with the highest probability. If instead each class's probability is output, these probabilities can be used when computing the output probability of a random forest.
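To make the divide-and-conquer procedure concrete, here is a minimal Python sketch (not the code of the original post): it handles only discrete attributes, splits by information gain, and branches only over the attribute values that actually appear in D. The helper names `entropy`, `best_attribute`, and `tree_generate`, and the toy data at the end, are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(D) = -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Pick the attribute with the largest information gain."""
    base, n = entropy(labels), len(labels)

    def gain(a):
        cond = 0.0
        for v in set(r[a] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[a] == v]
            cond += len(sub) / n * entropy(sub)
        return base - cond

    return max(attrs, key=gain)

def tree_generate(rows, labels, attrs):
    """rows: list of dicts {attribute: value}; labels: class labels."""
    # (1) All samples belong to the same class -> leaf of that class
    if len(set(labels)) == 1:
        return labels[0]
    # (2) No attributes left, or all samples identical on the remaining attributes
    #     -> leaf labelled with the majority class in D
    if not attrs or all(all(r[a] == rows[0][a] for a in attrs) for r in rows):
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attrs)
    node = {"split_on": a, "branches": {}}
    rest = [x for x in attrs if x != a]
    for v in set(r[a] for r in rows):          # one branch per observed value a^v
        sub = [(r, y) for r, y in zip(rows, labels) if r[a] == v]
        sub_rows, sub_labels = [r for r, _ in sub], [y for _, y in sub]
        node["branches"][v] = tree_generate(sub_rows, sub_labels, rest)
    return node

# Toy usage (made-up data):
rows = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "rain", "wind": "strong"},
        {"outlook": "sunny", "wind": "strong"}, {"outlook": "overcast", "wind": "weak"}]
labels = ["no", "yes", "no", "yes"]
print(tree_generate(rows, labels, ["outlook", "wind"]))
```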
Two questions remain: how do we select the optimal attribute, and how do we split a node?
Selecting the optimal attribute:
- Information gain and information gain ratio
- Gini index
Splitting a node:
- Discrete attribute with few distinct values
- Discrete attribute with many distinct values
- Continuous attribute
2.2 Information Gain
Information entropy, which measures class purity:
Suppose the proportion of class-k samples in data set D is $p_k$; then the information entropy of D is defined as
$Entropy(D) = -\sum_k p_k \log_2 p_k$
The smaller the entropy, the higher the purity.
Information entropy (entropy) represents the uncertainty of the information, in other words how disordered the data is. Take loans as an example: with 2 borrowers overdue and 2 not overdue, the data is at its most disordered, the uncertainty is highest, the information entropy is maximal, and the purity is lowest.
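As a quick numerical check of the loan example (an added sketch, not part of the original article):

```python
from math import log2

def entropy(counts):
    """Entropy(D) = -sum_k p_k * log2(p_k), treating 0 * log2(0) as 0."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

print(entropy([2, 2]))  # 1.0 -> 2 overdue vs 2 not overdue: maximal entropy, lowest purity
print(entropy([4, 0]))  # 0.0 -> all borrowers in one class: zero uncertainty, highest purity
```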
Information gain:
If D is partitioned by attribute a into $D = \bigcup_v D_v$ with $D_v \cap D_w = \emptyset$ for $v \neq w$, the information gain is defined as:
$Gain(D, a) = Entropy(D) - \sum_v \frac{|D_v|}{|D|}\, Entropy(D_v)$
Given a data set D (for example, a loan registration table) and a feature A (for example, age), the information gain of A is the difference between the entropy of D and the conditional entropy of D given A, i.e.:
$g(D, A) = H(D) - H(D|A)$
The entropy of data set D is a constant, so the larger the information gain, the smaller the conditional entropy, and the more A contributes to removing the uncertainty in D.
Therefore features with large information gain are preferred; they have stronger classification power. A decision tree generated this way is called the ID3 algorithm.
The role and characteristics of information gain:
- Measures the reduction in disorder brought by a split (used in the ID3 decision tree)
- Split on the attribute with the maximum information gain
- Poor generalization ability: it is biased toward attributes with many distinct values
To control the influence of the number of attribute values, first define the intrinsic value IV:
$IV(a) = -\sum_v \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|}$
2.3 Information gain ratio
When a feature has many candidate values, its information gain tends to be inflated, which biases the choice. This can be corrected by introducing the information gain ratio.
The information gain ratio is the ratio of the information gain to the intrinsic value IV(a):
$GainRatio(D, a) = \frac{Gain(D, a)}{IV(a)}$
Characteristics:
- It tends to favor attributes with few distinct values
- The attribute with the maximum gain ratio can be chosen for splitting
- Alternatively, first select the candidate attributes whose information gain is above average, then among them choose the one with the highest gain ratio
2.4 Gini index
Another measure of purity:
$Gini(D) = 1 - \sum_k p_k^2$
The smaller the Gini value, the higher the purity.
The Gini index of attribute a on data set D is
$Gini(D, a) = \sum_v \frac{|D_v|}{|D|}\, Gini(D_v)$
Split on the attribute with the minimum Gini index, i.e. $a_* = \arg\min_a Gini(D, a)$
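The two formulas translate directly into code. The sketch below (illustrative, with hypothetical helper names) picks the splitting attribute as the argmin of the Gini index:

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(rows, labels, a):
    """Gini(D, a) = sum_v |D_v| / |D| * Gini(D_v) for a discrete attribute a."""
    n, total = len(labels), 0.0
    for v in set(r[a] for r in rows):
        sub = [y for r, y in zip(rows, labels) if r[a] == v]
        total += len(sub) / n * gini(sub)
    return total

def best_attribute_by_gini(rows, labels, attrs):
    """a_* = argmin_a Gini(D, a)."""
    return min(attrs, key=lambda a: gini_index(rows, labels, a))
```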
2.5 Example

A simple example: classify PlayTennis using the variables Outlook, Temperature, Humidity, and Wind.
For Outlook, the information gain ratio is calculated as follows:
(1) Compute the total entropy:
P(PlayTennis=Yes) = 9/14, P(PlayTennis=No) = 5/14
Entropy = -9/14*log2(9/14) - 5/14*log2(5/14) = 0.9403
(2) Partition the data set D by Outlook; the result is:
D1: Outlook=Sunny has 5 samples, of which 2 have PlayTennis=Yes and 3 have PlayTennis=No
Entropy1 = -2/5*log2(2/5) - 3/5*log2(3/5) = 0.9710
D2: Outlook=Overcast has 4 samples, of which 4 have PlayTennis=Yes and 0 have PlayTennis=No
Entropy2 = -0/4*log2(0/4) - 4/4*log2(4/4) = 0 (defining 0*log2(0) = 0)
D3: Outlook=Rain has 5 samples, of which 3 have PlayTennis=Yes and 2 have PlayTennis=No
Entropy3 = -3/5*log2(3/5) - 2/5*log2(2/5) = 0.9710
(3) Compute IV: IV = -5/14*log2(5/14) - 4/14*log2(4/14) - 5/14*log2(5/14) = 1.5774
(4) Compute the information gain: Gain = 0.9403 - 5/14*0.9710 - 4/14*0 - 5/14*0.9710 = 0.2467
(5) Compute the information gain ratio: GainRatio = 0.2467 / 1.5774 = 0.1564
Compute the Gini index of Outlook:
(1) Compute the Gini values of D1, D2, and D3:
Gini1 = 1 - (2/5)^2 - (3/5)^2 = 0.4800, Gini2 = 1 - (4/4)^2 - (0/4)^2 = 0
Gini3 = 1 - (3/5)^2 - (2/5)^2 = 0.4800
(2) Compute the overall Gini index:
Gini(D, Outlook) = 5/14*0.4800 + 4/14*0 + 5/14*0.4800 = 0.3429
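The arithmetic above can be verified with a short script (an added check; it reuses only the class counts already stated in this example):

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

# Class counts (Yes, No) overall and within each Outlook value
total = [9, 5]                                   # Entropy(D) ~ 0.9403
splits = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

n = sum(total)
cond = sum(sum(c) / n * entropy(c) for c in splits.values())
gain = entropy(total) - cond                     # ~ 0.2467
iv = -sum(sum(c) / n * log2(sum(c) / n) for c in splits.values())  # ~ 1.5774
gain_ratio = gain / iv                           # ~ 0.1564

gini = lambda c: 1 - sum((x / sum(c)) ** 2 for x in c)
gini_index = sum(sum(c) / n * gini(c) for c in splits.values())    # ~ 0.3429

print(round(entropy(total), 4), round(gain, 4),
      round(iv, 4), round(gain_ratio, 4), round(gini_index, 4))
```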
2.6 Handling continuous variables
A continuous variable is discretized by bi-partition: candidate split points are taken between adjacent observed values (as in the age table in Section 1.2), and the variable is then treated as a discrete variable.
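A sketch of the bi-partition search on the age values from the table in Section 1.2 (the labels below are hypothetical, added only to make the example runnable):

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n)
                for c in set(labels))

def best_split(values, labels):
    """Try the midpoint of each pair of adjacent values and keep the one
    with the largest information gain."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    base, n = entropy(ys), len(ys)
    best = (None, -1.0)
    for i in range(n - 1):
        t = (xs[i] + xs[i + 1]) / 2              # candidate split point
        left, right = ys[:i + 1], ys[i + 1:]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

ages = [12, 18, 19, 22, 29, 40]
labels = ["no", "no", "no", "yes", "yes", "yes"]  # hypothetical labels
print(best_split(ages, labels))                   # (20.5, 1.0) with these labels
```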
3. Preventing Overfitting: Pruning
If the tree keeps splitting until every attribute has been used, it is very likely to overfit.
To avoid overfitting, the tree needs pruning. There are two pruning strategies:
- Pre-pruning
During tree generation, each node is evaluated before it is split; if splitting the current node cannot improve the generalization performance of the tree, the split is stopped and the node is marked as a leaf.
- Post-pruning
First grow a complete decision tree on the training set, then examine its non-leaf nodes from bottom to top; if replacing the subtree rooted at a node with a leaf improves the generalization performance of the tree, replace that subtree with a leaf node.
**Generalization ability:** how well the model adapts to new, unseen samples. It can be estimated with the hold-out method: set aside part of the training data that does not participate in building the tree and use it to evaluate whether a particular split or subtree replacement helps.
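Here is a small hold-out sketch using scikit-learn (assuming scikit-learn is acceptable; its `max_depth`/`min_samples_leaf` limits play the role of pre-pruning and `ccp_alpha` enables cost-complexity post-pruning, which approximate rather than reproduce the procedures described above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Hold-out set: these samples take no part in growing the tree
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,      # pre-pruning limits
                             random_state=0).fit(X_train, y_train)
post = DecisionTreeClassifier(ccp_alpha=0.01,                      # cost-complexity (post) pruning
                              random_state=0).fit(X_train, y_train)

for name, model in [("unpruned", full), ("pre-pruned", pre), ("post-pruned", post)]:
    # Generalization is estimated on the held-out validation set
    print(name, round(model.score(X_val, y_val), 3))
```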
4. Combination (Ensemble) Methods for Improving Classifier Accuracy
4.1 Overview of ensemble methods

Why ensemble methods can improve classification accuracy
Advantages of ensemble algorithms
4.2 Bagging algorithm

Sampling with replacement: each base classifier is trained on a bootstrap sample drawn from the training set with replacement, so a sample may appear several times in one training set and not at all in another.
Advantages of the bagging algorithm: because the base classifiers are trained on different bootstrap samples and combined by voting, the variance of the prediction is reduced and the ensemble is usually more robust than a single classifier.
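A sketch of bagging (illustrative, assuming numpy and scikit-learn are available; the manual part shows the bootstrap step, and `BaggingClassifier` with its default decision-tree base learner is the library equivalent):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Bootstrap step: draw n indices *with replacement* from n samples
n = len(X)
idx = rng.integers(0, n, size=n)            # duplicates are allowed
oob = np.setdiff1d(np.arange(n), idx)       # samples never drawn (roughly 1/3)
print("out-of-bag fraction:", round(len(oob) / n, 2))

# Library version: each tree is trained on its own bootstrap sample,
# and the predictions are combined by voting
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", bag.score(X, y))
```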
4.3 The boosting idea
Boosting trains base classifiers sequentially: samples that earlier classifiers misclassify receive larger weights in later rounds, and the base classifiers are combined by a weighted vote.
AdaBoost algorithm: the best-known boosting algorithm; it updates the sample weights and the per-classifier weights round by round as described above.
Advantages and disadvantages of boosting: it usually reaches higher accuracy than a single classifier, but it is sensitive to noisy samples and outliers and cannot be trained in parallel.
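A minimal AdaBoost sketch with scikit-learn (an illustrative assumption, not the original post's code; the default base learners are shallow decision trees whose sample weights are updated round by round):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Base learners are fitted sequentially; misclassified samples get larger weights next round
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())
```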
4.4 Random forests
An ensemble algorithm built on decision trees: the random forest
4.4.1 The basic concept of random forest
A random forest is an ensemble model. It consists of several decision trees, and the final prediction is decided by a simple vote over the results of the individual trees.
4.4.2 Sampling the data set
In building a random forest, a key step is to draw samples from the original data set to form a new training set while keeping the sample size unchanged. Each decision tree is then built on its own sampled training set.
The "random" in "random forest" is mainly reflected in two aspects:
- When drawing data from the overall training set, samples are drawn with replacement.
- After forming the new training set, a subset of attributes is drawn without replacement from the attribute set, and the decision tree is grown using only these attributes.
These random operations enhance the generalization ability of the model and effectively avoid overfitting, which distinguishes random forests from some other tree ensembles (for example, extreme gradient boosted trees).
In step 1, because each bootstrap data set of the same size is drawn with replacement, roughly 1/3 of the samples are never selected; they are kept as "out-of-bag" data.
In step 2, assuming there are m attributes originally, the usual choice is to use $\log_2 m$ attributes to grow each decision tree.
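The two sources of randomness can be sketched with numpy (the sample and attribute counts below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, m_attributes = 100, 16

# Randomness 1: bootstrap the rows (with replacement, keeping the sample size unchanged)
row_idx = rng.integers(0, n_samples, size=n_samples)
out_of_bag = np.setdiff1d(np.arange(n_samples), row_idx)    # ~1/3 of rows, kept as OOB data
print("OOB fraction:", len(out_of_bag) / n_samples)

# Randomness 2: pick log2(m) attributes without replacement for this particular tree
k = int(np.log2(m_attributes))                              # log2(16) = 4
col_idx = rng.choice(m_attributes, size=k, replace=False)
print("attributes used by this tree:", sorted(col_idx))
```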
4.4.3 Fusing the decision tree results
After the individual decision trees are obtained, their results are fused. In a random forest, the fusion is usually a simple vote: suppose the K decision trees cast votes $t_1, t_2, \dots, t_K$ with $t_i \in \{0, 1\}$; then the final classification result is the majority class, i.e. class 1 if $\sum_{i=1}^{K} t_i > K/2$ and class 0 otherwise.
Output probabilities of a random forest
A random forest can also output its result as a probability: the predicted probability of each class is the proportion of trees that vote for it (equivalently, the average of the per-tree outputs), e.g. $P(y=1) = \frac{1}{K}\sum_{i=1}^{K} t_i$.
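Putting the pieces together with scikit-learn (a sketch under the assumption that its `RandomForestClassifier` is an acceptable stand-in: `max_features="log2"` matches the log2(m) rule above, `oob_score=True` uses the out-of-bag data, `predict` is the vote, and `predict_proba` gives the per-class probabilities):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, max_features="log2",
                            oob_score=True, random_state=0).fit(X_train, y_train)

print("out-of-bag score:", round(rf.oob_score_, 3))            # estimated on samples never drawn
print("voted classes:   ", rf.predict(X_test[:3]))             # fused result of the trees
print("class probabilities:\n", rf.predict_proba(X_test[:3]))  # probability-style output
```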