Chapter 4: Decision Trees
2022-07-08 01:07:00 【Intelligent control and optimization decision Laboratory of Cen】
Which parts does a decision tree learning algorithm consist of? What are the commonly used algorithms?
A decision tree learning algorithm mainly consists of three parts:
1. Feature selection
2. Tree generation
3. Tree pruning
Commonly used algorithms include:
1. ID3
2. C4.5
3. CART
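As a quick orientation (not part of the original text), the sketch below shows how such a tree can be trained with scikit-learn, whose DecisionTreeClassifier is based on an optimized version of CART; the `criterion` parameter switches between entropy-based (ID3/C4.5-style) and Gini-based (CART-style) splitting.

```python
# A minimal sketch, assuming scikit-learn is available; its DecisionTreeClassifier
# is based on an optimized version of CART.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" selects splits by information gain (ID3/C4.5 style);
# the default "gini" uses the Gini index (CART style).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```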
What do the root node, internal nodes, and leaf nodes of a decision tree represent?
Each leaf node corresponds to a decision result, and every other node corresponds to an attribute test; the sample set contained in each node is split into its child nodes according to the outcome of the attribute test, and the root node contains the complete set of samples. The path from the root node to each leaf node corresponds to a sequence of decision tests.
What are the criteria for feature selection (i.e., how is the optimal splitting attribute chosen)?
Feature selection is mainly based on the information gain and the information gain ratio.
Information gain
First, information entropy is defined as: $Ent(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k$
The information gain is: $Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)$
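As an illustration (not from the original article), here is a minimal pure-Python sketch of these two formulas for a discrete attribute; `attr_values` and `labels` are assumed names for the attribute values and class labels of the samples in $D$.

```python
# Illustrative helpers for Ent(D) and Gain(D, a) on a discrete attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    n = len(labels)
    gain = entropy(labels)
    for v in set(attr_values):
        subset = [y for x, y in zip(attr_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)  # weighted entropy of D^v
    return gain

# The attribute separates the two classes perfectly, so the gain equals
# the entropy of the whole set (1 bit).
print(information_gain(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
```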
Information gain ratio
The gain ratio is defined as: $Gain\_ratio(D, a) = \frac{Gain(D, a)}{IV(a)}$,
where: $IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$
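A corresponding sketch for the gain ratio (again illustrative and self-contained); note that $IV(a)$ is simply the entropy of the attribute-value distribution itself.

```python
# Illustrative, self-contained sketch of IV(a) and Gain_ratio(D, a).
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(attr_values, labels):
    n = len(labels)
    gain = entropy(labels) - sum(
        (c / n) * entropy([y for x, y in zip(attr_values, labels) if x == v])
        for v, c in Counter(attr_values).items()
    )
    iv = entropy(attr_values)  # IV(a) is the entropy of the attribute-value split
    return gain / iv if iv > 0 else 0.0

print(gain_ratio(["a", "a", "b", "c"], [0, 0, 1, 1]))
```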
How can overfitting be prevented in decision trees?
Pruning is the main means by which decision tree learning algorithms deal with overfitting. To keep the tree from fitting the training samples too closely, some branches are actively removed to reduce the risk of overfitting. The basic pruning strategies are pre-pruning and post-pruning.
Pre-pruning
During the generation of the decision tree, each node is evaluated before it is split; if splitting the current node cannot improve the generalization performance of the tree, the split is abandoned and the current node is marked as a leaf node.
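In the book, pre-pruning uses a validation set to check whether a split improves generalization. As a rough practical substitute (an assumption, not the article's own procedure), scikit-learn expresses pre-pruning as growth constraints set before training:

```python
# A rough sketch with scikit-learn: pre-pruning appears as constraints
# that stop growth before a split is made.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=4,                 # stop growing beyond this depth
    min_samples_leaf=10,         # refuse splits that would create tiny leaves
    min_impurity_decrease=1e-3,  # refuse splits with negligible impurity gain
    random_state=0,
)
clf.fit(X_train, y_train)
print("train:", clf.score(X_train, y_train), "test:", clf.score(X_test, y_test))
```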
Post-pruning
In post-pruning, a complete decision tree is first grown from the training set; the non-leaf nodes are then examined from the bottom up, and if replacing the subtree rooted at a node with a leaf node improves the generalization ability of the tree, that subtree is replaced with a leaf.
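A hedged sketch of post-pruning in practice: scikit-learn implements cost-complexity pruning rather than the validation-set subtree replacement described above, but the effect is similar: a full tree is grown and then cut back, with the pruning strength chosen on held-out data.

```python
# A sketch of cost-complexity (post-)pruning with scikit-learn (an assumption,
# not the article's procedure): grow a full tree, then pick the pruning
# strength ccp_alpha that does best on a validation split.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = pruned.score(X_val, y_val)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print("best ccp_alpha:", best_alpha, "validation accuracy:", best_acc)
```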
How are continuous values and missing values handled?
1. Handling continuous values
Because a continuous attribute no longer has a finite number of possible values, nodes cannot be split directly on each value of the attribute. Instead, continuous-attribute discretization techniques are used. The simplest strategy is bi-partition (dichotomy), which is exactly the mechanism adopted by the C4.5 decision tree algorithm.
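A minimal sketch of the bi-partition idea (illustrative, not the article's code): candidate thresholds are taken as the midpoints between adjacent sorted values of the continuous attribute, and the threshold with the largest information gain is selected.

```python
# Illustrative bi-partition of a continuous attribute, C4.5 style.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    n = len(labels)
    pairs = sorted(zip(values, labels))
    # candidate thresholds: midpoints of adjacent sorted values
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2 for i in range(n - 1)]
    best_t, best_gain = None, -1.0
    for t in candidates:
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# The classes separate cleanly around 0.95, giving a gain of 1 bit.
print(best_threshold([0.3, 0.7, 1.2, 1.8], [0, 0, 1, 1]))  # (0.95, 1.0)
```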
2. Handling missing values
Incomplete samples are often encountered in real tasks, i.e., samples with some attribute values missing. For example, because of the cost of diagnostic tests, privacy protection, and other factors, the values of some attributes in a patient's medical record (such as an HIV test result) may be unknown. Especially when the number of attributes is large, there are often many samples with missing values. Simply discarding incomplete samples and learning only from the samples without missing values is obviously a great waste of data.
1. How is the splitting attribute chosen when some attribute values are missing?
Let $\rho$ denote the proportion of samples without missing values. Given a training set $D$ and an attribute $a$, let $\hat{D}$ denote the subset of samples in $D$ whose value of attribute $a$ is not missing. Then:
$Gain(D, a) = \rho \times Gain(\hat{D}, a)$
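A small illustrative sketch of this rule (assuming `None` marks a missing value): the gain is computed only on the subset $\hat{D}$ and then scaled by $\rho$.

```python
# Illustrative sketch: compute the gain on the samples whose value of
# attribute a is known, then scale by rho.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_with_missing(attr_values, labels):
    known = [(x, y) for x, y in zip(attr_values, labels) if x is not None]
    rho = len(known) / len(labels)
    known_labels = [y for _, y in known]
    gain = entropy(known_labels)
    for v in set(x for x, _ in known):
        subset = [y for x, y in known if x == v]
        gain -= len(subset) / len(known) * entropy(subset)
    return rho * gain

# The fourth sample has a missing value, so rho = 3/4 and the gain computed
# on the remaining three samples (about 0.918) is scaled down accordingly.
print(gain_with_missing(["a", "a", "b", None], [0, 0, 1, 1]))  # ~0.689
```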
2. Given the splitting attribute, how is a sample divided if its value on that attribute is missing?
If the value of sample $x$ on the splitting attribute $a$ is known, $x$ is placed in the child node corresponding to that value and keeps its sample weight $w_x$. If the value of $x$ on attribute $a$ is unknown, $x$ is placed in all child nodes simultaneously, and its weight in the child node corresponding to value $a^v$ is adjusted to $\hat{r}_v \cdot w_x$, where $\hat{r}_v$ denotes the proportion of samples without missing values that take value $a^v$ on attribute $a$.
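A sketch of this weighted split rule (illustrative names, with `None` again marking a missing value): a sample with a known value keeps its weight $w_x$ in a single child, while a sample with a missing value enters every child with weight $\hat{r}_v \cdot w_x$.

```python
# Illustrative weighted split of samples on an attribute with missing values.
from collections import defaultdict

def split_with_missing(attr_values, weights):
    known_w = defaultdict(float)              # total weight per attribute value
    for x, w in zip(attr_values, weights):
        if x is not None:
            known_w[x] += w
    total_known = sum(known_w.values())

    children = defaultdict(list)              # value a^v -> list of (sample index, weight)
    for i, (x, w) in enumerate(zip(attr_values, weights)):
        if x is not None:
            children[x].append((i, w))        # known value: weight stays w_x
        else:
            for v, wv in known_w.items():     # missing value: enter every child
                children[v].append((i, wv / total_known * w))
    return dict(children)

# Sample 3 has a missing value and is shared between the children for "a" and "b"
# with weights 2/3 and 1/3 respectively.
print(split_with_missing(["a", "a", "b", None], [1.0, 1.0, 1.0, 1.0]))
```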