Decision Trees in Machine Learning
2022-07-03 06:10:00 【Master core technology】
The decision tree algorithm represents the classification of data as a tree structure, where each leaf node corresponds to a decision result.
Split selection: we want the branch nodes of the decision tree to contain samples belonging to the same class as far as possible; that is, we want nodes of high purity.
ID3 decision tree: information gain
"Information entropy" is the most commonly used measure of the purity of a sample set. Suppose the proportion of class-$k$ samples in the current sample set $D$ is $p_k$ ($k = 1, 2, \ldots, |y|$; for binary classification, $|y| = 2$). The information entropy of $D$ is then defined as

$$Ent(D) = -\sum_{k=1}^{|y|} p_k \log_2 p_k$$
The smaller the value of $Ent(D)$, the higher the purity of $D$.
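To make the definition concrete, here is a minimal NumPy sketch of the entropy computation (the helper name `entropy` and the toy labels are illustrative, not from the original post):

```python
import numpy as np

def entropy(y):
    """Information entropy Ent(D) of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()          # class proportions p_k (all > 0)
    return -np.sum(p * np.log2(p))     # -sum_k p_k * log2(p_k)

print(entropy(np.array(["yes", "yes", "no", "no"])))    # 1.0 (least pure binary set)
print(entropy(np.array(["yes", "yes", "yes", "yes"])))  # -0.0 (pure set)
```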
Suppose discrete attribute $a$ has $V$ possible values $\{a^1, a^2, \ldots, a^V\}$. Splitting the sample set $D$ on $a$ produces $V$ branch nodes, where the $v$-th branch node contains all samples in $D$ whose value on attribute $a$ is $a^v$; denote this subset $D^v$. Since different branch nodes contain different numbers of samples, each branch is given the weight $|D^v|/|D|$. The "information gain" obtained by splitting the sample set $D$ on attribute $a$ is then

$$Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)$$
Generally speaking, the larger the information gain, the greater the "purity improvement" obtained by splitting on attribute $a$.
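A sketch of the gain computation, reusing the `entropy` helper defined above (the function name and toy data are again illustrative):

```python
import numpy as np

def information_gain(x, y):
    """Gain(D, a): parent entropy minus the weighted entropy of the
    subsets D^v induced by each value of attribute column x.
    Assumes the entropy() helper from the previous sketch."""
    n = len(y)
    weighted = sum((x == v).sum() / n * entropy(y[x == v])
                   for v in np.unique(x))
    return entropy(y) - weighted

x = np.array(["a", "a", "b", "b"])        # attribute values
y = np.array(["yes", "yes", "no", "no"])  # class labels
print(information_gain(x, y))  # 1.0: the split removes all parent entropy
```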
C4.5 decision tree: gain ratio
In fact, the information gain criterion is biased toward attributes with many possible values. To reduce the potential adverse effects of this bias, C4.5 selects the optimal split attribute using the gain ratio:
$$Gain\_ratio(D, a) = \frac{Gain(D, a)}{IV(a)}$$

where

$$IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$$
Note that the gain ratio criterion is in turn biased toward attributes with fewer possible values. Therefore, C4.5 does not simply pick the candidate attribute with the largest gain ratio; instead it uses a heuristic: first select, from the candidate attributes, those whose information gain is above average, then choose the one with the highest gain ratio among them.
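Since $IV(a)$ is just the entropy of the split proportions themselves, the sketch below (building on the two helpers above; the ID-like attribute is an illustrative extreme case) also shows why the gain ratio penalizes many-valued attributes:

```python
import numpy as np

def gain_ratio(x, y):
    """Gain_ratio(D, a) = Gain(D, a) / IV(a).
    Assumes entropy() and information_gain() from the sketches above."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    iv = -np.sum(p * np.log2(p))       # IV(a): entropy of the split itself
    return information_gain(x, y) / iv

y = np.array(["yes", "yes", "no", "no"])
ident = np.array(["1", "2", "3", "4"])   # ID-like: one value per sample
good = np.array(["a", "a", "b", "b"])    # genuinely informative attribute
print(information_gain(ident, y), gain_ratio(ident, y))  # 1.0 0.5
print(information_gain(good, y), gain_ratio(good, y))    # 1.0 1.0
```

The ID-like attribute achieves maximal information gain, but its large intrinsic value drags its gain ratio down to half that of the genuinely informative attribute.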
CART decision tree (Classification and Regression Tree): Gini index
The purity of a data set $D$ can also be measured by the Gini value:

$$Gini(D) = \sum_{k=1}^{|y|} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k=1}^{|y|} p_k^2$$

Intuitively, $Gini(D)$ reflects the probability that two samples drawn at random from the data set carry different class labels. Therefore, the smaller $Gini(D)$, the higher the purity of $D$.
The Gini index of attribute $a$ is defined as

$$Gini\_index(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} Gini(D^v)$$

The attribute that minimizes the Gini index after splitting is selected as the optimal split attribute.
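A minimal sketch of both quantities (self-contained this time; the names and toy data are illustrative):

```python
import numpy as np

def gini(y):
    """Gini(D) = 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_index(x, y):
    """Gini_index(D, a): |D^v|/|D|-weighted Gini of the subsets
    induced by each value of attribute column x."""
    n = len(y)
    return sum((x == v).sum() / n * gini(y[x == v]) for v in np.unique(x))

y = np.array(["yes", "yes", "no", "no"])
x = np.array(["a", "a", "b", "b"])
print(gini(y))           # 0.5: a 50/50 binary set is maximally impure
print(gini_index(x, y))  # 0.0: this attribute separates the classes perfectly
```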
Pruning: the primary means of dealing with "overfitting" in decision trees. There are two main strategies: pre-pruning (prepruning), which halts tree growth early, and post-pruning (postpruning), which simplifies a fully grown tree.
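As a practical illustration, here is a scikit-learn sketch (not from the original post): scikit-learn's trees are CART-style, pre-pruning corresponds to growth limits such as `max_depth` and `min_samples_leaf`, and its built-in post-pruning is minimal cost-complexity pruning via `ccp_alpha` (a different procedure from the validation-set post-pruning described in textbooks):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with depth / leaf-size limits.
pre = DecisionTreeClassifier(criterion="gini", max_depth=3,
                             min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow fully, then remove subtrees whose cost-complexity
# improvement is below ccp_alpha (larger alpha -> more aggressive pruning).
post = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.02,
                              random_state=0).fit(X_tr, y_tr)

print("pre-pruned :", pre.score(X_te, y_te))
print("post-pruned:", post.score(X_te, y_te))
```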