Chapter 4: Decision Trees
2022-07-08 01:07:00 【Intelligent control and optimization decision Laboratory of Cen】
Which parts does a decision tree learning algorithm consist of? What are the commonly used algorithms?
A decision tree learning algorithm mainly consists of three parts:
1. Feature selection
2. Tree generation
3. Tree pruning
Commonly used algorithms include:
1. ID3
2. C4.5
3. CART
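As a quick orientation (not part of the original text), the sketch below shows how such a tree can be trained with scikit-learn, whose DecisionTreeClassifier is based on an optimized version of CART; the `criterion` parameter switches between entropy-based (ID3/C4.5-style) and Gini-based (CART-style) splitting.

```python
# A minimal sketch, assuming scikit-learn is available; its DecisionTreeClassifier
# is based on an optimized version of CART.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" selects splits by information gain (ID3/C4.5 style);
# the default "gini" uses the Gini index (CART style).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```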
What do the root node, internal nodes, and leaf nodes of a decision tree represent?
Each leaf node corresponds to a decision result, and every other node corresponds to an attribute test; the sample set contained in each node is split into its child nodes according to the outcome of the attribute test, and the root node contains the complete set of samples. The path from the root node to each leaf node corresponds to a sequence of decision tests.
What are the criteria for feature selection (i.e., how is the optimal splitting attribute chosen)?
Feature selection is mainly based on the information gain and the information gain ratio.
Information gain
First, information entropy is defined as: $Ent(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k$
The information gain is: $Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)$
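As an illustration (not from the original article), here is a minimal pure-Python sketch of these two formulas for a discrete attribute; `attr_values` and `labels` are assumed names for the attribute values and class labels of the samples in $D$.

```python
# Illustrative helpers for Ent(D) and Gain(D, a) on a discrete attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    n = len(labels)
    gain = entropy(labels)
    for v in set(attr_values):
        subset = [y for x, y in zip(attr_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)  # weighted entropy of D^v
    return gain

# The attribute separates the two classes perfectly, so the gain equals
# the entropy of the whole set (1 bit).
print(information_gain(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
```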
Information gain ratio
The gain ratio is defined as: $Gain\_ratio(D, a) = \frac{Gain(D, a)}{IV(a)}$,
where: $IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$
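A corresponding sketch for the gain ratio (again illustrative and self-contained); note that $IV(a)$ is simply the entropy of the attribute-value distribution itself.

```python
# Illustrative, self-contained sketch of IV(a) and Gain_ratio(D, a).
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(attr_values, labels):
    n = len(labels)
    gain = entropy(labels) - sum(
        (c / n) * entropy([y for x, y in zip(attr_values, labels) if x == v])
        for v, c in Counter(attr_values).items()
    )
    iv = entropy(attr_values)  # IV(a) is the entropy of the attribute-value split
    return gain / iv if iv > 0 else 0.0

print(gain_ratio(["a", "a", "b", "c"], [0, 0, 1, 1]))
```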
How can overfitting be prevented in decision trees?
Pruning is the main means by which decision tree learning algorithms deal with overfitting. To keep the tree from fitting the training samples too closely, some branches are actively removed to reduce the risk of overfitting. The basic pruning strategies are pre-pruning and post-pruning.
Pre-pruning
During the generation of the decision tree, each node is evaluated before it is split; if splitting the current node cannot improve the generalization performance of the tree, the split is abandoned and the current node is marked as a leaf node.
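In the book, pre-pruning uses a validation set to check whether a split improves generalization. As a rough practical substitute (an assumption, not the article's own procedure), scikit-learn expresses pre-pruning as growth constraints set before training:

```python
# A rough sketch with scikit-learn: pre-pruning appears as constraints
# that stop growth before a split is made.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=4,                 # stop growing beyond this depth
    min_samples_leaf=10,         # refuse splits that would create tiny leaves
    min_impurity_decrease=1e-3,  # refuse splits with negligible impurity gain
    random_state=0,
)
clf.fit(X_train, y_train)
print("train:", clf.score(X_train, y_train), "test:", clf.score(X_test, y_test))
```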
Post-pruning
In post-pruning, a complete decision tree is first grown from the training set; the non-leaf nodes are then examined from the bottom up, and if replacing the subtree rooted at a node with a leaf node improves the generalization ability of the tree, that subtree is replaced with a leaf.
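A hedged sketch of post-pruning in practice: scikit-learn implements cost-complexity pruning rather than the validation-set subtree replacement described above, but the effect is similar: a full tree is grown and then cut back, with the pruning strength chosen on held-out data.

```python
# A sketch of cost-complexity (post-)pruning with scikit-learn (an assumption,
# not the article's procedure): grow a full tree, then pick the pruning
# strength ccp_alpha that does best on a validation split.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = pruned.score(X_val, y_val)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print("best ccp_alpha:", best_alpha, "validation accuracy:", best_acc)
```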
How are continuous values and missing values handled?
1. Handling continuous values
Because a continuous attribute no longer has a finite number of possible values, nodes cannot be split directly on each value of the attribute. Instead, continuous-attribute discretization techniques are used. The simplest strategy is bi-partition (dichotomy), which is exactly the mechanism adopted by the C4.5 decision tree algorithm.
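A minimal sketch of the bi-partition idea (illustrative, not the article's code): candidate thresholds are taken as the midpoints between adjacent sorted values of the continuous attribute, and the threshold with the largest information gain is selected.

```python
# Illustrative bi-partition of a continuous attribute, C4.5 style.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    n = len(labels)
    pairs = sorted(zip(values, labels))
    # candidate thresholds: midpoints of adjacent sorted values
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2 for i in range(n - 1)]
    best_t, best_gain = None, -1.0
    for t in candidates:
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# The classes separate cleanly around 0.95, giving a gain of 1 bit.
print(best_threshold([0.3, 0.7, 1.2, 1.8], [0, 0, 1, 1]))  # (0.95, 1.0)
```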
2. Handling missing values
Incomplete samples are often encountered in real tasks, i.e., samples with some attribute values missing. For example, because of the cost of diagnostic tests, privacy protection, and other factors, the values of some attributes in a patient's medical record (such as an HIV test result) may be unknown. Especially when the number of attributes is large, there are often many samples with missing values. Simply discarding incomplete samples and learning only from the samples without missing values is obviously a great waste of data.
1. How is the splitting attribute chosen when some attribute values are missing?
Let $\rho$ denote the proportion of samples without missing values. Given a training set $D$ and an attribute $a$, let $\hat{D}$ denote the subset of samples in $D$ whose value of attribute $a$ is not missing. Then:
$Gain(D, a) = \rho \times Gain(\hat{D}, a)$
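A small illustrative sketch of this rule (assuming `None` marks a missing value): the gain is computed only on the subset $\hat{D}$ and then scaled by $\rho$.

```python
# Illustrative sketch: compute the gain on the samples whose value of
# attribute a is known, then scale by rho.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_with_missing(attr_values, labels):
    known = [(x, y) for x, y in zip(attr_values, labels) if x is not None]
    rho = len(known) / len(labels)
    known_labels = [y for _, y in known]
    gain = entropy(known_labels)
    for v in set(x for x, _ in known):
        subset = [y for x, y in known if x == v]
        gain -= len(subset) / len(known) * entropy(subset)
    return rho * gain

# The fourth sample has a missing value, so rho = 3/4 and the gain computed
# on the remaining three samples (about 0.918) is scaled down accordingly.
print(gain_with_missing(["a", "a", "b", None], [0, 0, 1, 1]))  # ~0.689
```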
2. Given the splitting attribute, how is a sample divided if its value on that attribute is missing?
If the value of sample $x$ on the splitting attribute $a$ is known, $x$ is placed in the child node corresponding to that value and keeps its sample weight $w_x$. If the value of $x$ on attribute $a$ is unknown, $x$ is placed in all child nodes simultaneously, and its weight in the child node corresponding to value $a^v$ is adjusted to $\hat{r}_v \cdot w_x$, where $\hat{r}_v$ denotes the proportion of samples without missing values that take value $a^v$ on attribute $a$.
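A sketch of this weighted split rule (illustrative names, with `None` again marking a missing value): a sample with a known value keeps its weight $w_x$ in a single child, while a sample with a missing value enters every child with weight $\hat{r}_v \cdot w_x$.

```python
# Illustrative weighted split of samples on an attribute with missing values.
from collections import defaultdict

def split_with_missing(attr_values, weights):
    known_w = defaultdict(float)              # total weight per attribute value
    for x, w in zip(attr_values, weights):
        if x is not None:
            known_w[x] += w
    total_known = sum(known_w.values())

    children = defaultdict(list)              # value a^v -> list of (sample index, weight)
    for i, (x, w) in enumerate(zip(attr_values, weights)):
        if x is not None:
            children[x].append((i, w))        # known value: weight stays w_x
        else:
            for v, wv in known_w.items():     # missing value: enter every child
                children[v].append((i, wv / total_known * w))
    return dict(children)

# Sample 3 has a missing value and is shared between the children for "a" and "b"
# with weights 2/3 and 1/3 respectively.
print(split_with_missing(["a", "a", "b", None], [1.0, 1.0, 1.0, 1.0]))
```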