Summary of machine learning + pattern recognition learning (IV) -- decision tree
2022-06-12 07:28:00 【Nidiya is trying】
I. Definition and introduction
1. A classification decision tree is a tree-structured model that describes how instances are classified; it consists of nodes and directed edges. Nodes are either internal nodes (each representing a feature/attribute) or leaf nodes (each representing a class). [The overall structure of the tree comprises the root node, non-leaf nodes, leaf nodes, and branches.]
2. if-then rules: each path from the root node to a leaf node defines a rule; the features tested at the internal nodes along the path form the rule's conditions, and the class at the leaf node is the rule's conclusion. An important property of the if-then rule set is that it is mutually exclusive and complete: starting from the root node, every instance follows exactly one path to a single leaf node, so no instance is left unclassified.
3. The goal of decision tree learning: build a decision tree model from the training data set that classifies the training instances correctly and also has good generalization ability.
4. The essence of decision tree learning: induce classification rules from the training data set, i.e., find the rules that map features to labels.
5. The loss function of decision tree learning is usually a regularized maximum-likelihood function. Learning typically uses a heuristic, locally optimal strategy: at each step, the feature that is optimal under the current conditions is selected as the splitting rule.
6. The decision tree at different stages:
(1) Training phase: construct the decision tree from the given data set.
(2) Testing phase: start at the root node and descend layer by layer according to the attribute tested at each internal node until a leaf node is reached; the leaf gives the classification result (see the sketch below).
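To make the testing phase concrete, here is a minimal Python sketch of walking a learned tree from the root down to a leaf. The nested-dict tree representation and the feature names ("outlook", "humidity") are hypothetical, purely for illustration.

```python
# Minimal sketch of the testing phase: classify one instance by following
# branches from the root to a leaf. The tree and feature names are made up.

# internal node: {"feature": name, "children": {value: subtree}}; leaf: class label
tree = {
    "feature": "outlook",
    "children": {
        "sunny": {"feature": "humidity",
                  "children": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": "no",
    },
}

def classify(tree, sample):
    """Follow the branch matching the sample's feature value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):          # still at an internal node
        value = sample[node["feature"]]    # attribute tested at this node
        node = node["children"][value]     # descend along the matching branch
    return node                            # the leaf holds the class label

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> "yes"
```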
7. Basic concepts:
(1) Entropy: a measure of the uncertainty of a random variable; the greater the entropy, the greater the uncertainty. For a random variable $X$ with distribution $P(X = x_i) = p_i$, $i = 1, \dots, n$, the entropy is
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i .$$
The formula shows that the entropy of $X$ depends only on its distribution (the $p_i$), not on the values that $X$ takes.
(2) Conditional entropy: a measure of the uncertainty of a random variable $Y$ given a random variable $X$:
$$H(Y \mid X) = \sum_{i=1}^{n} p_i \, H(Y \mid X = x_i), \qquad p_i = P(X = x_i).$$
When the probabilities in the entropy and conditional entropy are estimated from data, the resulting quantities are called the empirical entropy and the empirical conditional entropy.
(3) Information gain: the degree to which learning the feature $A$ reduces the uncertainty about the class of the data set $D$:
$$g(D, A) = H(D) - H(D \mid A).$$
It can be understood as the amount by which the uncertainty in classifying the training data set $D$ is reduced by feature $A$; the information gain is exactly the decrease in entropy.
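To make the three definitions concrete, here is a small NumPy sketch that estimates the empirical entropy, empirical conditional entropy, and information gain; the toy feature/label arrays are made up.

```python
# Sketch: empirical entropy, conditional entropy, and information gain
# estimated from label counts. The toy labels and feature are made up.
import numpy as np

def entropy(labels):
    """H = -sum p_k log2 p_k, with p_k estimated from label frequencies."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(feature, labels):
    """H(Y|X) = sum_i P(X = x_i) * H(Y | X = x_i)."""
    total = len(labels)
    h = 0.0
    for v in np.unique(feature):
        subset = labels[feature == v]
        h += len(subset) / total * entropy(subset)
    return h

def information_gain(feature, labels):
    """g(D, A) = H(D) - H(D|A): reduction in uncertainty due to feature A."""
    return entropy(labels) - conditional_entropy(feature, labels)

y = np.array(["yes", "yes", "no", "no", "yes", "no"])
A = np.array(["a", "a", "b", "b", "a", "b"])
print(information_gain(A, y))
```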
II. Building the decision tree (constructing a decision tree is the process of recursively selecting the optimal feature, which is also the process of partitioning the feature space).
Decision tree learning consists of three parts: feature selection, decision tree generation, and decision tree pruning. Each is described below.
(I) Feature selection and decision tree generation: decide which feature to use to split the feature space; a decision tree is then generated according to the chosen feature-selection criterion. Different generation algorithms use different criteria; the common ones are the information gain and the information gain ratio.
1. ID3 algorithm: selects features by the information gain criterion. The principle is to choose the feature that reduces the entropy of the remaining data the most, i.e., the split that leaves the purest subsets is the best.
Step A: compute the empirical entropy of $D$:
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|},$$
where $C_k$ is the set of samples belonging to class $k$.
Step B: compute the empirical conditional entropy of $D$ given feature $A$:
$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|},$$
where $D_i$ is the subset of $D$ for which $A$ takes its $i$-th value and $D_{ik} = D_i \cap C_k$.
Step C: the information gain is the empirical entropy minus the empirical conditional entropy:
$$g(D, A) = H(D) - H(D \mid A).$$
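A compact sketch of steps A–C applied to every candidate feature, so that ID3 can pick the feature with the largest information gain at the current node; the toy table and feature names are made up.

```python
# Sketch of ID3's per-node feature choice: compute H(D), then H(D|A) for each
# candidate feature A, and select the feature with the largest gain. Toy data.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_feature_id3(X, y, feature_names):
    h_d = entropy(y)                                   # step A: H(D)
    gains = []
    for j in range(X.shape[1]):
        h_d_a = sum(                                   # step B: H(D|A)
            np.mean(X[:, j] == v) * entropy(y[X[:, j] == v])
            for v in np.unique(X[:, j])
        )
        gains.append(h_d - h_d_a)                      # step C: g(D, A)
    best = int(np.argmax(gains))
    return feature_names[best], gains

X = np.array([["sunny", "high"], ["sunny", "normal"],
              ["rain", "high"], ["overcast", "normal"]])
y = np.array(["no", "yes", "no", "yes"])
print(best_feature_id3(X, y, ["outlook", "humidity"]))
```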
2. C4.5 algorithm: selects features by the information gain ratio criterion. If the information gain alone is used to choose the splitting feature, there is a bias toward features with many values: by the information gain criterion, a feature with many values contributes a lot to the "disorder" of the data set, so splitting on it reduces the uncertainty the most, maximizes the information gain, and leaves the purest subsets to classify. The gain ratio corrects for this bias.
The information gain ratio is
$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \qquad \text{where } H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$
and $n$ is the number of distinct values of feature $A$.
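A corresponding sketch of the gain-ratio criterion; note that $H_A(D)$ is simply the entropy of the feature's own value distribution, which is what penalizes features with many values. The toy data are made up.

```python
# Sketch of C4.5's gain ratio: g_R(D, A) = g(D, A) / H_A(D). Toy data made up.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, y):
    h_d = entropy(y)
    h_d_a = sum(np.mean(feature == v) * entropy(y[feature == v])
                for v in np.unique(feature))
    split_info = entropy(feature)   # H_A(D): entropy of the feature's value distribution
    return (h_d - h_d_a) / split_info if split_info > 0 else 0.0

y = np.array(["yes", "no", "yes", "no", "yes"])
A = np.array(["a", "a", "b", "b", "c"])
print(gain_ratio(A, y))
```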
3. CART decision tree: selects features by the Gini index criterion for classification (and by the least-squares error criterion for regression). The CART tree is assumed to be a binary tree: each internal node tests whether a feature takes a given value ("yes" or "no"). For a class distribution $p$, the Gini impurity is
$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2,$$
and for a data set $D$,
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2 .$$
Under a binary split by feature $A$ into $D_1$ and $D_2$, the Gini index is
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2).$$
Intuitively, $\mathrm{Gini}(D)$ is the probability that two samples drawn at random from $D$ have different class labels; the smaller $\mathrm{Gini}(D)$, the purer the data set $D$.
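A small sketch of the Gini criterion for a binary CART split (splitting on "feature == value" versus the rest); the toy data are made up.

```python
# Sketch of CART's Gini criterion: Gini(D) = 1 - sum_k p_k^2, and the Gini index
# of a binary split is the size-weighted Gini of the two halves. Toy data made up.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_index(feature, value, y):
    """Weighted Gini after splitting into (feature == value) vs. the rest."""
    left, right = y[feature == value], y[feature != value]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

y = np.array(["yes", "yes", "no", "no", "yes"])
A = np.array(["a", "a", "b", "b", "a"])
print(gini(y), gini_index(A, "a", y))   # the split with the smallest Gini index is chosen
```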
(II) Decision tree pruning
1. Pruning: if the decision tree focuses too much on classifying the training data correctly, it becomes overly complex and overfits. The tree should then be pruned: cut some subtrees or leaf nodes from the generated tree and turn their root node or parent node into a new leaf node. The purpose of pruning is not to minimize the training loss but to achieve better generalization ability.
2. Loss function: pruning is usually carried out by minimizing the overall loss function of the decision tree. For a tree $T$ with $|T|$ leaf nodes, the loss function is
$$C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|,$$
where $N_t$ is the number of samples in leaf node $t$, $|T|$ is the number of leaf nodes, and the empirical entropy of leaf $t$ is
$$H_t(T) = -\sum_{k} \frac{N_{tk}}{N_t} \log \frac{N_{tk}}{N_t},$$
with $N_{tk}$ the number of samples of class $k$ in leaf $t$. Writing
$$C(T) = \sum_{t=1}^{|T|} N_t H_t(T) = -\sum_{t=1}^{|T|} \sum_{k} N_{tk} \log \frac{N_{tk}}{N_t},$$
the loss becomes
$$C_\alpha(T) = C(T) + \alpha |T|.$$
Understanding the formula: the empirical entropy $H_t(T)$ measures how impure leaf node $t$ is. Summing over all leaf nodes, each weighted by its number of samples, $C(T)$ measures the overall classification error of the model on the training set. If $C(T)$ alone were the optimization objective, the model would split every branch as finely as possible so that the empirical entropy of every leaf becomes 0 and $C(T)$ becomes 0, which ultimately leads to overfitting. Reflection: a tree split that finely is very complex, containing many branches and subtrees. If a penalty term related to the model's complexity is added to the loss function, this overfitting problem can be alleviated. For a tree, the more leaf nodes, the more complex it is, so complexity can be measured by the number of leaf nodes $|T|$; the penalty coefficient $\alpha$ then balances complexity against the overall error, giving $C_\alpha(T) = C(T) + \alpha |T|$. A larger $\alpha$ penalizes trees with many leaf nodes more heavily and yields a simpler model; conversely, a smaller $\alpha$ yields a more complex model.
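A small sketch of evaluating $C_\alpha(T)$ from per-leaf class counts, showing how the penalty term $\alpha|T|$ trades off against the training error $C(T)$; the leaf counts are hypothetical.

```python
# Sketch of the pruning objective C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T|,
# computed from per-leaf class counts. The leaf counts below are hypothetical.
import numpy as np

def leaf_entropy(counts):
    """Empirical entropy H_t(T) of one leaf, from its class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def tree_cost(leaves, alpha):
    """C(T) = sum_t N_t H_t(T) measures training error; alpha*|T| penalizes tree size."""
    c_t = sum(sum(counts) * leaf_entropy(counts) for counts in leaves)
    return c_t + alpha * len(leaves)

leaves = [[8, 2], [1, 9], [5, 5]]           # class counts in each of |T| = 3 leaves
for alpha in (0.0, 1.0, 5.0):
    print(alpha, tree_cost(leaves, alpha))  # larger alpha favours smaller trees
```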
3. Two pruning strategies
(1) Pre-pruning: during the construction of the decision tree, each node is evaluated before it is split; if splitting the current node cannot improve the generalization performance of the model, the node is not split and is marked as a leaf node.
(2) Post-pruning: first generate a complete decision tree from the training set, then examine the non-leaf nodes from the bottom up; if replacing the subtree rooted at a node with a leaf node improves generalization performance, the subtree is replaced by a leaf node.
Comparison of the two strategies:
① A post-pruned decision tree usually retains more branches than a pre-pruned one;
② The risk of underfitting is much smaller for post-pruning, and its generalization performance is often better than that of pre-pruning;
③ The training time cost of post-pruning is much larger than that of either an unpruned or a pre-pruned decision tree.
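For reference, scikit-learn's DecisionTreeClassifier supports both styles: depth and size limits (max_depth, min_samples_leaf) act as pre-pruning, while ccp_alpha applies cost-complexity post-pruning to the fully grown tree. A rough sketch on a built-in toy data set; the specific parameter values are arbitrary.

```python
# Rough sketch of both pruning styles with scikit-learn's DecisionTreeClassifier:
# depth/size limits act as pre-pruning, ccp_alpha applies cost-complexity
# (post-)pruning to the fully grown tree. The iris data set is just a toy example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pre = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                             min_samples_leaf=5).fit(X_tr, y_tr)   # pre-pruning
post = DecisionTreeClassifier(criterion="entropy",
                              ccp_alpha=0.02).fit(X_tr, y_tr)      # post-pruning

print("pre-pruned test accuracy :", pre.score(X_te, y_te))
print("post-pruned test accuracy:", post.score(X_te, y_te))
```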