Popular understanding of random forest
2022-07-03 15:20:00 【alw_123】
Since we have decision trees, is there an algorithm that combines many decision trees into a forest? Yes: the random forest. A random forest is a variant of an algorithm framework called Bagging, so to understand random forests you must first understand Bagging.
Bagging
What is Bagging?
Bagging is short for Bootstrap Aggregating. If you have just encountered Bagging, don't mistake it for a single algorithm: Bagging is a framework in ensemble learning, and specifically a parallel ensemble method. The famous random forest algorithm is a modification built on top of Bagging.
**The core idea of Bagging is that three cobblers together match Zhuge Liang (many ordinary heads beat one master).** When Bagging is used for classification, the results of multiple classifiers are aggregated by voting, and the result with the most votes is taken as the final prediction. When Bagging is used for regression, the outputs of multiple regressors are averaged, and the mean is taken as the final prediction.
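As a minimal sketch of these two aggregation rules (the function names `vote` and `average` are illustrative, not from any library):

```python
import numpy as np

def vote(predictions):
    """Classification: majority vote over the base classifiers' labels."""
    values, counts = np.unique(predictions, return_counts=True)
    return values[np.argmax(counts)]

def average(predictions):
    """Regression: mean of the base regressors' outputs."""
    return float(np.mean(predictions))

print(vote(np.array(["A", "A", "B", "A", "C"])))  # -> A
print(average(np.array([2.0, 2.4, 1.8, 2.2])))    # -> 2.1
```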
Why is the Bagging approach so effective? Let's take an example. You have probably played the game Werewolf: before nightfall, the villagers must vote on who might be a werewolf, based on what happened that day and what the others have found.
If we regard each villager as a classifier, then each villager's task is classification. Suppose $h_i(x)$ denotes whether villager $i$ thinks $x$ is a werewolf ($-1$ means not a werewolf, $1$ means werewolf), $f(x)$ denotes the true identity of $x$ (whether it is a werewolf), and $\epsilon$ denotes the error rate of a villager's judgment. Then $P(h_i(x) \neq f(x)) = \epsilon$.
According to the rules of Werewolf, the villagers must vote before nightfall to decide who the werewolf is; in other words, if more than half of the villagers guess correctly, the round's verdict is correct. Suppose there are $T$ villagers and $H(x)$ denotes the final result after voting. Then $H(x) = \mathrm{sign}\big(\sum_{i=1}^{T} h_i(x)\big)$.
Now suppose every villager judges independently: each has their own opinion about who the werewolf is, so their errors are mutually independent. The vote is wrong exactly when at most half of the villagers are correct, so by the Hoeffding inequality the error rate of $H(x)$ satisfies:

$$P(H(x) \neq f(x)) = \sum_{k=0}^{\lfloor T/2 \rfloor} C_T^k (1-\epsilon)^k \epsilon^{T-k} \leq \exp\!\left(-\frac{1}{2} T (1-2\epsilon)^2\right)$$
Plugging numbers into the bound on the right: with 5 villagers each having error rate 0.33, the Hoeffding bound on the voting error rate is 0.749; with 20 villagers it is 0.315; with 50 villagers it is 0.056; with 100 villagers it is 0.003 (the exact error given by the sum is smaller still). The more villagers there are, the smaller the error rate after voting. This is one reason why Bagging performs so strongly.
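As a quick sanity check (a sketch using only the Python standard library), the script below evaluates both the exact binomial sum and the Hoeffding bound from the formula above:

```python
import math

def vote_error_exact(T, eps):
    """P(at most floor(T/2) of T independent villagers are correct),
    i.e. the exact probability that the majority vote is wrong."""
    return sum(math.comb(T, k) * (1 - eps) ** k * eps ** (T - k)
               for k in range(T // 2 + 1))

def vote_error_bound(T, eps):
    """Hoeffding upper bound exp(-T * (1 - 2*eps)**2 / 2)."""
    return math.exp(-0.5 * T * (1 - 2 * eps) ** 2)

for T in (5, 20, 50, 100):
    print(T, round(vote_error_exact(T, 0.33), 3),
             round(vote_error_bound(T, 0.33), 3))
# The bound column prints 0.749, 0.315, 0.056, 0.003, matching the
# numbers quoted above; the exact error is smaller at every T.
```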
How Bagging is trained
Bagging training has two characteristics: random sampling with replacement, and parallelism.
Random sampling with replacement: suppose the training set contains $m$ samples. Each draw takes one of the $m$ samples at random, adds it to a sampling set, and then puts it back so it can be drawn again. Repeating this $m$ times yields a sampling set of $m$ samples, which serves as the training set for one of Bagging's base classifiers. If there are $T$ classifiers (of any kind), this bootstrap sampling is repeated $T$ times to build $T$ sampling sets, one training set per classifier.
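A minimal sketch of one such bootstrap draw, assuming the training data lives in NumPy arrays `X` (features) and `y` (labels):

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw m indices with replacement; return one base learner's training set."""
    m = len(X)
    idx = rng.integers(0, m, size=m)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])
X_boot, y_boot = bootstrap_sample(X, y, rng)  # some rows repeat, some are left out
```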
Parallelism: suppose there are 10 classifiers. In Boosting, classifier 2 can only start training after classifier 1 has finished, whereas in Bagging all the classifiers can be trained at the same time; once every classifier has finished, the whole Bagging training process is complete.
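A sketch of this parallel training, assuming scikit-learn and joblib are installed (scikit-learn's own ensembles expose the same idea through their `n_jobs` parameter); the helper `fit_one` is illustrative:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def fit_one(X, y, seed):
    """Fit one base tree on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

X = np.random.default_rng(0).normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The T fits are independent of one another, so they can run side by side.
T = 10
trees = Parallel(n_jobs=-1)(delayed(fit_one)(X, y, s) for s in range(T))
```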
[Figure: the Bagging training process]
How Bagging predicts
Prediction with Bagging is very simple: vote! For example, suppose there are 5 classifiers: 3 of them assign the current sample to class A, 1 to class B, and 1 to class C. Then Bagging's result is class A, because class A has the most votes.
[Figure: the Bagging prediction process]
Random forests
A random forest is an extended variant of Bagging. Relative to Bagging, its training process changes in two ways:
- Base learner: Bagging's base learner can be any model, whereas a random forest uses the decision tree as its base learner.
- Random attribute selection: at each node of each decision tree, instead of searching all features for the best split, a random subset of $k$ features is selected and only those are considered. For example, if the original training set has 10 features, each split examines only a random $k$ of those 10. The value of $k$ is usually taken as $\log_2 d$, where $d$ is the number of features.
These changes usually give the random forest better generalization: each decision tree's training set is randomly drawn, and the features it splits on are randomly selected as well, so the individual trees differ substantially from one another, which alleviates the single decision tree's tendency to overfit.
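A minimal usage sketch with scikit-learn (assuming it is available); `max_features="log2"` corresponds to the $k = \log_2 d$ rule above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees; each split considers only log2(d) randomly chosen features
rf = RandomForestClassifier(n_estimators=100, max_features="log2", random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # accuracy of the voted prediction on held-out data
```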