
Decision tree and random forest

2022-06-26 17:51:00 lmn_

0x01 Decision Tree Overview

A decision tree is a supervised machine learning model that can be used for both classification and regression problems. The tree poses a sequence of questions, and the answers determine which route through the tree we follow.

When building a decision tree, we determine which variable, and which value of that variable, to use when splitting the data at each node, so that predictions can be made quickly.
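As a concrete sketch of this split-selection step, the pure-Python example below scores every candidate (variable, value) pair by weighted Gini impurity and keeps the best one. The toy dataset and helper names are illustrative, not taken from any library:

```python
# A minimal sketch of how a decision tree picks a split: for each
# candidate (feature, threshold) pair, measure how "pure" the two
# resulting groups are using Gini impurity, and keep the best pair.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 = perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) minimizing weighted Gini impurity."""
    n = len(rows)
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best[0], best[1]

# Two features; only the second one separates the classes cleanly.
X = [[1, 5], [2, 1], [3, 6], [4, 2]]
y = ["a", "b", "a", "b"]
print(best_split(X, y))  # -> (1, 2): split on feature 1 at value 2
```

A real tree builder applies this search recursively to each resulting group until a stopping condition (such as purity or maximum depth) is met.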

Advantages of decision tree

  1. Easy to interpret and visualize
  2. Its internal workings can be observed, which makes replication possible
  3. Adapts quickly to a data set
  4. Can handle both numerical and categorical data
  5. The final model can be viewed and explained in an orderly way as a tree diagram
  6. Performs well on large data sets
  7. Extremely fast

Disadvantages of decision tree

  1. Building a decision tree requires an algorithm that can determine the best choice at each node
  2. Decision trees are prone to overfitting, especially when the tree is very deep

0x02 Random Forest Overview

A random forest has almost the same hyperparameters as a decision tree. Generally speaking, a single tree cannot always produce effective, desirable results; in such cases we turn to the random forest, an ensemble learning method used for classification, regression, and other tasks.

A random forest can be understood as a group of decision trees: it is a tree-based machine learning algorithm that builds a large number of decision trees during training and aggregates their many individual decisions into one result, using the power of multiple trees to make a decision.
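For classification, that aggregation step is typically a majority vote over the trees' individual predictions. A minimal sketch in plain Python, where each toy "tree" is just a stand-in function rather than a real trained tree:

```python
# A random forest aggregates its trees' predictions; for classification
# this is usually a majority vote. Here each "tree" is simply a function
# returning a class label -- a stand-in for a real trained tree.
from collections import Counter

def forest_predict(trees, x):
    """Majority vote over the predictions of all trees for input x."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three toy "trees" that disagree on some inputs.
trees = [
    lambda x: "spam" if x > 3 else "ham",
    lambda x: "spam" if x > 5 else "ham",
    lambda x: "spam" if x > 4 else "ham",
]
print(forest_predict(trees, 4.5))  # two of three trees vote "spam"
```

For regression, the aggregation would instead average the trees' numeric outputs.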

When building a random forest model, we must define how many trees to grow and how many variables to consider at each node.

In 1995, Tin Kam Ho created the first random decision forest algorithm using the random subspace method, which in Ho's formulation is a way of implementing the "stochastic discrimination" approach to classification.

Random forests reduce variance by:

  • Training each tree on a different sample of the data
  • Using random subsets of the features
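These two sources of randomness can be sketched in plain Python: each tree receives a bootstrap sample of the rows and a random subset of the feature columns. The names `n_trees` and `n_features` are illustrative hyperparameters, not taken from any specific library:

```python
# Sketch of the two randomization steps: each tree in the forest gets
# (1) a bootstrap sample of the rows (drawn with replacement) and
# (2) a random subset of the feature columns (the "random subspace").
import random

def make_training_sets(rows, n_trees, n_features, seed=0):
    """Yield (sampled_rows, feature_indices) for each tree in the forest."""
    rng = random.Random(seed)
    total_features = len(rows[0])
    for _ in range(n_trees):
        # Bootstrap: sample rows with replacement, same size as the data.
        sample = [rng.choice(rows) for _ in rows]
        # Random subspace: pick a subset of feature columns for this tree.
        features = rng.sample(range(total_features), n_features)
        yield sample, features

data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for sample, features in make_training_sets(data, n_trees=2, n_features=2):
    print(len(sample), sorted(features))
```

Because each tree sees different rows and different columns, the trees make partly independent errors, and averaging or voting over them reduces the variance of the combined prediction.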

Advantages of random forest

  1. Random decision forests correct the decision tree's tendency to overfit
  2. Random forests usually outperform single decision trees, but they are less accurate than gradient-boosted trees
  3. More trees improve performance and make predictions more stable

Disadvantages of random forest

  1. The random forest model is more complex, because it is a combination of decision trees
  2. More trees slow down computation

0x03 The Difference Between Decision Trees and Random Forests

The key difference between the random forest algorithm and a decision tree is that a decision tree is a graph that uses a branching method to illustrate all possible outcomes of a decision. By contrast, the random forest algorithm's output comes from a set of decision trees working together.

Compared with a random forest, a decision tree model is easier to build. For random forests, the final model is harder to visualize, and if the data set is very large or is not processed appropriately, the forest can take a long time to create.

A decision tree always has room to overfit; the random forest algorithm avoids and prevents overfitting by using multiple trees.

Decision trees require little computation, so they run quickly but with lower accuracy; random forests consume more computation, and generating and analyzing them is time-consuming.

Decision trees are easy to visualize; visualizing a random forest is complex.

0x04 Pruning

Pruning means cutting back branches of the tree. It helps the tree classify data better by removing parts that add little value, on the same principle as trimming excess growth from a real tree. When a branch has been reduced to a leaf node, pruning stops there. Pruning is a very important part of building a decision tree.
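One simple form of pruning can be sketched as follows: collapse any split whose two leaf children predict the same class, since that split adds nothing. The dict-based toy tree below is illustrative only, not a full cost-complexity pruning algorithm:

```python
# A minimal pruning sketch on a toy tree represented with nested dicts:
# a leaf is {"label": ...}; an internal node has "left" and "right".
# If both children of a node are leaves predicting the same class, the
# split adds nothing, so we collapse the node into a single leaf.

def prune(node):
    """Recursively collapse splits whose two leaf children agree."""
    if "label" in node:          # already a leaf; nothing to trim
        return node
    node["left"] = prune(node["left"])
    node["right"] = prune(node["right"])
    left, right = node["left"], node["right"]
    if "label" in left and "label" in right and left["label"] == right["label"]:
        return {"label": left["label"]}   # redundant split: trim it
    return node

tree = {
    "left": {"left": {"label": "b"}, "right": {"label": "b"}},  # redundant
    "right": {"label": "a"},
}
print(prune(tree))  # {'left': {'label': 'b'}, 'right': {'label': 'a'}}
```

Practical pruning methods also consider validation accuracy or a complexity penalty before removing a split, rather than only collapsing splits that are exactly redundant.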

0x05 Summary

Compared to a random forest, a decision tree is very simple: a decision tree combines a series of decisions, while a random forest combines several decision trees.
Decision trees are fast and easy to run on large data sets; random forest models require more rigorous training, and with many trees, more time.


Copyright notice
This article was created by [lmn_]; when reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/177/202206261744519796.html