Decision tree and random forest
2022-06-26 17:51:00 【lmn_】

0x01 Decision Tree Overview
A decision tree is a supervised machine learning model that can be used for both classification and regression problems. The tree answers a sequence of questions, and the answers we give determine which route through the tree we follow.
When a decision tree is built, the algorithm determines which variable, and which value of that variable, to split the data on at each node, which lets us predict results quickly, as the sketch below shows.
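To make this concrete, here is a minimal sketch (my own illustration, not from the original article) that assumes scikit-learn and its built-in iris dataset; `export_text` prints the variable and threshold chosen at each split:

```python
# A minimal sketch, assuming scikit-learn and the built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text shows, for every internal node, which variable and which
# threshold value the tree uses to split the data.
print(export_text(clf, feature_names=iris.feature_names))
```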

Advantages of decision trees
- Easy to interpret and visualize
- Their internal workings can be observed, which makes results reproducible
- Can adapt quickly to a dataset
- Can handle both numerical and categorical data
- The final model can be viewed and explained in an orderly way as a tree diagram
- Good performance on large datasets
- Extremely fast
Disadvantages of decision trees
- Building a decision tree requires an algorithm that can determine the best split at each node
- Decision trees are prone to overfitting, especially when the tree is very deep (see the sketch below)
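The overfitting tendency can be seen directly by comparing an unconstrained tree with a depth-limited one. This sketch is illustrative only (synthetic data, scikit-learn assumed):

```python
# Hedged sketch: an unconstrained tree vs. a depth-capped tree on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # grows until leaves are pure
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# The deep tree typically fits the training set almost perfectly but
# generalizes worse; capping the depth trades training fit for test accuracy.
for name, m in [("unlimited depth", deep), ("max_depth=4", shallow)]:
    print(name, "train:", m.score(X_tr, y_tr), "test:", m.score(X_te, y_te))
```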
0x02 Random Forest Overview
A random forest has almost the same hyperparameters as a decision tree. Often a single tree cannot produce effective, satisfactory results, and that is when we turn to the random forest: an ensemble learning method used for classification, regression, and other tasks.

A random forest can be understood as a group of decision trees: it is a tree-based machine learning algorithm that builds a large number of decision trees during training and aggregates their many individual decisions into one result, harnessing the power of multiple trees to make a decision.
When building a random forest model, we have to define how many trees to make and how many variables to consider at each node.
In 1995, Tin Kam Ho created the first random decision forest algorithm using the random subspace method; in Ho's formulation, this is a way of implementing the "stochastic discrimination" approach to classification.
Random forests reduce variance in two ways (both visible in the sketch below):
- Training each tree on a different sample of the data
- Using random subsets of the features
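A sketch of how those choices map onto scikit-learn's `RandomForestClassifier` (the parameter names are the library's; the data is synthetic and mine, not the article's):

```python
# Illustrative sketch: n_estimators controls how many trees are grown,
# max_features controls how many variables each split may consider, and
# bootstrap=True trains every tree on a different resampled copy of the data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,     # how many trees to make
    max_features="sqrt",  # random feature subset tried at each node
    bootstrap=True,       # each tree sees a different data sample
    random_state=0,
).fit(X, y)
print(forest.predict(X[:5]))
```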

Advantages of random forests
- Random decision forests correct for decision trees' tendency to overfit
- Random forests usually outperform decision trees, but they are less accurate than gradient-boosted trees
- More trees improve performance and make predictions more stable
Disadvantages of random forests
- The random forest model is more complex, because it is a combination of decision trees
- More trees slow down computation
0x03 The Difference Between Decision Trees and Random Forests
The key difference between the random forest algorithm and a decision tree is that a decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision, whereas the random forest algorithm's output is a set of decision trees whose individual outputs are combined into a final result.
Relative to a random forest, a decision tree model is easier to build. With a random forest, the final model is harder to visualize, and if the dataset is very large or there is no suitable way to preprocess it, the forest can take a long time to create.
A decision tree always leaves room for overfitting; the random forest algorithm avoids and prevents overfitting by using multiple trees.
Decision trees require little computation, which shortens implementation time at the cost of lower precision; random forests consume more computation, and generating and analyzing them is very time-consuming.
Decision trees can be visualized easily; visualizing a random forest is complex (see the sketch below).
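For example, scikit-learn can draw a single decision tree directly, while a forest can only be inspected one member tree at a time. A sketch, assuming matplotlib is available:

```python
# Illustrative only: plot one tree of the forest; there is no single
# picture of the whole ensemble.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import plot_tree

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

plot_tree(forest.estimators_[0], filled=True)  # first of the 10 member trees
plt.show()
```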
0x04 Pruning
Pruning means further cutting back the tree's branches. It serves to subdivide the data in a better way for classification; just as we trim the excess parts of a real tree, it works on the same principle.
Trimming ends when a leaf node is reached. Pruning is a very important part of working with decision trees.
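The article does not name a specific pruning algorithm; one common post-pruning technique is scikit-learn's minimal cost-complexity pruning, sketched below under that assumption:

```python
# Hedged sketch: cost-complexity pruning via ccp_alpha. A larger alpha prunes
# more aggressively, removing branches whose extra complexity is not justified
# by their contribution to accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

# The pruned tree has far fewer leaves, usually with similar or better test accuracy.
for name, m in [("unpruned", unpruned), ("pruned", pruned)]:
    print(name, "leaves:", m.get_n_leaves(), "test acc:", m.score(X_te, y_te))
```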
0x05 Summary
Compared to a random forest, a decision tree is very simple. A decision tree combines a series of decisions, while a random forest combines several decision trees.
Decision trees are fast and easy to run on large datasets. Random forest models require rigorous training: many trees, more time.
