Chapter 5 decision tree and random forest practice
2022-07-27 03:58:00 【Sang zhiweiluo 0208】
1 The overfitting problem of decision trees
1.1 Problem description
A decision tree classifies its training data well, but it may not classify unseen test data equally well. Its generalization ability is weak, which means overfitting may have occurred.
1.2 Solutions
(1) Pruning


(2) Reasonable and effective sampling
Bagging:

OOB (out-of-bag) data
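As a rough illustration of bagging and OOB data, the sketch below assumes each bootstrap round draws N indices with replacement from N samples; the samples never drawn in a round are that round's out-of-bag data. The function names are hypothetical.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement (one bootstrap round)."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, sampled):
    """Indices that were never drawn in this round: the out-of-bag samples."""
    chosen = set(sampled)
    return [i for i in range(n) if i not in chosen]

rng = random.Random(0)
n = 1000
sampled = bootstrap_indices(n, rng)
oob = oob_indices(n, sampled)
oob_fraction = len(oob) / n
print(f"OOB fraction in this round: {oob_fraction:.3f}")
```

The probability that a given sample is left out of one round is (1 - 1/N)^N, which approaches 1/e (about 36.8%) as N grows, so roughly a third of the data is available in each round as a built-in validation set.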

Random forests

- Relationship between random forests/bagging and decision trees
A decision tree can serve as the base classifier;
classifiers such as SVM and logistic regression can also be used, and the resulting "overall classifier" is still called a random forest.
Example: a regression problem
2 Regression
2.1 Algorithm
Run the bootstrap 100 times, each time obtaining a dataset Di (Di has length N). For each Di, fit a curve with local regression (LOESS). Then average these curves to get the final fitted curve, in which the overfitting of the individual curves is weakened.
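The procedure above can be sketched in plain Python. This is a minimal illustration, not a full LOESS: it assumes a fixed bandwidth with tricube weights and a local linear fit, and it skips the robustness iterations of real LOESS. The noisy sine data, the bandwidth, and all function names are made up for the example.

```python
import math
import random

def local_linear_fit(xs, ys, x0, bandwidth):
    """Predict y at x0 from a tricube-weighted least-squares line around x0."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0) / bandwidth
        if d >= 1.0:
            continue  # outside the local window
        w = (1.0 - d ** 3) ** 3
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        return sum(ys) / len(ys)  # no neighbours: fall back to the global mean
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate window: fall back to the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

def bagged_curve(xs, ys, grid, n_rounds=100, bandwidth=1.0, seed=0):
    """Average the local-regression curves fitted on n_rounds bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    total = [0.0] * len(grid)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # dataset Di, length N
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        for k, g in enumerate(grid):
            total[k] += local_linear_fit(bx, by, g, bandwidth)
    return [t / n_rounds for t in total]

# Toy data: a noisy sine wave.
data_rng = random.Random(1)
xs = [2 * math.pi * i / 79 for i in range(80)]
ys = [math.sin(x) + data_rng.gauss(0.0, 0.1) for x in xs]
grid = [2 * math.pi * k / 19 for k in range(20)]
curve = bagged_curve(xs, ys, grid)
max_error = max(abs(c - math.sin(g)) for c, g in zip(curve, grid))
print(f"largest deviation from sin(x): {max_error:.3f}")
```

Each individual bootstrap curve follows the noise of its own sample; averaging them smooths those differences out, which is the variance-reduction effect the text describes.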
2.2 Examples
Voting: (1) simple voting mechanisms: one-vote veto, majority rule, threshold voting; (2) Bayesian voting mechanism
Movie reviews: make
as large as possible.
3 Uses of random forests
3.1 Using a random forest to compute the similarity between samples
Principle: the more often two samples fall into the same leaf node, the more similar they are.
Algorithm: let N be the number of samples and initialize an N×N zero matrix S, where S[i,j] is the similarity between samples i and j. For a random forest made up of m decision trees, traverse all leaf nodes of all trees (whenever samples i and j appear in the same node, add 1 to S[i,j]). When the traversal ends, S is the similarity matrix between samples.
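The counting step can be sketched as below, assuming the forest has already been trained and we know which leaf each sample falls into in each tree. The input layout `leaf_ids[t][i]` (leaf of tree t for sample i) and the toy values are hypothetical.

```python
def similarity_matrix(leaf_ids):
    """Count, for every pair of samples, how many trees put them in the same leaf.

    leaf_ids[t][i] is the id of the leaf of tree t that sample i falls into.
    """
    n = len(leaf_ids[0])
    S = [[0] * n for _ in range(n)]
    for tree_leaves in leaf_ids:
        # Group the sample indices by leaf for this tree.
        groups = {}
        for i, leaf in enumerate(tree_leaves):
            groups.setdefault(leaf, []).append(i)
        # Every pair sharing a leaf gets one more count.
        for members in groups.values():
            for i in members:
                for j in members:
                    S[i][j] += 1
    return S

# Three trees, four samples.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [3, 3, 3, 4],
]
S = similarity_matrix(leaf_ids)
for row in S:
    print(row)
```

With scikit-learn, the `apply(X)` method of a fitted forest returns such leaf indices with one row per sample, so the array would need to be transposed to match the layout assumed here. Dividing S by the number of trees m gives a similarity in [0, 1].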
3.2 Use random forests to calculate the importance of features
(1) Trace the nodes that the positive examples pass through, and judge the importance of each feature from indicators such as the number of nodes passed and the Gini coefficient.
(2) Randomly replace (shuffle) one column of data, rebuild the decision trees, and judge the importance of that column's feature from the change in the new model's accuracy.
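Method (2) can be sketched as follows. To keep the example self-contained, a single-split decision stump stands in for the full tree or forest, and accuracy is measured on the training data; both are simplifying assumptions, and in practice a held-out set would be the safer choice. All names and the toy data are hypothetical.

```python
import random

def train_stump(X, y):
    """Fit the best single-feature threshold rule: (feature, threshold, label_above)."""
    best, best_acc = None, -1.0
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for above in (0, 1):
                preds = [above if row[f] > thr else 1 - above for row in X]
                acc = sum(p == t for p, t in zip(preds, y)) / len(y)
                if acc > best_acc:
                    best, best_acc = (f, thr, above), acc
    return best

def accuracy(model, X, y):
    f, thr, above = model
    preds = [above if row[f] > thr else 1 - above for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def permutation_importance(X, y, col, rng):
    """Accuracy drop after shuffling column `col` and retraining the model."""
    base = accuracy(train_stump(X, y), X, y)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    Xp = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return base - accuracy(train_stump(Xp, y), Xp, y)

# Toy data: feature 0 determines the label, feature 1 is pure noise.
rng = random.Random(0)
X = [[i / 40, rng.random()] for i in range(40)]
y = [1 if row[0] > 0.5 else 0 for row in X]
imp = [permutation_importance(X, y, col, rng) for col in range(2)]
print("importance of feature 0:", imp[0])
print("importance of feature 1:", imp[1])
```

Shuffling the informative column breaks its link to the labels, so accuracy drops; shuffling the noise column leaves the model unchanged, so its importance is zero.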
3.3 Isolation forest
An isolation forest (Isolation Forest) detects outliers by isolating sample points.
Features and split points are chosen at random to grow a decision tree iTree of a certain depth; several iTrees form an iForest.
For each iTree, compute the path length f(x) of sample x from the root to its leaf, then compute F(x), the sum of f(x) over the iForest.
Detection criterion: samples x with a smaller F(x) are outliers.
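The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: every iTree is grown on the full dataset with a fixed depth limit, and F(x) is the raw sum of path lengths, whereas the Isolation Forest algorithm as usually described also subsamples the data for each tree and normalizes the score. The data and function names are made up.

```python
import random

def build_itree(points, depth, max_depth, rng):
    """Grow one iTree with random features and split points; None marks a leaf."""
    if len(points) <= 1 or depth >= max_depth:
        return None
    feat = rng.randrange(len(points[0]))
    lo = min(p[feat] for p in points)
    hi = max(p[feat] for p in points)
    if lo == hi:
        return None  # cannot split on a constant feature
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feat] < split]
    right = [p for p in points if p[feat] >= split]
    return (feat, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))

def path_length(tree, x):
    """f(x): number of splits x passes through from the root to its leaf."""
    depth = 0
    node = tree
    while node is not None:
        feat, split, left, right = node
        node = left if x[feat] < split else right
        depth += 1
    return depth

rng = random.Random(0)
normal = [[rng.random(), rng.random()] for _ in range(50)]  # cluster in the unit square
outlier = [10.0, 10.0]
data = normal + [outlier]

forest = [build_itree(data, 0, 10, rng) for _ in range(100)]  # the iForest
F = [sum(path_length(tree, x) for tree in forest) for x in data]
most_isolated = F.index(min(F))
print("index with the smallest F(x):", most_isolated)
```

A point far from the rest is usually cut off by one of the first random splits, so its total path length F(x) is much smaller than that of points inside the cluster.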
Summary
Decision tree and random forest code is clear and the logic is simple. Besides handling classification problems, these models can also serve as a first algorithm for exploring the distribution of the data.
The ensemble idea behind random forests can also be applied to the design of other classifiers.