Mpai data science platform: explanation of random forest classification/regression parameter tuning
2022-06-25 12:05:00 【Halosec_ Wei】
Number of decision trees (n_estimators):
This is the number of trees in the forest, that is, the number of base estimators. Its effect on the accuracy of a random forest model is monotonic: the more decision trees, the better the model tends to perform. However, every model has a performance ceiling. Once the number of trees reaches a certain level, the accuracy of the random forest usually stops rising or starts to fluctuate. In addition, the more trees there are, the more computation and memory are required, and the longer training takes. For this parameter, the aim is to strike a balance between training cost and model performance. The number of decision trees usually does not exceed 1000.
Value: [1, +∞)
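The plateau described above can be illustrated with a small worked calculation. This is only a sketch under a strong assumption that is not in the original: each tree is treated as an independent voter that is correct with the same probability `p`, which real, correlated trees are not. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_trees, p):
    """Probability that more than half of n_trees independent voters are right.

    Assumes an odd n_trees (no ties) and that every tree is correct with the
    same probability p, independently of the others.
    """
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(n_trees // 2 + 1, n_trees + 1)
    )

# Accuracy keeps rising with more trees, but each increase buys less.
for n in (1, 11, 101, 501):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions, going from 1 to 11 trees gains far more than going from 101 to 501, while the training cost grows roughly in proportion to the number of trees.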
Split criterion (criterion):
Regression: the metric used to measure the quality of a split in a regression tree. Two criteria are supported: MAE and MSE (the formulas are not given here).
Classification: the criterion a CART tree uses to evaluate splits on features. Two criteria are supported: Gini index (gini) and information gain (entropy).
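Since the formulas are not spelled out, here is a minimal sketch of the four node-level measures using their standard textbook definitions. The function names are hypothetical, and computing MAE around the median (rather than the mean) is an assumption; the source does not say which variant Mpai uses.

```python
from collections import Counter
from math import log2
from statistics import mean, median

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Information entropy in bits: -sum(p * log2(p)) over classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mse(targets):
    """Mean squared deviation of the targets from their mean."""
    m = mean(targets)
    return mean((t - m) ** 2 for t in targets)

def mae(targets):
    """Mean absolute deviation from the median (assumed variant)."""
    m = median(targets)
    return mean(abs(t - m) for t in targets)

print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))
print(mse([1, 3]), mae([1, 3]))
```

A split is preferred when it lowers the chosen measure the most, weighted by how many samples go to each child node.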
Maximum depth of decision tree (max_depth):
With the default value, the decision tree does not limit the depth of its subtrees when building the model. If the sample size is large and there are many features, limiting the maximum depth is recommended. If the sample size is small or there are few features, the maximum depth can be left unlimited. max_depth usually does not exceed 50.
Value: [1, +∞)
Minimum number of samples required to split an internal node (min_samples_split):
Integer or floating point; the default is 2. It specifies the minimum number of samples required to split an internal (non-leaf) node. This value limits the conditions under which a subtree keeps splitting: if a node has fewer samples than min_samples_split, no further attempt is made to select the best feature for splitting. If the sample size is small, this value can be left alone. If the sample size is very large, increasing it is recommended.
Value: [2, +∞)
Minimum number of samples required at each leaf node (min_samples_leaf):
This value limits the minimum number of samples in a leaf node. If a leaf node has fewer samples than this value, it is pruned together with its sibling node. The default is 1. You can enter either an integer minimum number of samples or the minimum number of samples as a fraction of the total number of samples. If the sample size is small, this value can be left alone. If the sample size is very large, increasing it is recommended.
Value: [1, +∞)
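The three growth limits described so far (max_depth, min_samples_split, min_samples_leaf) can be read as checks made before a node is split. The sketch below is illustrative only: the function names are hypothetical, and converting a fractional value to a count with `ceil(fraction * n_total)` is an assumption borrowed from a common convention, not something the source states.

```python
import math

def resolve_min_samples(value, n_total):
    """Turn an int count or a float fraction of the training set into a count."""
    if isinstance(value, float):
        return math.ceil(value * n_total)  # assumed rounding rule
    return value

def split_allowed(n_node, n_left, n_right, depth, n_total,
                  max_depth=None, min_samples_split=2, min_samples_leaf=1):
    """Return True if a node at `depth` holding n_node samples may be split
    into children holding n_left and n_right samples."""
    if max_depth is not None and depth >= max_depth:
        return False  # the depth limit has been reached
    if n_node < resolve_min_samples(min_samples_split, n_total):
        return False  # too few samples to attempt a split
    min_leaf = resolve_min_samples(min_samples_leaf, n_total)
    return n_left >= min_leaf and n_right >= min_leaf  # both leaves large enough

print(split_allowed(10, 5, 5, depth=3, n_total=100))
print(split_allowed(10, 1, 9, depth=0, n_total=100, min_samples_leaf=2))
```

Raising any of the two minimums or lowering max_depth makes each tree shallower, which is why larger values are suggested for very large data sets.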
Number of features to consider when searching for the best split of a node (max_features):
The number of features considered when choosing the best split cannot exceed this value. When it is an integer, it is the maximum number of features. When it is a decimal, the limit is the number of training-set features × the decimal. When it is auto, max_features = sqrt(n_features).
Value: (0, 1]
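The three cases above can be sketched as a small helper. The name `resolve_max_features` is hypothetical, and truncating to an integer with a floor of 1 is an assumption; the source only gives the formulas.

```python
import math

def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a concrete feature count."""
    if max_features == "auto":
        return max(1, int(math.sqrt(n_features)))   # sqrt(n_features)
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))  # fraction of all features
    return max_features  # integer: used as-is

print(resolve_max_features("auto", 16))
print(resolve_max_features(0.5, 20))
print(resolve_max_features(3, 20))
```

Smaller values make the individual trees more different from each other, at the cost of each tree seeing fewer candidate features per split.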
Maximum number of leaf nodes (max_leaf_nodes):
Limiting the maximum number of leaf nodes can prevent overfitting. The default is "None", meaning the number of leaf nodes is not limited. If a limit is set, the algorithm builds the best decision tree it can within that number of leaf nodes. If there are not many features, this value can be ignored. If there are many features, it can be limited, and a specific value can be found through cross-validation.
Value: (0, 1]
Impurity threshold for information entropy or Gini index (min_impurity_split):
This value limits the growth of the decision tree. If the impurity of a node (based on the Gini index or mean squared error) is below this threshold, the node produces no further child nodes and becomes a leaf node. Changing the default value of 1e-7 is generally not recommended.
Value: (0, 1]
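As a rough illustration of this stopping rule for classification, the sketch below computes a node's Gini impurity and compares it with the threshold. The helper names are hypothetical and the check is a simplification of what a real tree builder does.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the class labels in a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def becomes_leaf(labels, min_impurity_split=1e-7):
    """A node whose impurity is below the threshold is not split further."""
    return gini(labels) < min_impurity_split

print(becomes_leaf(["a"] * 5))     # pure node, impurity 0
print(becomes_leaf(["a", "b"]))    # evenly mixed node, impurity 0.5
```

With the default threshold, only nodes that are essentially pure stop here; a larger threshold stops growth earlier and yields smaller trees.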
Sampling with replacement (bootstrap):
As the name suggests, this controls whether samples are drawn with replacement when building each decision tree of the random forest. The default is True, that is, sampling with replacement is used.
Value: Yes, No
Out-of-bag estimation (oob_score):
Bagging builds each tree from a random sample, so the samples that were not drawn, that is, the data not involved in building that tree, form the out-of-bag data. This data can be used to validate the model. When tuning parameters across multiple models, cross-validation could be used, but it takes a lot of time and is not really necessary for a random forest. Instead, the out-of-bag data is used to validate the decision tree models, which amounts to a simple form of cross-validation. It costs little and works well. The default value is False.
Value: Yes, No
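To make the link between bootstrap sampling and out-of-bag data concrete, here is a minimal sketch using only the Python standard library. The function name `bootstrap_split` and the seed are illustrative assumptions, not part of the platform.

```python
import random

def bootstrap_split(n_samples, seed=0):
    """Draw n_samples indices with replacement; return them together with
    the indices that were never drawn (the out-of-bag set)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

in_bag, out_of_bag = bootstrap_split(10_000)
oob_fraction = len(out_of_bag) / 10_000
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large samples the out-of-bag fraction approaches 1 - 1/e, about 37%, so each tree leaves roughly a third of the data unused and available for validation without a separate hold-out set.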