RF, GBDT, XGBoost feature selection methods
2022-07-25 20:04:00 【Full stack programmer webmaster】
RF, GBDT, and XGBoost can all be used for feature selection; they belong to the embedded methods of feature selection. In sklearn, for example, the feature_importances_ attribute shows the importance of each feature:
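from sklearn import ensemble
from sklearn.datasets import make_classification
# Synthetic data stands in for X_train / y_train, which the original leaves undefined
X_train, y_train = make_classification(n_samples=200, n_features=8, random_state=0)
# grd = ensemble.GradientBoostingClassifier(n_estimators=30)
grd = ensemble.RandomForestClassifier(n_estimators=30)
grd.fit(X_train, y_train)
print(grd.feature_importances_)  # one importance score per feature
But how do these three classifiers calculate feature importance? Each is explained below.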
1. Random forest (Random Forest)
Random forest uses out-of-bag (OOB) data to make predictions. Each decision tree is built on a bootstrap resample of the training set, so some samples are never selected for that tree. Those unselected samples can serve as validation data, which is one of the advantages of random forest: it removes the need for a separate cross validation step, because oob_score_ can be used directly to evaluate the model.
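As a minimal sketch of reading the OOB estimate in scikit-learn: the attribute oob_score_ is only populated when the forest is built with oob_score=True. The synthetic dataset and the parameter values below are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# oob_score=True asks the forest to score each sample using only the
# trees whose bootstrap sample did not contain it.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

print(rf.oob_score_)  # OOB accuracy, a value between 0 and 1
```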
The importance of feature i is computed from the OOB data as follows:
1. For each decision tree, compute the error on its OOB data, denoted errOOB1;
2. Randomly add noise to feature i across all OOB samples (for example by permuting its values), then compute the OOB error again, denoted errOOB2;
3. With N trees, the importance of feature i is sum(errOOB2 - errOOB1) / N.
If adding random noise makes the accuracy on the OOB data drop significantly, the feature has a large influence on the prediction, so its importance is high. Note that the feature_importances_ attribute of scikit-learn's random forest is impurity-based (mean decrease in impurity), not this OOB permutation measure.
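The three steps above can be sketched by hand. This is an illustrative implementation under stated assumptions, not scikit-learn's own method: the function name oob_permutation_importance is hypothetical, the "noise" is a random permutation of the feature column, each tree is a DecisionTreeClassifier trained on a manual bootstrap sample, and the data is synthetic with the two informative features placed in columns 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def oob_permutation_importance(X, y, n_trees=30, seed=0):
    """Mean increase in OOB error per feature after permuting that feature."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    importance = np.zeros(n_features)
    for _ in range(n_trees):
        # Bootstrap sample; rows never drawn form this tree's OOB set.
        boot = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(np.arange(n_samples), boot)
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 30)))
        tree.fit(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        err_oob1 = np.mean(tree.predict(X_oob) != y_oob)  # step 1
        for i in range(n_features):
            X_perm = X_oob.copy()
            # Step 2: the "noise" is a random permutation of column i.
            X_perm[:, i] = rng.permutation(X_perm[:, i])
            err_oob2 = np.mean(tree.predict(X_perm) != y_oob)
            importance[i] += err_oob2 - err_oob1
    return importance / n_trees  # step 3: average over the N trees


# shuffle=False keeps the informative features in the first two columns.
X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
scores = oob_permutation_importance(X, y)
print(scores)
```

Scores near zero (or slightly negative) indicate features whose permutation barely changes the OOB error.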
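2. Gradient boosting decision tree (GBDT)
The importance of feature i is its importance in each single tree, averaged over all trees:
$$\hat{J}_i^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{J}_i^2(T_m)$$
where M is the number of trees and T_m is the m-th tree. The importance of feature i in a single tree T is computed from the reduction in loss obtained by splitting on that feature:
$$\hat{J}_i^2(T) = \sum_{t=1}^{L-1} \hat{\imath}_t^2 \, \mathbb{1}(v_t = i)$$
where L is the number of leaf nodes, L-1 is the number of non-leaf nodes, v_t is the feature used to split non-leaf node t, and \hat{\imath}_t^2 is the reduction in squared loss after that split.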
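A minimal sketch of this per-tree sum and cross-tree average, using the fitted trees inside scikit-learn's GradientBoostingClassifier. The helper tree_importance is a hypothetical name, the data is synthetic, and the weighted impurity decrease stored in each tree stands in for the loss reduction at a split; that substitution is an assumption about how to map the description onto scikit-learn's tree structure. On recent scikit-learn versions the result should be close to feature_importances_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def tree_importance(tree, n_features):
    """Sum the weighted impurity decrease of every split, grouped by feature."""
    t = tree.tree_
    w, imp = t.weighted_n_node_samples, t.impurity
    scores = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, contributes nothing
            continue
        scores[t.feature[node]] += (w[node] * imp[node]
                                    - w[left] * imp[left]
                                    - w[right] * imp[right])
    return scores / w[0]


X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)

per_tree = [tree_importance(est, X.shape[1])
            for stage in gbdt.estimators_ for est in stage
            if est.tree_.node_count > 1]
manual = np.mean(per_tree, axis=0)   # average over the M trees
manual = manual / manual.sum()       # normalize so the scores sum to 1

print(manual)
print(gbdt.feature_importances_)
```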
3. XGBoost
XGBoost computes importance as the total number of times a feature is used to split, summed over all trees. For example, if a feature splits once in the first tree, twice in the second tree, and so on, its score is (1 + 2 + ...).
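The counting rule can be illustrated without training a model. The trees below are made-up data (each list names the feature used at every split of one tree), not output from XGBoost. In the XGBoost Python package, this split-count score corresponds to the "weight" importance type, available through Booster.get_score(importance_type="weight").

```python
from collections import Counter

# Hypothetical ensemble: each inner list holds the feature used at every
# split of one tree (made-up data, not output from a trained model).
trees = [
    ["f0", "f1", "f2"],        # tree 1: f0 splits once
    ["f0", "f0", "f1"],        # tree 2: f0 splits twice
    ["f2", "f0", "f0", "f0"],  # tree 3: f0 splits three times
]

# Count how many splits use each feature across all trees.
split_counts = Counter(feature for tree in trees for feature in tree)
print(split_counts["f0"])  # 1 + 2 + 3 = 6
```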
Publisher: Full stack programmer webmaster. When reprinting, please cite the source: https://javaforall.cn/127541.html Link to the original text: https://javaforall.cn