Feature selection methods in RF, GBDT, and XGBoost
2022-07-25 20:04:00 【Full stack programmer webmaster】
Hello everyone, nice to see you again. I'm your friend Quan Jun.
RF, GBDT, and XGBoost can all be used for feature selection; they belong to the embedded methods of feature selection. In sklearn, for example, you can inspect the importance of each feature through the feature_importances_ attribute:
from sklearn import ensemble
# grd = ensemble.GradientBoostingClassifier(n_estimators=30)
grd = ensemble.RandomForestClassifier(n_estimators=30)
grd.fit(X_train, y_train)  # X_train, y_train: your training features and labels
print(grd.feature_importances_)
But how do these three models compute feature importance? Let's go through them one by one.
1. Random Forest
Random forest relies on out-of-bag (OOB) data. Each decision tree is built on a bootstrap resample of the training set, so some samples are never selected for that tree. These left-out samples can serve as a validation set, which is one of the advantages of random forest: it can skip a separate cross-validation step and use oob_score_ directly to evaluate the model.
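As a minimal sketch of OOB evaluation, the snippet below trains a scikit-learn random forest with oob_score=True and reads oob_score_ afterwards. The synthetic dataset from make_classification and all parameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only so the example is self-contained.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)

# oob_score=True asks the forest to score each sample using only
# the trees that did not see it during training.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

oob_accuracy = rf.oob_score_
print(f"OOB accuracy: {oob_accuracy:.3f}")
```

No held-out split is needed here, because every sample is scored only by trees for which it was out of bag.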
Feature importance can then be measured with the OOB data as follows. Note that this is the permutation-style measure from the original random forest algorithm; sklearn's feature_importances_ attribute is instead based on impurity reduction.
1. For every decision tree, compute the error on its OOB data and record it as errOOB1;
2. Randomly add noise to feature i across all OOB samples (for example, by shuffling its values), compute the OOB error again, and record it as errOOB2;
3. With N trees, the importance of feature i is sum(errOOB2 - errOOB1) / N, where the sum runs over all N trees.
If the OOB accuracy drops significantly after the noise is added, the feature has a large influence on the predictions, so its importance is relatively high.
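The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular library implements it: the forest is bagged by hand from DecisionTreeClassifier so that each tree's OOB rows are known, the "noise" is a random permutation of the feature column, and the function name oob_permutation_importance and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def oob_permutation_importance(X, y, n_trees=30, seed=0):
    """Average increase in OOB error after permuting each feature."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    importance = np.zeros(n_features)
    used_trees = 0
    for _ in range(n_trees):
        # Bootstrap resampling: rows never drawn are this tree's OOB samples.
        boot = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(np.arange(n_samples), boot)
        if oob.size == 0:
            continue
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        tree.fit(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        err_oob1 = 1.0 - tree.score(X_oob, y_oob)        # step 1
        for i in range(n_features):
            X_noisy = X_oob.copy()
            X_noisy[:, i] = rng.permutation(X_noisy[:, i])
            err_oob2 = 1.0 - tree.score(X_noisy, y_oob)  # step 2
            importance[i] += err_oob2 - err_oob1
        used_trees += 1
    return importance / used_trees                       # step 3


# shuffle=False keeps the two informative features in columns 0 and 1.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
imp = oob_permutation_importance(X, y)
print(imp)
```

Features whose permutation barely changes the OOB error get a score near zero, and small negative values can appear purely from sampling noise.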
2. Gradient Boosting Decision Tree (GBDT)
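The importance of feature i is the average of its importance over the individual trees:
$$\hat{J}_i^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{J}_i^2(T_m)$$
where M is the number of trees and T_m is the m-th tree. The importance of feature i in a single tree T is computed from the reduction in loss achieved by the splits made on that feature:
$$\hat{J}_i^2(T) = \sum_{t=1}^{L-1} \hat{\Delta}_t^2 \, \mathbf{1}(v_t = i)$$
where L is the number of leaf nodes, L-1 is the number of non-leaf nodes, v_t is the feature used to split node t, and \hat{\Delta}_t^2 is the reduction in squared loss after splitting node t.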
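The averaging over trees can be sketched as below: for each fitted tree, sum the loss reduction of every non-leaf node by the feature it splits on, then average over the M trees. The loss reduction here is taken to be the sample-weighted drop in node impurity read from the fitted tree_ arrays, which is an assumption of this sketch; the helper name single_tree_importance and the synthetic data are hypothetical as well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def single_tree_importance(tree, n_features):
    """Sum the weighted impurity decrease of every split, grouped by feature."""
    t = tree.tree_
    scores = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, so nothing to credit
            continue
        gain = (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        )
        scores[t.feature[node]] += gain / t.weighted_n_node_samples[0]
    return scores


# shuffle=False keeps the two informative features in columns 0 and 1.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)

# estimators_ is a 2-D array of regression trees; flatten it to get all M trees.
trees = gbdt.estimators_.ravel()
per_tree = np.array([single_tree_importance(tree, X.shape[1]) for tree in trees])
gbdt_importance = per_tree.mean(axis=0)  # average over the M trees
gbdt_importance_normalized = gbdt_importance / gbdt_importance.sum()
print(gbdt_importance_normalized)
```

The final normalization only rescales the scores so they sum to 1; it does not change the ranking of the features.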
3. XGBoost
XGBoost scores a feature by summing the number of times it is used to split across all trees. For example, if a feature is split on once in the first tree, twice in the second tree, and so on, its score is (1 + 2 + …). In the XGBoost API this is the "weight" importance type.
Publisher: Full stack programmer webmaster. Please credit the source when reprinting: https://javaforall.cn/127541.html Original link: https://javaforall.cn