Random Forest and Ensemble Methods: Learning Notes
2022-07-28 03:11:00 【Sheep Baa Baa Baa Baa Baa】
The previous article covered voting classifiers, bagging, pasting, and random forests, which are ensembles built from many copies of the same weak learner. Because everything rests on a single kind of model, the whole ensemble suffers when that model is a poor fit for the data. This motivates boosting: any ensemble method that combines several weak learners into a strong learner.
The general idea is to train the predictors sequentially, each one making some correction to its predecessors in the sequence.
AdaBoost: it works by increasing the weights of the training instances that the previous predictor misclassified. Because of these weight changes, the next model focuses on the instances with larger weights. The cycle repeats until the result stops improving (or a fixed number of predictors has been trained).
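To make the reweighting concrete, here is a minimal sketch of one boosting round (an illustration of the update rule only, not sklearn's internal implementation; w, y_true, and y_pred are assumed NumPy arrays):

import numpy as np

## One reweighting round (illustrative sketch, not sklearn's internals):
## instances the current predictor got wrong have their weights boosted.
def reweight(w, y_true, y_pred, learning_rate=1.0):
    err = np.sum(w[y_true != y_pred]) / np.sum(w)    ## weighted error rate
    alpha = learning_rate * np.log((1 - err) / err)  ## this predictor's weight
    w = np.where(y_true != y_pred, w * np.exp(alpha), w)
    return w / w.sum()                               ## renormalize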
sklearn's AdaBoostClassifier has an algorithm hyperparameter for choosing the variant: 'SAMME' is a stagewise additive model based on a multiclass exponential loss function, while 'SAMME.R' works with predicted class probabilities instead and generally converges faster.
## AdaBoost
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
                             algorithm='SAMME.R', learning_rate=0.5)
ada_clf.fit(x_train, y_train)

Another ensemble method is gradient boosting. It is similar in spirit to steepest descent: instead of reweighting instances, each new predictor is fit to the residual errors left by the previous one. Step by step it looks like this.
## Gradient boosting, built by hand (a regression task, so DecisionTreeRegressor,
## and the second tree must be fit on the residuals y2)
from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(x, y)
y2 = y - tree_reg1.predict(x)          ## residual errors of the first tree
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(x, y2)                   ## fit the second tree to those residuals
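## to predict with this two-tree ensemble, sum the trees' outputs
## (a quick illustrative check on the training data)
y_pred = sum(tree.predict(x) for tree in (tree_reg1, tree_reg2))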
## And so on, until the error drops below a threshold.

## The same thing in one step, using sklearn directly
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
gbrt.fit(x, y)

Again, just as steepest descent can step past the minimum and only discover at the end of the search that the best point was behind it, boosting can keep adding trees past the optimal number. Early stopping was designed for this.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

x_train, x_test, y_train, y_test = train_test_split(x, y)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(x_train, y_train)
## measure the test error at every stage of the boosting sequence
errors = [mean_squared_error(y_test, y_pred) for y_pred in gbrt.staged_predict(x_test)]
best_estimators = np.argmin(errors) + 1    ## stage with the lowest test error
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_estimators)
gbrt_best.fit(x_train, y_train)
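## illustrative check: error of the retrained model on the held-out split
print(mean_squared_error(y_test, gbrt_best.predict(x_test)))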
## Incremental variant: warm_start=True keeps the existing trees when fit() is called again
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)
min_test_error = float('inf')
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(x_train, y_train)
    y_pred = gbrt.predict(x_test)
    test_error = mean_squared_error(y_test, y_pred)
    if test_error < min_test_error:
        min_test_error = test_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break    ## stop once 5 consecutive iterations bring no improvement
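For reference, GradientBoostingRegressor can also stop early on its own: since sklearn 0.20 the n_iter_no_change parameter carves off an internal validation split (validation_fraction) and halts training when the score stops improving. A minimal sketch:

## built-in alternative to the manual loop above
gbrt_auto = GradientBoostingRegressor(max_depth=2, n_estimators=120,
                                      n_iter_no_change=5, validation_fraction=0.1)
gbrt_auto.fit(x_train, y_train)    ## stops once 5 rounds bring no improvement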
The third method: stacking, also called stacked generalization (cascade generalization).
First split the data into a training set, a validation set, and a test set. Train several first-layer predictors on the training set, then run the validation set through them and collect their predictions. Those predicted values, combined with the validation labels y, become the training data for another model (the blender). Finally, feed the test set through the whole stack to check the effect.
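Note that recent sklearn versions (0.22 and later) ship a built-in StackingClassifier that handles this split-and-blend bookkeeping internally via cross-validation. A minimal sketch, with the base models chosen purely for illustration:

from sklearn.ensemble import StackingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

## the final_estimator plays the role of the blender described above
stack_clf = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('et', ExtraTreesClassifier())],
    final_estimator=LogisticRegression())
## stack_clf.fit(x_train, y_train); stack_clf.score(x_test, y_test)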
The fourth method: XGBoost (covered in the next post).
Example:
(1) Load the MNIST dataset, split it into training, validation, and test sets, and train several classifiers; then build voting classifiers and compare their accuracy with that of the individual models.
(2) Stack the classifiers above and compare the stacked model with the voting classifiers to see how well the ensemble works.
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
x, y = mnist['data'], mnist['target'].astype(np.uint8)    ## targets arrive as strings; cast to integers

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=20000)
## split the held-out 20000 into 10000 for test and 10000 for validation
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5)
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import SVC

rf_clf = RandomForestClassifier()
svm = SVC(probability=True)    ## probability=True lets soft voting use predict_proba
ex_clf = ExtraTreesClassifier()
voting_clf_hard = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='hard')
voting_clf_soft = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='soft')
from sklearn.metrics import accuracy_score

for model in (rf_clf, svm, ex_clf, voting_clf_hard, voting_clf_soft):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    print(model, accuracy_score(y_test, y_pred))

## first-layer predictions on the validation set become the blender's inputs
y_pred_rf = rf_clf.predict(x_val)
y_pred_svm = svm.predict(x_val)
y_pred_ex = ex_clf.predict(x_val)
x_val_new = np.empty((len(x_val), 3))
preds = [y_pred_rf, y_pred_svm, y_pred_ex]    ## renamed to avoid shadowing the built-in list
for index, value in enumerate(preds):
    x_val_new[:, index] = value

## train the blender on the stacked predictions; oob_score gives a quick estimate
rf_clf_new = RandomForestClassifier(n_estimators=500, oob_score=True)
rf_clf_new.fit(x_val_new, y_val)
rf_clf_new.oob_score_
y_pred_rf = rf_clf.predict(x_test)
y_pred_svm = svm.predict(x_test)
y_pred_ex = ex_clf.predict(x_test)
x_test_new = np.empty((len(x_test), 3))    ## sized to the test set, not x_val
preds = [y_pred_rf, y_pred_svm, y_pred_ex]
for index, value in enumerate(preds):
    x_test_new[:, index] = value           ## fill x_test_new, not x_val_new
accuracy_score(y_test, rf_clf_new.predict(x_test_new))