Random forest and ensemble method learning notes
2022-07-28 03:11:00 【Sheep Baa Baa Baa Baa Baa】
The previous article covered voting classifiers, bagging, pasting, random forests, and similar machine learning methods. These can all be described as ensembles built from many copies of the same weak learner, so they depend on a single base model: if that model is a poor fit, the ensemble performs poorly too. This motivates boosting, the family of ensemble methods that combine several weak learners into one strong learner.
The general idea is to train predictors sequentially, each one making some correction to its predecessor.
AdaBoost: each round increases the weight of the instances the previous predictor misclassified. Because of these weight changes, the next model pays more attention to the instances with larger weights; the cycle repeats until the desired number of predictors is reached or the ensemble is good enough.
sklearn's AdaBoostClassifier has an algorithm hyperparameter to adjust this. With 'SAMME' it is a stagewise additive model using a multiclass exponential loss function; 'SAMME.R' instead works with predicted class probabilities.
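To make the reweighting concrete, here is a toy sketch of one boosting round for binary labels. It mirrors the idea rather than scikit-learn's exact SAMME implementation, and sample_weights, y, and y_pred are hypothetical arrays:

import numpy as np

## Toy sketch of one AdaBoost reweighting round (illustration only)
def reweight(sample_weights, y, y_pred, learning_rate=1.0):
    wrong = (y_pred != y)
    # weighted error rate of the current predictor
    r = np.sum(sample_weights[wrong]) / np.sum(sample_weights)
    # predictor weight: the more accurate, the larger its say in the final vote
    alpha = learning_rate * np.log((1 - r) / r)
    # boost only the misclassified instances, then renormalize
    new_weights = sample_weights * np.exp(alpha * wrong)
    return new_weights / np.sum(new_weights), alpha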
## AdaBoost
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
                             algorithm='SAMME.R', learning_rate=0.5)
ada_clf.fit(x_train, y_train)

Another ensemble method is gradient boosting. The analogy with gradient descent is that each step moves the ensemble further down the error surface: every new predictor is fit to the residual errors of the previous prediction. Built by hand, the procedure looks like this.
## Gradient boosting, built by hand
from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(x, y)
y2 = y - tree_reg1.predict(x)   # residual errors of the first tree
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(x, y2)            # fit the second tree on those residuals
## And so on, until the error falls below a threshold
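To predict with this hand-built ensemble, sum the predictions of all the trees; a minimal sketch, where x_new stands for a hypothetical batch of new instances:

y_pred = sum(tree.predict(x_new) for tree in (tree_reg1, tree_reg2))

A simple ready-made form of the above method is scikit-learn's GradientBoostingRegressor: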
from sklearn.ensemble import GradientBoostingRegressor
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1)
gbrt.fit(x, y)

Again, as with gradient descent, it is possible to overshoot the minimum: only at the end of training do you discover that the best point was earlier. Early stopping was designed for this. One approach trains the full model once, measures the error after each stage, and retrains with the best number of trees:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

x_train, x_test, y_train, y_test = train_test_split(x, y)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(x_train, y_train)
## measure the test error after every boosting stage
errors = [mean_squared_error(y_test, y_pred) for y_pred in gbrt.staged_predict(x_test)]
best_estimators = np.argmin(errors) + 1   # stages are 1-indexed
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_estimators)
gbrt_best.fit(x_train, y_train)
Alternatively, stop as soon as the error keeps rising, instead of training all 120 stages first. With warm_start=True, scikit-learn keeps the existing trees when fit is called again, so training can continue incrementally:

gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)
min_test_error = float('inf')
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(x_train, y_train)
    y_pred = gbrt.predict(x_test)
    test_error = mean_squared_error(y_test, y_pred)
    if test_error < min_test_error:
        min_test_error = test_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:   # stop after 5 consecutive increases
            break

The third method: stacking, also called stacked generalization.
First split the data into a training set, a validation set, and a test set.
Then train several first-layer predictors on the training set and check their performance. When they work well, have each of them predict the validation set; combine those predicted values with the validation labels y to train another model (the blender). Finally, substitute the test set into the whole stack to check the effect.
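For reference, newer scikit-learn versions (0.22+) ship this pattern as StackingClassifier, which handles the data splitting internally via cross-validation; a minimal sketch with assumed estimator choices:

from sklearn.ensemble import StackingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

stack_clf = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('ex', ExtraTreesClassifier())],
    final_estimator=LogisticRegression())   # the blender trained on first-layer predictions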
The fourth method: XGBoost (covered in the next post).
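As a quick preview, a minimal sketch assuming the separate xgboost package is installed (x_train, y_train as above):

import xgboost

xgb_reg = xgboost.XGBRegressor(max_depth=2, n_estimators=120)
xgb_reg.fit(x_train, y_train)
y_pred = xgb_reg.predict(x_test)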
Example:
(1) Load the MNIST dataset, split it into a training set, a validation set, and a test set, train several classifiers, then combine them in a voting classifier and compare its accuracy against the individual classifiers.
(2) Stack the same classifiers, then compare with the voting classifiers to see the effect of the ensemble.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
x, y = mnist['data'], mnist['target']
from sklearn.model_selection import train_test_split
## hold out 20000 instances, then split them evenly into test and validation sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=20000)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5)
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.ensemble import ExtraTreesClassifier

rf_clf = RandomForestClassifier()
svm = SVC(probability=True)   # soft voting needs predict_proba
ex_clf = ExtraTreesClassifier()
voting_clf_hard = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='hard')
voting_clf_soft = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='soft')
from sklearn.metrics import accuracy_score
for model in (rf_clf, svm, ex_clf, voting_clf_hard, voting_clf_soft):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    print(model, accuracy_score(y_test, y_pred))

## first-layer predictions on the validation set become the blender's training features
y_pred_rf = rf_clf.predict(x_val)
y_pred_svm = svm.predict(x_val)
y_pred_ex = ex_clf.predict(x_val)
x_val_new = np.empty((len(x_val), 3))
val_preds = [y_pred_rf, y_pred_svm, y_pred_ex]   # renamed to avoid shadowing the built-in list
for index, value in enumerate(val_preds):
    x_val_new[:, index] = value
rf_clf_new = RandomForestClassifier(n_estimators=500, oob_score=True)
rf_clf_new.fit(x_val_new, y_val)
rf_clf_new.oob_score_   # out-of-bag estimate of the blender's accuracy
## repeat on the test set to evaluate the whole stack
y_pred_rf = rf_clf.predict(x_test)
y_pred_svm = svm.predict(x_test)
y_pred_ex = ex_clf.predict(x_test)
x_test_new = np.empty((len(x_test), 3))   # len(x_test), not len(x_val)
test_preds = [y_pred_rf, y_pred_svm, y_pred_ex]
for index, value in enumerate(test_preds):
    x_test_new[:, index] = value
accuracy_score(y_test, rf_clf_new.predict(x_test_new))