Random forest and integration method learning notes
2022-07-28 03:11:00 【Sheep Baa Baa Baa Baa Baa】
The previous article covered voting classifiers, bagging, pasting, random forests, and similar machine-learning methods. These can all be described as ensembles built from the same kind of weak learner, which ties the ensemble to a single model family: if that model is a poor fit for the problem, the ensemble performs poorly. Boosting addresses this. Boosting refers to ensemble methods that combine several weak learners into one strong learner.
The general idea is to train predictors sequentially, each one trying to correct its predecessor.
AdaBoost: it works by increasing the weights of the misclassified instances. Because the weights change, the next model pays more attention to the instances with larger weights; this cycle repeats until a satisfactory result is reached.
In sklearn's AdaBoostClassifier there is a hyperparameter `algorithm` to choose the variant: with 'SAMME' it is a stagewise additive model based on a multiclass exponential loss function, while 'SAMME.R' works with predicted class probabilities instead.
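The weight update at the heart of AdaBoost can be sketched in a few lines of NumPy. This is an illustrative toy, not sklearn's implementation: the labels, predictions, and the binary (K = 2) form of the SAMME formula here are assumptions made for the example.

```python
import numpy as np

# toy labels and one weak learner's predictions (illustrative)
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])            # two mistakes
w = np.full(len(y_true), 1 / len(y_true))     # uniform initial weights

miss = y_pred != y_true
err = w[miss].sum()                           # weighted error rate = 0.4

# SAMME learner weight for the binary case (K = 2)
alpha = np.log((1 - err) / err)

# increase the weights of misclassified instances, then renormalize
w[miss] *= np.exp(alpha)
w /= w.sum()
print(w)   # the misclassified instances now carry half the total weight
```

On the next round, a weak learner trained with these weights is pushed to get the previously misclassified instances right, which is exactly the "cycle" described above.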
## AdaBoost
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
                             algorithm='SAMME.R', learning_rate=0.5)
ada_clf.fit(x_train, y_train)

Another ensemble method is gradient boosting. It is similar to steepest descent: each new predictor is fit to the residuals of the previous one. The process is as follows.
## Gradient boosting
from sklearn.tree import DecisionTreeRegressor
tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(x, y)
# fit the second tree on the residual errors of the first
y2 = y - tree_reg1.predict(x)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(x, y2)
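The staged example never shows how the ensemble makes a prediction: its output is simply the sum of every stage's prediction. A self-contained sketch on made-up toy data (the data and the new inputs here are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy 1-D regression data (illustrative)
rng = np.random.RandomState(42)
x = rng.rand(100, 1)
y = 3 * x[:, 0] ** 2 + 0.05 * rng.randn(100)

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(x, y)

# second tree trained on the first tree's residuals
y2 = y - tree_reg1.predict(x)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(x, y2)

# the ensemble's prediction is the sum of the stages' predictions
x_new = np.array([[0.4], [0.8]])
y_ens = sum(tree.predict(x_new) for tree in (tree_reg1, tree_reg2))
```

Each added stage can only reduce (or leave unchanged) the training error, since it is fit directly to what the previous stages got wrong.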
## And so on, until the error falls below a threshold.
## A simpler, equivalent form of the above:
from sklearn.ensemble import GradientBoostingRegressor
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1)
gbrt.fit(x, y)

Likewise, just as steepest descent can overshoot the minimum and only discover at the end of the search that the best point lay earlier, a boosted ensemble can keep adding trees past the point of best validation error. Early stopping was designed for this.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
x_train, x_test, y_train, y_test = train_test_split(x, y)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(x_train, y_train)
# measure the validation error after every stage, then retrain at the best size
errors = [mean_squared_error(y_test, y_pred) for y_pred in gbrt.staged_predict(x_test)]
best_n_estimators = np.argmin(errors) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_n_estimators)
gbrt_best.fit(x_train, y_train)
# alternative: true early stopping with warm_start
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)
min_test_error = float('inf')
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(x_train, y_train)
    y_pred = gbrt.predict(x_test)
    test_error = mean_squared_error(y_test, y_pred)
    if test_error < min_test_error:
        min_test_error = test_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # stop once the error has risen 5 iterations in a row

The third method: stacking, also called stacked generalization.
First split the data into a training set, a validation set, and a test set.
Train several predictors on the training set and evaluate them. Once they perform well, feed the validation set through them and collect their predictions. These predictions, together with the validation labels y, are used to train another model (the blender). Finally the test set is run through the whole stack to check the result.
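The manual procedure described above is what sklearn automates in StackingClassifier: the base estimators' out-of-fold predictions become the training input for a final blender. A minimal sketch on synthetic data (the dataset, estimator choices, and cv=5 are assumptions for illustration, not the MNIST setup used later):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = make_classification(n_samples=500, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

# base learners produce out-of-fold predictions (cv=5);
# the final_estimator (the blender) is trained on those predictions
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=42)),
                ('dt', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(x_train, y_train)
score = stack.score(x_test, y_test)
```

Using cross-validated (out-of-fold) predictions instead of a held-out validation set lets the blender train on predictions for every training instance, which is a common refinement of the manual split described above.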
The fourth method: XGBoost (to be covered in the next article).
Example:
(1) Load the MNIST dataset, split it into training, validation, and test sets, train several classifiers, then use a voting classifier and compare its performance against the individual classifiers.
(2) Stack the classifiers above, then compare with the voting classifiers to see the effect of the ensemble.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
x, y = mnist['data'], mnist['target']
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=20000)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5)
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.ensemble import ExtraTreesClassifier
rf_clf = RandomForestClassifier()
svm = SVC(probability=True)
ex_clf = ExtraTreesClassifier()
voting_clf_hard = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='hard')
voting_clf_soft = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='soft')
from sklearn.metrics import accuracy_score
for model in (rf_clf, svm, ex_clf, voting_clf_hard, voting_clf_soft):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    print(model, accuracy_score(y_test, y_pred))

# build the blender's training set from the base learners' validation predictions
y_pred_rf = rf_clf.predict(x_val)
y_pred_svm = svm.predict(x_val)
y_pred_ex = ex_clf.predict(x_val)
x_val_new = np.empty((len(x_val), 3))
preds = [y_pred_rf, y_pred_svm, y_pred_ex]   # avoid shadowing the built-in list
for index, value in enumerate(preds):
    x_val_new[:, index] = value
rf_clf_new = RandomForestClassifier(n_estimators=500, oob_score=True)
rf_clf_new.fit(x_val_new, y_val)
rf_clf_new.oob_score_
# evaluate the whole stack on the test set
y_pred_rf = rf_clf.predict(x_test)
y_pred_svm = svm.predict(x_test)
y_pred_ex = ex_clf.predict(x_test)
x_test_new = np.empty((len(x_test), 3))      # was len(x_val): wrong size
preds = [y_pred_rf, y_pred_svm, y_pred_ex]
for index, value in enumerate(preds):
    x_test_new[:, index] = value             # was x_val_new: wrote to the wrong array
accuracy_score(y_test, rf_clf_new.predict(x_test_new))