Random forest and ensemble method learning notes
2022-07-28 03:11:00 【Sheep Baa Baa Baa Baa Baa】
The previous article covered voting classifiers, bagging, pasting, random forests, and similar machine learning methods. These can all be described as ensembles built from many copies of the same weak learner, so they depend on a single base model: if that model is a poor fit, the ensemble performs poorly too. This motivates boosting, the family of ensemble methods that combine several weak learners into one strong learner.
The general idea is to train predictors sequentially, each one making some correction to its predecessor.
AdaBoost: each round increases the weight of the instances the previous predictor misclassified. Because of these weight changes, the next model pays more attention to the instances with larger weights; the cycle repeats until the desired number of predictors is reached or the ensemble is good enough.
sklearn's AdaBoostClassifier has an algorithm hyperparameter to adjust this. With 'SAMME' it is a stagewise additive model using a multiclass exponential loss function; 'SAMME.R' instead works with predicted class probabilities.
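To make the reweighting concrete, here is a toy sketch of one boosting round for binary labels. It mirrors the idea rather than scikit-learn's exact SAMME implementation, and sample_weights, y, and y_pred are hypothetical arrays:

import numpy as np

## Toy sketch of one AdaBoost reweighting round (illustration only)
def reweight(sample_weights, y, y_pred, learning_rate=1.0):
    wrong = (y_pred != y)
    # weighted error rate of the current predictor
    r = np.sum(sample_weights[wrong]) / np.sum(sample_weights)
    # predictor weight: the more accurate, the larger its say in the final vote
    alpha = learning_rate * np.log((1 - r) / r)
    # boost only the misclassified instances, then renormalize
    new_weights = sample_weights * np.exp(alpha * wrong)
    return new_weights / np.sum(new_weights), alpha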
## AdaBoost
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
                             algorithm='SAMME.R', learning_rate=0.5)
ada_clf.fit(x_train, y_train)

Another ensemble method is gradient boosting. The analogy with gradient descent is that each step moves the ensemble further down the error surface: every new predictor is fit to the residual errors of the previous prediction. Built by hand, the procedure looks like this.
## Gradient boosting, built by hand
from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(x, y)
y2 = y - tree_reg1.predict(x)   # residual errors of the first tree
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(x, y2)            # fit the second tree on those residuals
## And so on, until the error falls below a threshold
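To predict with this hand-built ensemble, sum the predictions of all the trees; a minimal sketch, where x_new stands for a hypothetical batch of new instances:

y_pred = sum(tree.predict(x_new) for tree in (tree_reg1, tree_reg2))

A simple ready-made form of the above method is scikit-learn's GradientBoostingRegressor: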
from sklearn.ensemble import GradientBoostingRegressor
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1)
gbrt.fit(x, y)

Again, as with gradient descent, it is possible to overshoot the minimum: only at the end of training do you discover that the best point was earlier. Early stopping was designed for this. One approach trains the full model once, measures the error after each stage, and retrains with the best number of trees:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

x_train, x_test, y_train, y_test = train_test_split(x, y)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(x_train, y_train)
## measure the test error after every boosting stage
errors = [mean_squared_error(y_test, y_pred) for y_pred in gbrt.staged_predict(x_test)]
best_estimators = np.argmin(errors) + 1   # stages are 1-indexed
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_estimators)
gbrt_best.fit(x_train, y_train)
Alternatively, stop as soon as the error keeps rising, instead of training all 120 stages first. With warm_start=True, scikit-learn keeps the existing trees when fit is called again, so training can continue incrementally:

gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)
min_test_error = float('inf')
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(x_train, y_train)
    y_pred = gbrt.predict(x_test)
    test_error = mean_squared_error(y_test, y_pred)
    if test_error < min_test_error:
        min_test_error = test_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:   # stop after 5 consecutive increases
            break

The third method: stacking, also called stacked generalization.
First split the data into a training set, a validation set, and a test set.
Then train several first-layer predictors on the training set and check their performance. When they work well, have each of them predict the validation set; combine those predicted values with the validation labels y to train another model (the blender). Finally, substitute the test set into the whole stack to check the effect.
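For reference, newer scikit-learn versions (0.22+) ship this pattern as StackingClassifier, which handles the data splitting internally via cross-validation; a minimal sketch with assumed estimator choices:

from sklearn.ensemble import StackingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

stack_clf = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('ex', ExtraTreesClassifier())],
    final_estimator=LogisticRegression())   # the blender trained on first-layer predictions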
The fourth method: XGBoost (covered in the next post).
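As a quick preview, a minimal sketch assuming the separate xgboost package is installed (x_train, y_train as above):

import xgboost

xgb_reg = xgboost.XGBRegressor(max_depth=2, n_estimators=120)
xgb_reg.fit(x_train, y_train)
y_pred = xgb_reg.predict(x_test)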
Example:
(1) Load the MNIST dataset, split it into a training set, a validation set, and a test set, train several classifiers, then combine them in a voting classifier and compare its accuracy against the individual classifiers.
(2) Stack the same classifiers, then compare with the voting classifiers to see the effect of the ensemble.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
x, y = mnist['data'], mnist['target']
from sklearn.model_selection import train_test_split
## hold out 20000 instances, then split them evenly into test and validation sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=20000)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5)
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.ensemble import ExtraTreesClassifier

rf_clf = RandomForestClassifier()
svm = SVC(probability=True)   # soft voting needs predict_proba
ex_clf = ExtraTreesClassifier()
voting_clf_hard = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='hard')
voting_clf_soft = VotingClassifier(estimators=[('rf_clf', rf_clf), ('svm', svm), ('ex_clf', ex_clf)], voting='soft')
from sklearn.metrics import accuracy_score
for model in (rf_clf, svm, ex_clf, voting_clf_hard, voting_clf_soft):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    print(model, accuracy_score(y_test, y_pred))

## first-layer predictions on the validation set become the blender's training features
y_pred_rf = rf_clf.predict(x_val)
y_pred_svm = svm.predict(x_val)
y_pred_ex = ex_clf.predict(x_val)
x_val_new = np.empty((len(x_val), 3))
val_preds = [y_pred_rf, y_pred_svm, y_pred_ex]   # renamed to avoid shadowing the built-in list
for index, value in enumerate(val_preds):
    x_val_new[:, index] = value
rf_clf_new = RandomForestClassifier(n_estimators=500, oob_score=True)
rf_clf_new.fit(x_val_new, y_val)
rf_clf_new.oob_score_   # out-of-bag estimate of the blender's accuracy
## repeat on the test set to evaluate the whole stack
y_pred_rf = rf_clf.predict(x_test)
y_pred_svm = svm.predict(x_test)
y_pred_ex = ex_clf.predict(x_test)
x_test_new = np.empty((len(x_test), 3))   # len(x_test), not len(x_val)
test_preds = [y_pred_rf, y_pred_svm, y_pred_ex]
for index, value in enumerate(test_preds):
    x_test_new[:, index] = value
accuracy_score(y_test, rf_clf_new.predict(x_test_new))