Chapter VIII: Ensemble Learning
2022-07-08 01:07:00 【Intelligent control and optimization decision Laboratory of Cen】
1. Explain the concept and main idea of ensemble learning.
Ensemble learning accomplishes a learning task by building and combining multiple learners; it is sometimes called a multi-classifier system or committee-based learning. A schematic of ensemble learning is shown below:
Its main idea is that combining multiple learners can often achieve significantly better generalization performance than any single learner, because the errors of the base learners are assumed to be independent of one another. In real tasks, however, the individual learners are trained to solve the same problem, so they obviously cannot be fully independent. In fact, the "accuracy" and "diversity" of individual learners are inherently in conflict: once accuracy is already high, increasing diversity usually requires sacrificing some accuracy. Since the final result is produced by a vote among the learners, how to generate and combine individual learners that are "good yet different" is the core question of ensemble learning research.
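To see why (approximate) independence matters, here is a minimal sketch, not part of the original text, that computes the error rate of a majority vote over $T$ independent base classifiers, each correct with probability $p$:

```python
# Minimal sketch (assumed setup): majority-vote error of T independent base
# classifiers, each correct with probability p. Illustrates why combining
# "good but different" learners lowers the ensemble error.
from math import comb

def majority_vote_error(p: float, T: int) -> float:
    # The ensemble errs when more than half of the T learners are wrong.
    return sum(comb(T, k) * (1 - p) ** k * p ** (T - k)
               for k in range(T // 2 + 1, T + 1))

if __name__ == "__main__":
    for T in (1, 5, 11, 25):  # odd T avoids ties
        print(T, round(majority_vote_error(p=0.7, T=T), 4))
```

Under the independence assumption the error drops quickly as $T$ grows; in practice the learners are correlated, so the improvement is smaller.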
2. Into which categories can ensemble learning methods be divided? Describe the characteristics of each category.
According to how the individual learners are generated, current ensemble learning methods can be divided roughly into two categories: sequential methods, in which there are strong dependencies among the individual learners and they must be generated serially; and parallel methods, in which there are no strong dependencies and the individual learners can be generated simultaneously. The representative of the former is Boosting; the latter is represented by Bagging and Random Forest.
- Boosting Method
A base learner is first trained from the initial training set; the distribution of the training samples is then adjusted according to this learner's performance, so that samples misclassified by the previous base learner receive more attention in subsequent rounds. The next base learner is trained on the adjusted sample distribution, and this is repeated until the number of base learners reaches a predetermined value $T$; finally the $T$ base learners are combined by a weighted vote. The flow chart is shown in the figure below:
Characteristics: each round checks whether the currently generated base learner satisfies a basic condition (e.g., whether it does better than random guessing); if not, the current learner is discarded and the learning process stops. If the number of rounds completed is then far below $T$, the final ensemble may perform poorly; with resampling, learning can be restarted so that the process need not stop early.
- Bagging
This kind of method draws $T$ sample sets, each containing $m$ training samples, from the initial training set by bootstrap sampling, trains a base learner on each sample set, and then combines these base learners. The flow chart of this method is shown below:
Characteristics: bootstrap sampling uses only part of the initial training set for each learner, so the remaining (out-of-bag) samples can be used as a validation set to obtain an out-of-bag estimate of generalization performance (when the base learner is a decision tree this can assist pruning; when the base learner is a neural network it can reduce the risk of overfitting).
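If scikit-learn is available, the out-of-bag estimate described above can be obtained directly; the sketch below uses a synthetic data set purely for illustration (not part of the original text):

```python
# Sketch of Bagging with an out-of-bag (OOB) estimate using scikit-learn
# (assumes scikit-learn is installed; the data set is synthetic and only illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each bootstrap sample leaves out roughly 37% of the training points;
# oob_score=True uses those left-out points to estimate generalization accuracy.
bag = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0)
bag.fit(X, y)
print("OOB accuracy estimate:", bag.oob_score_)
```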
3. For the binary classification problem, describe the implementation process of the AdaBoost algorithm in ensemble learning. Think about how AdaBoost changes the weights, or probability distribution, of the training data in each round.
3.1 AdaBoost algorithm flow
Iterative process:
Illustration:
3.2 Main steps:
1. Initialize the weight distribution of the training data.
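In the standard AdaBoost formulation the initial distribution is uniform:
$$D_1 = (w_{11}, w_{12}, \ldots, w_{1N}), \qquad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N$$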
Here $D_1$ denotes the weight distribution over the samples in the first iteration, $w_{11}$ is the weight of the first sample in the first iteration, and $N$ is the total number of samples.
2. Perform $M$ iterations; for $m = 1, 2, \ldots, M$:
a) Learn from the training samples weighted by the distribution $D_m$ $(m = 1, 2, \ldots, M)$ to obtain a weak classifier $G_m(x): x \rightarrow \{-1, +1\}$.
The performance of the weak classifier is measured by the following weighted error $\epsilon_m$:
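$$\epsilon_m = \sum_{i=1}^{N} w_{mi}\, I\big(G_m(x_i) \neq y_i\big)$$
i.e., the sum of the weights of the misclassified samples (the standard AdaBoost weighted error).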
b) Compute the coefficient $\alpha_m$ of the weak classifier $G_m$ (its vote weight), which expresses the importance of $G_m$ in the final classifier. It is computed as follows:
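$$\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$$
This is the standard AdaBoost coefficient, and it is exactly what the code in Section 5 computes as `alpha`.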
As $\epsilon_m$ decreases, $\alpha_m$ increases. The formula shows that classifiers with a small error rate carry more weight in the final classifier.
c) Update the weight distribution of the training samples for the next iteration: the weights of incorrectly classified samples increase, while the weights of correctly classified samples decrease. The update is computed as follows:
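$$w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp\big(-\alpha_m\, y_i\, G_m(x_i)\big), \quad i = 1, 2, \ldots, N$$
(the standard AdaBoost weight update, with $D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,N})$).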
Here $D_{m+1}$ is the sample weight distribution for the next iteration and $w_{m+1,i}$ is the weight of the $i$-th sample in the next iteration. $y_i$ denotes the label of the $i$-th sample ($+1$ or $-1$) and $G_m(x_i)$ is the weak classifier's prediction for sample $x_i$; if the classification is correct, the product $y_i G_m(x_i)$ equals $+1$, otherwise $-1$. $Z_m$ is the normalization factor, computed as follows:
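$$Z_m = \sum_{i=1}^{N} w_{mi} \exp\big(-\alpha_m\, y_i\, G_m(x_i)\big)$$
which makes $D_{m+1}$ a valid probability distribution.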
3. Combine the weak classifiers to obtain the strong classifier.
First, take the weighted sum of the classifiers obtained in all iterations:
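$$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$$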
Next, apply the sign function to this sum to obtain the final strong classifier $G(x)$:
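$$G(x) = \operatorname{sign}\big(f(x)\big) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$$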
The sign function is a logical (step) function that returns the sign of a real number. It is defined as:
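$$\operatorname{sign}(x) = \begin{cases} +1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$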
4. What is the relationship between random forests and ensemble learning?
A random forest is an algorithm that combines many trees using the idea of ensemble learning; its basic unit is the decision tree, and it is in essence an ensemble learning method.
There are two key words in the name "random forest": "random" and "forest". "Forest" is easy to understand: one tree is just a tree, while hundreds of trees form a forest; this embodies the main idea of random forests, namely the ensemble idea. Bagging, however, comes at a cost: since predictions are no longer made by a single decision tree, it is no longer clear which variables play an important role, so bagging improves prediction accuracy at the expense of interpretability.
Having studied decision trees, we know that they are simple and intuitive, easy to understand, require no data preprocessing, and are remarkably tolerant of outliers. In practice, however, decision trees tend to overfit and their performance is unstable, which is why an ensemble is needed. The theoretical basis is the law of large numbers: if an experiment is repeated many times under unchanged conditions, the frequency of a random event approximates its probability, so there is a certain inevitability behind the apparent randomness.
What is an ensemble? It is like asking thousands of randomly chosen people a complicated question and then aggregating their answers; you will often find that the aggregated answer is more detailed and accurate than an expert's. In machine learning we usually train a model with a single estimator; when we aggregate several estimators to fit the data, we speak of ensemble learning, and the approach is called an ensemble method. A random forest is a product of ensemble learning: it is composed of a group of decision trees, each trained on a different random subset (roughly two thirds) of the training set. Moreover, each tree is no longer grown by splitting on the globally best feature; the random forest injects additional randomness into tree growth by selecting the best feature within a randomly generated subset of features. In this way the random forest trades slightly higher bias for lower variance.

Under this scheme a random forest does not, like a single decision tree, place important features near the root and unimportant ones near the leaves, so which features matter is not immediately visible. Instead, the importance of a feature can be estimated, for example, by computing its average depth across all the decision trees. (Feature importance has many applications: in bank lending, for instance, correctly assessing an enterprise's credit rating determines whether a loan can be recovered effectively, and credit scoring models have many noisy features, so we compute and rank the importance of each feature and then select only the most important ones.) This is an important property of random forests and is used for feature selection. Finally, the prediction is obtained simply by aggregating the results of all the trees.
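If scikit-learn is available, this feature-importance ranking can be sketched as follows. Note that scikit-learn's `feature_importances_` is an impurity-based measure, related to but not the same as the average-depth idea mentioned above, and the synthetic data here merely stands in for, say, credit-scoring features:

```python
# Sketch of feature-importance ranking with a random forest
# (assumes scikit-learn is installed; the data set is synthetic and only illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by impurity-based importance (higher = more important);
# the top-ranked features can then be kept for feature selection.
order = np.argsort(forest.feature_importances_)[::-1]
for idx in order:
    print(f"feature {idx}: importance {forest.feature_importances_[idx]:.3f}")
```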
5. Implement in Python the AdaBoost algorithm based on single-level decision trees (decision stumps).
Code:
# -*- coding: utf-8 -*-
import numpy as np
def loadSimData():
    '''
    Input: none
    Function: provide a small data set with two features
    Output: data set and class labels
    '''
    datMat = np.matrix([[1., 2.1], [2., 1.1], [1.3, 1.], [1., 1.], [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels
def stumpClassify(dataMatrix, dimen, thresholdValue, thresholdIneq):
    '''
    Input: data matrix, feature dimension, classification threshold for that feature, inequality direction
    Function: classify the samples with a decision stump
    Output: predicted labels
    '''
    returnArray = np.ones((np.shape(dataMatrix)[0], 1))
    if thresholdIneq == 'lt':
        returnArray[dataMatrix[:, dimen] <= thresholdValue] = -1
    else:
        returnArray[dataMatrix[:, dimen] > thresholdValue] = -1
    return returnArray
def buildStump(dataArray, classLabels, D):
    '''
    Input: data matrix, true class labels, sample weight vector D
    Function: find the single-level decision tree (stump) with the smallest weighted
              error rate on the data set; the criterion clearly depends on the weight vector D
    Output: estimated class labels under weight vector D, minimum weighted error rate,
            best stump (feature index, threshold, inequality direction)
    '''
    dataMatrix = np.mat(dataArray); labelMat = np.mat(classLabels).T
    m, n = np.shape(dataMatrix)
    stepNum = 10.0; bestStump = {}; bestClassEst = np.mat(np.zeros((m, 1)))
    minError = np.inf
    for i in range(n):
        rangeMin = dataMatrix[:, i].min(); rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / stepNum
        for j in range(-1, int(stepNum) + 1):
            for thresholdIneq in ['lt', 'gt']:
                thresholdValue = rangeMin + float(j) * stepSize
                predictClass = stumpClassify(dataMatrix, i, thresholdValue, thresholdIneq)
                errArray = np.mat(np.ones((m, 1)))
                errArray[predictClass == labelMat] = 0
                weightError = D.T * errArray
                # print("split: dim %d, thresh: %.2f, threIneq: %s, weightError %.3f" % (i, thresholdValue, thresholdIneq, weightError))
                if weightError < minError:
                    minError = weightError
                    bestClassEst = predictClass.copy()
                    bestStump['dimen'] = i
                    bestStump['thresholdValue'] = thresholdValue
                    bestStump['thresholdIneq'] = thresholdIneq
    return bestClassEst, minError, bestStump
def adaBoostTrainDS(dataArray, classLabels, numIt=40):
    '''
    Input: data set, label vector, maximum number of iterations
    Function: build the AdaBoost additive model
    Output: array of weak classifiers
    '''
    weakClass = []  # array of weak classifiers; stores each base classifier bestStump
    m, n = np.shape(dataArray)
    D = np.mat(np.ones((m, 1)) / m)
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(numIt):
        print("i:", i)
        bestClassEst, minError, bestStump = buildStump(dataArray, classLabels, D)  # step 1: find the best stump
        print("D.T:", D.T)
        alpha = float(0.5 * np.log((1 - minError) / max(minError, 1e-16)))  # step 2: compute alpha
        print("alpha:", alpha)
        bestStump['alpha'] = alpha
        weakClass.append(bestStump)  # step 3: add the base classifier to the weak-classifier array
        print("classEst:", bestClassEst)
        expon = np.multiply(-1 * alpha * np.mat(classLabels).T, bestClassEst)
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()  # step 4: update the weights so that D remains a probability distribution
        aggClassEst += alpha * bestClassEst  # step 5: update the aggregate class estimate
        print("aggClassEst:", aggClassEst.T)
        print(np.sign(aggClassEst) != np.mat(classLabels).T)
        aggError = np.multiply(np.sign(aggClassEst) != np.mat(classLabels).T, np.ones((m, 1)))
        print("aggError", aggError)
        aggErrorRate = aggError.sum() / m
        print("total error:", aggErrorRate)
        if aggErrorRate == 0.0:
            break
    return weakClass
def adaTestClassify(dataToClassify, weakClass):
    '''
    Input: data to classify, array of weak classifiers
    Function: classify new samples with the trained AdaBoost model
    Output: predicted labels
    '''
    dataMatrix = np.mat(dataToClassify)
    m = np.shape(dataMatrix)[0]
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(len(weakClass)):
        classEst = stumpClassify(dataMatrix, weakClass[i]['dimen'],
                                 weakClass[i]['thresholdValue'], weakClass[i]['thresholdIneq'])
        aggClassEst += weakClass[i]['alpha'] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)
if __name__ == '__main__':
    D = np.mat(np.ones((5, 1)) / 5)
    dataMatrix, classLabels = loadSimData()
    bestClassEst, minError, bestStump = buildStump(dataMatrix, classLabels, D)
    weakClass = adaBoostTrainDS(dataMatrix, classLabels, 9)
    testClass = adaTestClassify(np.mat([0, 0]), weakClass)
Result:
i: 0
D.T: [[0.2 0.2 0.2 0.2 0.2]]
alpha: 0.6931471805599453
classEst: [[-1.]
[ 1.]
[-1.]
[-1.]
[ 1.]]
aggClassEst: [[-0.69314718 0.69314718 -0.69314718 -0.69314718 0.69314718]]
[[ True]
[False]
[False]
[False]
[False]]
aggError [[1.]
[0.]
[0.]
[0.]
[0.]]
total error: 0.2
i: 1
D.T: [[0.5 0.125 0.125 0.125 0.125]]
alpha: 0.9729550745276565
classEst: [[ 1.]
[ 1.]
[-1.]
[-1.]
[-1.]]
aggClassEst: [[ 0.27980789 1.66610226 -1.66610226 -1.66610226 -0.27980789]]
[[False]
[False]
[False]
[False]
[ True]]
aggError [[0.]
[0.]
[0.]
[0.]
[1.]]
total error: 0.2
i: 2
D.T: [[0.28571429 0.07142857 0.07142857 0.07142857 0.5 ]]
alpha: 0.8958797346140273
classEst: [[1.]
[1.]
[1.]
[1.]
[1.]]
aggClassEst: [[ 1.17568763 2.56198199 -0.77022252 -0.77022252 0.61607184]]
[[False]
[False]
[False]
[False]
[False]]
aggError [[0.]
[0.]
[0.]
[0.]
[0.]]
total error: 0.0
[[-0.69314718]]
[[-1.66610226]]
[[-2.56198199]]