当前位置：网站首页>Neural network learning (2) introduction 2

Neural network learning (2) introduction 2

2022-07-26 18:33:00 【@[email protected]】

Catalog

1. Preface

2. Data set partitioning

3. Deviation and variance

4 Regularization

4.1 Other regularization methods

1. Preface

One reason why deep learning is difficult to achieve the maximum effect in the field of big data is , Training on a huge data set is slow . The optimization algorithm can help us train the model quickly , Improve computational efficiency . Next, let's see what methods can solve the problems we just encountered or similar problems

1. Equipment problems ----- adopt gpu Improve training speed
2. optimization algorithm ---- Reduce the gradient and disappear , Gradient explosion , Local optimum problem （ The exponential increase or decrease in the gradient function is called Gradient explosion perhaps The gradient disappears . ）

2. Data set partitioning

First of all, we have a simple review of the data set division involved in machine learning

Training set （train set）： The algorithm or model is simulated with the training set Training The process ;
Verification set （development set）： Using validation sets （ Also known as simple cross validation set ,hold-out cross validation set） Conduct Cross validation , Choose the best model ;----- Generally omit
Test set （test set）： Finally, the model is tested by test set , Evaluate learning methods .

stay Small data volume Era , Such as 100、1000、10000 The size of the amount of data , The dataset can be divided according to the following scale ：

In the case of no verification set ：70% / 30%
If there is a verification set ：60% / 20% / 20%

And in today's Big data era , The size of the data set owned may be millions , So the proportion of verification and test sets tends to be smaller .

100 Ten thousand data ：98% / 1% / 1%
Millions of data ：99.5% / 0.25% / 0.25%

These ratios can be selected according to the data set .

3. Deviation and variance

“ deviation - Variance decomposition ”（bias-variance decomposition） It is an important tool to explain the generalization performance of learning algorithms .

Generalization error can be decomposed into deviation 、 Variance and noise , Generalization performance By The ability to learn algorithms 、 Sufficiency of data as well as The difficulty of the learning task itself Jointly decided .

deviation ： The deviation between the expected prediction and the real result of the learning algorithm is measured , Immediately drew Learning the fitting ability of the algorithm itself The smaller the better. -- There is an under fitting problem
variance ： It measures the change of learning performance caused by the change of training set of the same size , Immediately drew The impact of data disturbance -- There have been fitting problems
noise ： It expresses the expectation that any learning algorithm can achieve in the current task Lower bound of generalization error , Immediately drew The difficulty of learning the problem itself .

So the deviation 、 What is the relationship between variance and the division of our data set ？

1、 The error rate of the training set is small , And the validation set / The test set has a high error rate , It shows that the model has large variance , There may be over fitting
2、 The error rates of training set and test set are large , And the two are similar , It shows that there is a large deviation in the model , There may be a lack of fit
3、 The error rates of training set and test set are small , And the two are similar , It shows that the variance and deviation are small , This model works better .

So we finally conclude , Variance generally refers to the data model , Whether the disturbance of unknown data can be predicted accurately . and The deviation indicates that the error is large in the training set , Basically, there is no good effect in the test set .

So if our model has large variance or large deviation at the same time , How to solve it ？

resolvent ：

For high square difference ( Over fitting ), There are several ways ：

Get more data , So that the training can cover all possible situations
Regularization （Regularization）
Find a more suitable network structure ---- There are too many neurons , A little less

For high deviations ( Under fitting ), There are several ways ：

Expand the scale of the network , Such as adding hidden layers or the number of neurons
Looking for the right network architecture , Use a larger network structure , Such as AlexNet
Train longer

Keep trying , Until a low deviation is found 、 Low variance framework .

4 Regularization

Regularization , That is to add a regularization term to the loss function ( Penalty item ), The complexity of the penalty model , Prevent over fitting of network

Logically regressive L1 And L2 Regularization ：

Add a term to the loss function , So actually, the gradient descent is to reduce the size of the loss function , about L2 perhaps L1 Generally speaking, it is necessary to reduce the size of the regular term , Then it will reduce W The size of the weight . This is our intuitive feeling .

4.1 Other regularization methods

Early stop method （Early Stopping）
Data to enhance

Data to enhance

Data to enhance

Finger pass shear 、 rotate / Reflection / Flip transformation 、 Zoom transform 、 Translation transformation 、 Scale transformation 、 Contrast conversion 、 Noise disturbance 、 Color change And one or more ways to enhance the transformation of combined data To increase the size of the dataset .

Even if the convolutional neural network is placed in different directions , Convolution neural network for translation 、 visual angle 、 Size or illumination （ Or a combination of the above ） Keep invariance , Will be considered an object .

We need to reduce the number of irrelevant features in the dataset . For the car type classifier above , You just need to flip the photos in the existing dataset horizontally , Orient the car to the other side . Now? , Training neural networks with new data sets , By enhancing the dataset , It can prevent the neural network from learning irrelevant patterns , Improve the effect .

原网站

版权声明
本文为[@[email protected]]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/207/202207261736095231.html