How to improve deep learning performance?
2022-07-28 13:07:00 【I have a good temper】
The performance of a deep learning algorithm can be improved in four areas:

1. Improve performance from data
2. Improve performance from algorithms
3. Improve performance from algorithm tuning
4. Improve performance from model fusion

The size of the improvement generally decreases from 1 down to 4. For example, new modeling approaches or more data often help more than hunting for the last optimal hyperparameter. This is not absolute; it merely holds in most cases.
One. Improve performance from data
1. Collect more data
The quality of your model often depends on the quality of your training data. You need to make sure the data you use is the most relevant data for the problem, and you want as much of it as possible. Deep learning and other modern nonlinear machine learning models perform better on large datasets, deep learning especially; this is one of the main reasons deep learning methods are so exciting.

More data is not always better, but it is in most cases.
2. Generate more data
Deep learning algorithms tend to work well when there is a lot of data. If for some reason you cannot obtain more data, you can also generate some.

- If your data is numeric vectors, randomly generate deformed versions of existing vectors.
- If your data is images, randomly generate images similar to the existing ones.
- If your data is text, randomly generate text similar to the existing text.

Such practices are often called data augmentation or data generation, and you can use a generative model for them. With image data, simply shifting and rotating existing images at random can already bring a large improvement: it improves the generalization ability of the model, since it can now handle such transformations when they appear in new data. Sometimes noise is also added to the data; this acts like a regularization method and helps avoid overfitting the training data.
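As a minimal sketch of image augmentation, assuming tf.keras and purely illustrative parameter values (none of this comes from the article itself):

```python
import numpy as np
import tensorflow as tf

# Illustrative augmentation: random shifts, rotations and flips of existing images.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,       # rotate by up to 15 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    horizontal_flip=True,    # mirror images left-right
)

# Fake batch of 32 RGB images (64x64) standing in for real data.
images = np.random.rand(32, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, size=(32,))

# Each call yields a freshly transformed batch.
augmented_images, augmented_labels = next(datagen.flow(images, labels, batch_size=32))
```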
3. Scale the data
This method is simple and effective. A surprisingly reliable trick when using neural network models is to scale the data into the value range of the activation function.

If you use the sigmoid activation function, scale the data to between 0 and 1. If you choose tanh, scale the values to between -1 and 1.

Apply the same kind of transformation to both inputs and outputs. For example, if the output layer uses a sigmoid function to map outputs to binary values, normalize y to binary values as well. If you choose softmax, normalizing y is still effective.
It is also recommended to create several differently scaled versions of the training data:

- Normalized to 0–1
- Normalized to -1–1
- Standardized

Then test the model's performance on each version and keep the best one. If you swap the activation function, it is best to repeat this small experiment (a sketch follows below).

Large values accumulating inside the model are not good. Besides scaling the inputs, there are other ways to keep values inside the model small, such as normalizing the weights and activations.
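A minimal sketch of the three scaled versions using scikit-learn (the toy array is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: 5 samples, 2 features on very different scales (illustrative only).
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 250.0], [4.0, 400.0], [5.0, 350.0]])

X_01  = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)   # normalized to 0..1
X_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # normalized to -1..1
X_std = StandardScaler().fit_transform(X)                     # zero mean, unit variance

# Train the same model on each version and keep the best-performing scaling.
```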
4. Transform the data
You need to really understand the data you use: visualize it, then pick out the outliers.

First, guess at the distribution of each column of data:

- Does this column look like a skewed Gaussian? If so, try correcting the skew with the Box-Cox transform
- Does this column look exponentially distributed? If so, apply a log transform
- Does this column seem to carry signal that is hard to see? Try squaring the data or taking its square root
- Can the feature be discretized, to emphasize certain values better?

Also, try the following:

- Can the data be preprocessed with a projection, such as PCA?
- Can multiple attributes be combined into a single value?
- Can you uncover a new attribute and express it as a boolean value?
- Is there something new to discover on the time scale or in another dimension?

Neural networks do perform feature learning and can figure these things out themselves. But if you can present the structure of the problem more directly, the network will learn much faster. Quickly try various transformations on the training set and see which ones help (see the sketch below).
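A hedged sketch of the first few transforms, using SciPy and NumPy (the exponential toy column is an assumption; note that Box-Cox requires strictly positive values):

```python
import numpy as np
from scipy.stats import boxcox

# Toy skewed, strictly positive column standing in for a real feature.
column = np.random.exponential(scale=2.0, size=1000)

corrected, fitted_lambda = boxcox(column)       # Box-Cox picks the lambda that best de-skews
logged = np.log(column)                         # log transform for exponential-looking data
squared, rooted = column ** 2, np.sqrt(column)  # expose structure that is hard to see
```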
5. Feature selection

Neural networks are only mildly affected by irrelevant inputs: they learn weights close to 0 for them, so the contribution of unimportant features to the prediction is almost ignored. Still, there are many feature selection and feature importance methods to identify which features to keep and which to remove.

- Perhaps you can get the same result, or even a better one, with fewer features.
- Perhaps all feature selection methods agree on discarding some attributes. Take a good look at those useless features.
- Perhaps the selected features give you new inspiration for constructing more features (a feature-importance sketch follows this list).
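One common feature-importance method, sketched with scikit-learn on synthetic data (the dataset and the choice of a random forest are assumptions, not the article's prescription):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)
ranked = np.argsort(model.feature_importances_)[::-1]  # most important first
print("Top 5 features by importance:", ranked[:5])
```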
6. Redefine the problem
Are the observations you collected the only way to describe the problem? Maybe there are other framings, and maybe one of them exposes the structure of the problem more clearly.

This exercise forces you to broaden your thinking. It is hard to do well, especially after you have invested a lot of time and energy.

Here are some ideas:

- Maybe you can fold the time element into a window
- Maybe your classification problem can be turned into a regression problem, or vice versa
- Maybe your binary output can be turned into a softmax output
- Maybe you can model a sub-problem instead
Two. Improve performance from algorithms
Machine learning is about algorithms: all the theory and mathematics describe different ways of learning a decision process from data.
1. Algorithm screening

You cannot know in advance which algorithm will work best on your problem; if you could, you probably would not need machine learning. What evidence do you have that your current method is the best choice?

When effectiveness is evaluated over all possible problems, no single algorithm performs better than any other; all algorithms are equal. This is the "no free lunch" theorem.

Maybe the algorithm you chose is not the best fit for your problem. We are not trying to solve every possible problem here, but the currently popular algorithm may well not suit your dataset. Start by collecting evidence, and entertain the hypothesis that other algorithms suit your problem better.

First pick a few common algorithms, then screen out the promising ones:

- Try some linear algorithms, such as logistic regression and linear discriminant analysis
- Try some tree models, such as CART, random forests and gradient boosting
- Try SVM, KNN and similar methods
- Try other neural network models, such as LVQ, MLP, CNN, LSTM, and so on

Take the few methods that perform well, then fine-tune their parameters and the data to improve further (a spot-checking sketch follows this list).
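A minimal spot-checking loop with scikit-learn; the candidate list mirrors the bullets above, while the synthetic dataset and scoring choices are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in dataset

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
    "gradient boosting": GradientBoostingClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# 5-fold cross-validated accuracy for each candidate.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```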
2. Learn from the literature
"Stealing" ideas from the literature is a shortcut. Has anyone done something similar to your problem, and what methods did they use? Read papers, books, Q&A sites, tutorials, and everything Google gives you. Write down every idea, then keep exploring in those directions. This is not about repeating past research; it is about discovering new ideas.

Give priority to published papers: many smart people have already written up many interesting things. Make good use of this valuable resource.
3. Resampling methods

You need to know how well your model really performs. Is your estimate of its performance reliable? Deep learning models are very slow to train, which often rules out the gold-standard methods for estimating model performance, such as k-fold cross-validation.

- Maybe you simply split the data into a training set and a test set. If so, make sure the distribution stays the same after the split; univariate statistics and data visualization are good checks
- Maybe you can scale up the hardware to speed up evaluation
- Maybe you can hold out part of the data for validation during training (very effective for early stopping)
- Maybe you can keep part of the data completely aside for final model validation

Alternatively, make the dataset smaller and use stronger resampling methods.

Maybe you will find that performance on a sampled subset correlates strongly with performance on the full dataset. If so, you can do model selection on the small dataset, then apply the final chosen method to all of the data.

Maybe you can simply cap the dataset size, sample part of the data, and use that sample for all training tasks. What matters is having full confidence in your estimate of the model's performance (a distribution-preserving split is sketched below).
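A minimal sketch of a split that preserves the class distribution, using scikit-learn (the imbalanced synthetic dataset is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced stand-in dataset: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the class distribution identical in both splits,
# i.e. the "distribution stays the same after the split" check above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print("train positive rate:", y_train.mean(), "test positive rate:", y_test.mean())
```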
Three. Improve performance from algorithm tuning
1. Model diagnostics

You can only squeeze out the best performance if you know why the model's performance has stopped improving. Is it overfitting or underfitting? The model is always somewhere between those two states; it is only a matter of degree. A quick way to check is to compute the model's performance on the training set and the validation set at every step, and plot the results.

Plot the model's accuracy on the training set and the validation set:

- If the training set does much better than the validation set, there may be overfitting; try adding a regularization term.
- If accuracy is low on both the training set and the validation set, there may be underfitting; increase the capacity of the model or train for more steps.
- If validation performance turns downward while training performance keeps improving, early stopping may be the right trick.

Plot charts like these often, and use them to study and compare different methods of improving the model; they may become your most valuable diagnostic tool. Another effective diagnostic is to study the samples the model predicts correctly and the ones it predicts incorrectly (a plotting sketch follows this list).

- Maybe you need more of the hard-to-predict samples
- Maybe you can remove the easy-to-learn samples from the training set
- Maybe you can train separate models for different types of input data
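A minimal plotting sketch, assuming a Keras-style History object returned by model.fit() with validation data (adapt the dictionary keys to your framework):

```python
import matplotlib.pyplot as plt

def plot_diagnostics(history):
    """Plot training vs. validation loss from a Keras-style History object."""
    plt.plot(history.history["loss"], label="training loss")
    plt.plot(history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# Usage (illustrative): history = model.fit(X, y, validation_split=0.2, epochs=50)
# plot_diagnostics(history)
```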
2. Weight initialization
Initialize the weights with small random numbers.

Different activation functions call for different initialization strategies; keep the model structure fixed and try several strategies. The weights are the parameters your model has to learn; several different sets of weights can all achieve decent results, but you want better ones.

- Try all the available initialization methods and find the best-performing set of initial values
- Try unsupervised pre-training, such as an autoencoder
- Try taking an existing model's weights and retraining only the input and output layers (transfer learning)

Remember that changing the weight initialization method is closely tied to the choice of activation function and even the objective function (a sketch follows).
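A minimal tf.keras sketch of varying the initializer per layer; pairing He initialization with ReLU and Glorot with tanh is a common heuristic offered here as an assumption, not the article's prescription:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # He initialization is commonly paired with ReLU activations.
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal", input_shape=(20,)),
    # Glorot/Xavier initialization is commonly paired with tanh.
    tf.keras.layers.Dense(64, activation="tanh",
                          kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```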
3. Learning rate
Adjusting the learning rate can also improve performance.

- Try very large and very small learning rates
- Grid-search around the usual values reported in the literature
- Try a learning rate that decays gradually
- Try dropping the learning rate every fixed number of training steps
- Try adding a momentum term, then grid-search learning rate and momentum together

Larger networks need more training steps, and vice versa. If you add more neurons or more layers, increase the learning rate. The learning rate is coupled with the number of training steps, the batch size and the optimization method (a decay-schedule sketch follows).
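A minimal sketch of a step-decay schedule in tf.keras (halving every 10 epochs is an illustrative choice):

```python
import tensorflow as tf

def step_decay(epoch, lr):
    """Halve the learning rate every 10 epochs (illustrative values)."""
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

scheduler = tf.keras.callbacks.LearningRateScheduler(step_decay)
# Usage (illustrative): model.fit(X, y, epochs=50, callbacks=[scheduler])
```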
4. Activation functions

In most cases you should use the ReLU activation function, simply because it works well. Before ReLU became popular, sigmoid and tanh were the usual choices, with a softmax, linear or sigmoid function in the output layer.

Try all three in the hidden layers, remembering to scale the input data into their range, and choose the output transfer function according to the form of the output. For example, to turn a binary classifier into a regression model, replace the output sigmoid with a linear function and post-process the output values; at the same time, you may need to switch to an appropriate loss function.
5. Network structure
Adjusting the topology of the network also helps a great deal. How many nodes do you need, and how many layers? Nobody knows; you have to find a reasonable configuration yourself.

- Try one hidden layer with many nodes (going wider)
- Try a deep network with few nodes per layer (going deeper)
- Try combining the two
- Try imitating recently published papers on problems similar to yours
- Try topology patterns (such as fan-out then fan-in) and the rules of thumb from books and papers

The larger the network, the stronger its expressive power, and more layers give more opportunities to recombine abstract features in structured ways. Larger networks also need more training, so keep adjusting the number of training steps and the learning rate.
6. Batch size and epochs

The batch size determines the gradient estimate and the frequency of weight updates. One epoch means every sample in the training set has taken part in one round of training, batch by batch. You need to try different batch sizes and epoch counts. Deep learning models are often trained with small batches and large numbers of epochs, over and over.

- Try setting the batch size to the size of the whole training set (batch learning)
- Try setting the batch size to 1 (online learning)
- Try different mini-batch sizes (8, 16, 32, ...)
- Try training for just a few epochs, then for a great many epochs

Try setting the number of epochs to something close to infinite, snapshot intermediate results along the way, and pick the best model afterwards (see the sketch below). Some architectures are very sensitive to batch size: multilayer perceptrons are usually insensitive to it, while LSTMs and CNNs are quite sensitive.
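A minimal snapshot setup in tf.keras (the filename and epoch count are illustrative):

```python
import tensorflow as tf

# Save the best intermediate model while training with a very large epoch count.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",    # illustrative filename
    monitor="val_loss",
    save_best_only=True,   # keep only the best snapshot so far
)
# Usage (illustrative):
# model.fit(X, y, validation_split=0.2, epochs=10000,
#           batch_size=32, callbacks=[checkpoint])
```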
7. Regularization

Regularization is a good way to overcome overfitting of the training data. The most popular regularization method is dropout: dropout randomly skips some neurons during training, forcing the other nodes in the same layer to take over. Simple but very effective.

- Use weight decay to penalize large weight values
- Use activation constraints to penalize large activation values
- Experiment with different penalties and penalty terms, such as L1, L2, and the sum of the two (a sketch follows this list)
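A minimal tf.keras sketch combining dropout with an L2 weight penalty (the rates are illustrative, not tuned):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # weight decay (L2 penalty)
    ),
    tf.keras.layers.Dropout(0.5),  # randomly skip half the neurons during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```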
8. Optimization and loss

Stochastic gradient descent used to be the main solver, but there are now many optimizers. Gradient descent is still the default: use it to get a first result, then tune different learning rates and momentum values on top of it.

Many of the more advanced optimization methods take more parameters and are more complicated internally, but converge faster. The two methods below converge quickly and let you rapidly gauge the potential of a network topology:

- ADAM
- RMSprop

You can also use other optimization algorithms, from more traditional ones (Levenberg-Marquardt) to relatively recent ones (genetic algorithms). Other methods can also give SGD a good starting point and make subsequent tuning easier.

The loss function to optimize should closely match the problem you are solving. There are common defaults (for example, MSE and MAE for regression problems), but changing the loss function sometimes brings unexpected gains. This, again, may interact with the scale of your input data and the activation functions you use (an optimizer-comparison sketch follows).
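A minimal sketch of comparing SGD-with-momentum against ADAM and RMSprop in tf.keras (the topology, learning rates and loss are illustrative assumptions):

```python
import tensorflow as tf

def build_model():
    """Same topology every time, so only the optimizer varies."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Learning rates are illustrative defaults, not tuned values.
optimizer_factories = {
    "sgd+momentum": lambda: tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adam": lambda: tf.keras.optimizers.Adam(learning_rate=0.001),
    "rmsprop": lambda: tf.keras.optimizers.RMSprop(learning_rate=0.001),
}

for name, make_optimizer in optimizer_factories.items():
    model = build_model()
    model.compile(optimizer=make_optimizer(), loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X, y, validation_split=0.2, epochs=20)  # then compare the curves
```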
9. Stop training early (early stopping)

You can stop training as soon as the model's performance starts to degrade. This saves a lot of time, and may even allow a more elaborate resampling method for evaluating the model.

Early stopping is also a regularization method for avoiding overfitting: observe the model's performance on the training set and a validation set after each round of training, and stop as soon as performance on the validation set starts to drop. You can also set checkpoints to save the current state, so the model can keep learning later (a sketch follows).
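A minimal early-stopping setup in tf.keras (the patience value is an illustrative choice):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,                # tolerate 10 stagnant epochs before stopping
    restore_best_weights=True,  # roll back to the best weights seen so far
)
# Usage (illustrative):
# model.fit(X, y, validation_split=0.2, epochs=1000, callbacks=[early_stop])
```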
Four. Improve performance from model fusion
After model tuning, combining the predictions of multiple models is the next big area for improvement. In fact, fusing the predictions of several reasonably good models often works better than the prediction of any single heavily tuned model.
1. Model fusion
Don't settle on a single model; fuse several of them. If you have trained multiple deep learning models and each performs well, average their predictions.

The more the models differ, the better the result. For example, you can use very different network topologies and techniques.

If each model is independently skilful, the fused result is more stable.

Conversely, you can also run the experiment the other way around: every time you train the network, initialize it differently, and the final weights will converge to different values. Repeat this process many times to produce multiple networks, then fuse their predictions. The predictions will be highly correlated, but there may be a small lift on the hard-to-predict samples (an averaging sketch follows).
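A minimal mean-fusion sketch; `models` is assumed to hold several trained models exposing a Keras-style predict() method (an assumption about your setup):

```python
import numpy as np

def ensemble_predict(models, X):
    """Average the predicted probabilities of several trained models."""
    predictions = np.stack([m.predict(X) for m in models])  # (n_models, n_samples, ...)
    return predictions.mean(axis=0)                         # simple mean fusion
```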
2. Fusion of perspectives
Use completely different scalings and transformations of the training data for the models you fuse. The more the chosen transformations differ in how they describe the problem, the higher the chance of an improvement. Simply averaging the predictions is again a good default.
3. Stacking

You can also learn how best to combine the predictions of multiple models. This is called stacked generalization, or stacking for short. Typically, the weight given to each model's prediction can be learned with a simple linear regression. Use the mean of the models' predictions as the baseline, and the learned weighted fusion as the experimental condition (a sketch follows).
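A minimal stacking sketch with scikit-learn; the random base-model predictions stand in for real out-of-fold predictions (in practice, fit the combiner on out-of-fold predictions to avoid leakage):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for the out-of-fold predictions of 3 base models on 200 samples.
base_predictions = np.random.rand(200, 3)
y = base_predictions.mean(axis=1) + 0.1 * np.random.randn(200)

baseline = base_predictions.mean(axis=1)               # simple mean-fusion baseline
stacker = LinearRegression().fit(base_predictions, y)  # learn per-model weights
stacked = stacker.predict(base_predictions)
print("learned per-model weights:", stacker.coef_)
```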