How to improve deep learning performance?
2022-07-28 13:07:00 【I have a good temper】
The performance of a deep learning algorithm can be improved in four areas:

1. Improve performance from data
2. Improve performance from algorithms
3. Improve performance from algorithm tuning
4. Improve performance from model fusion

The size of the improvement generally decreases from 1 down to 4. For example, new modeling approaches or more data often help more than hunting for the last optimal hyperparameter. This is not absolute; it merely holds in most cases.
One. Improve performance from data
1. Collect more data
The quality of your model often depends on the quality of your training data. You need to make sure the data you use is the most relevant data for the problem, and you want as much of it as possible. Deep learning and other modern nonlinear machine learning models perform better on large datasets, deep learning especially; this is one of the main reasons deep learning methods are so exciting.

More data is not always better, but it is in most cases.
2. Generate more data
Deep learning algorithms tend to work well when there is a lot of data. If for some reason you cannot obtain more data, you can also generate some.

- If your data is numeric vectors, randomly generate deformed versions of existing vectors.
- If your data is images, randomly generate images similar to the existing ones.
- If your data is text, randomly generate text similar to the existing text.

Such practices are often called data augmentation or data generation, and you can use a generative model for them. With image data, simply shifting and rotating existing images at random can already bring a large improvement: it improves the generalization ability of the model, since it can now handle such transformations when they appear in new data. Sometimes noise is also added to the data; this acts like a regularization method and helps avoid overfitting the training data.
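As a minimal sketch of image augmentation, assuming tf.keras and purely illustrative parameter values (none of this comes from the article itself):

```python
import numpy as np
import tensorflow as tf

# Illustrative augmentation: random shifts, rotations and flips of existing images.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,       # rotate by up to 15 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    horizontal_flip=True,    # mirror images left-right
)

# Fake batch of 32 RGB images (64x64) standing in for real data.
images = np.random.rand(32, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, size=(32,))

# Each call yields a freshly transformed batch.
augmented_images, augmented_labels = next(datagen.flow(images, labels, batch_size=32))
```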
3. Scale the data
This method is simple and effective. A surprisingly reliable trick when using neural network models is to scale the data into the value range of the activation function.

If you use the sigmoid activation function, scale the data to between 0 and 1. If you choose tanh, scale the values to between -1 and 1.

Apply the same kind of transformation to both inputs and outputs. For example, if the output layer uses a sigmoid function to map outputs to binary values, normalize y to binary values as well. If you choose softmax, normalizing y is still effective.
It is also recommended to create several differently scaled versions of the training data:

- Normalized to 0–1
- Normalized to -1–1
- Standardized

Then test the model's performance on each version and keep the best one. If you swap the activation function, it is best to repeat this small experiment (a sketch follows below).

Large values accumulating inside the model are not good. Besides scaling the inputs, there are other ways to keep values inside the model small, such as normalizing the weights and activations.
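A minimal sketch of the three scaled versions using scikit-learn (the toy array is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: 5 samples, 2 features on very different scales (illustrative only).
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 250.0], [4.0, 400.0], [5.0, 350.0]])

X_01  = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)   # normalized to 0..1
X_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # normalized to -1..1
X_std = StandardScaler().fit_transform(X)                     # zero mean, unit variance

# Train the same model on each version and keep the best-performing scaling.
```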
4. Transform the data
You need to really understand the data you use: visualize it, then pick out the outliers.

First, guess at the distribution of each column of data:

- Does this column look like a skewed Gaussian? If so, try correcting the skew with the Box-Cox transform
- Does this column look exponentially distributed? If so, apply a log transform
- Does this column seem to carry signal that is hard to see? Try squaring the data or taking its square root
- Can the feature be discretized, to emphasize certain values better?

Also, try the following:

- Can the data be preprocessed with a projection, such as PCA?
- Can multiple attributes be combined into a single value?
- Can you uncover a new attribute and express it as a boolean value?
- Is there something new to discover on the time scale or in another dimension?

Neural networks do perform feature learning and can figure these things out themselves. But if you can present the structure of the problem more directly, the network will learn much faster. Quickly try various transformations on the training set and see which ones help (see the sketch below).
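A hedged sketch of the first few transforms, using SciPy and NumPy (the exponential toy column is an assumption; note that Box-Cox requires strictly positive values):

```python
import numpy as np
from scipy.stats import boxcox

# Toy skewed, strictly positive column standing in for a real feature.
column = np.random.exponential(scale=2.0, size=1000)

corrected, fitted_lambda = boxcox(column)       # Box-Cox picks the lambda that best de-skews
logged = np.log(column)                         # log transform for exponential-looking data
squared, rooted = column ** 2, np.sqrt(column)  # expose structure that is hard to see
```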
5. Feature selection

Neural networks are only mildly affected by irrelevant inputs: they learn weights close to 0 for them, so the contribution of unimportant features to the prediction is almost ignored. Still, there are many feature selection and feature importance methods to identify which features to keep and which to remove.

- Perhaps you can get the same result, or even a better one, with fewer features.
- Perhaps all feature selection methods agree on discarding some attributes. Take a good look at those useless features.
- Perhaps the selected features give you new inspiration for constructing more features (a feature-importance sketch follows this list).
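One common feature-importance method, sketched with scikit-learn on synthetic data (the dataset and the choice of a random forest are assumptions, not the article's prescription):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)
ranked = np.argsort(model.feature_importances_)[::-1]  # most important first
print("Top 5 features by importance:", ranked[:5])
```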
6. Redefine the problem
Are the observations you collected the only way to describe the problem? Maybe there are other framings, and maybe one of them exposes the structure of the problem more clearly.

This exercise forces you to broaden your thinking. It is hard to do well, especially after you have invested a lot of time and energy.

Here are some ideas:

- Maybe you can fold the time element into a window
- Maybe your classification problem can be turned into a regression problem, or vice versa
- Maybe your binary output can be turned into a softmax output
- Maybe you can model a sub-problem instead
Two. Improve performance from algorithms
Machine learning is about algorithms: all the theory and mathematics describe different ways of learning a decision process from data.
1. Algorithm screening

You cannot know in advance which algorithm will work best on your problem; if you could, you probably would not need machine learning. What evidence do you have that your current method is the best choice?

When effectiveness is evaluated over all possible problems, no single algorithm performs better than any other; all algorithms are equal. This is the "no free lunch" theorem.

Maybe the algorithm you chose is not the best fit for your problem. We are not trying to solve every possible problem here, but the currently popular algorithm may well not suit your dataset. Start by collecting evidence, and entertain the hypothesis that other algorithms suit your problem better.

First pick a few common algorithms, then screen out the promising ones:

- Try some linear algorithms, such as logistic regression and linear discriminant analysis
- Try some tree models, such as CART, random forests and gradient boosting
- Try SVM, KNN and similar methods
- Try other neural network models, such as LVQ, MLP, CNN, LSTM, and so on

Take the few methods that perform well, then fine-tune their parameters and the data to improve further (a spot-checking sketch follows this list).
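A minimal spot-checking loop with scikit-learn; the candidate list mirrors the bullets above, while the synthetic dataset and scoring choices are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in dataset

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
    "gradient boosting": GradientBoostingClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# 5-fold cross-validated accuracy for each candidate.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```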
2. Learn from the literature
"Stealing" ideas from the literature is a shortcut. Has anyone done something similar to your problem, and what methods did they use? Read papers, books, Q&A sites, tutorials, and everything Google gives you. Write down every idea, then keep exploring in those directions. This is not about repeating past research; it is about discovering new ideas.

Give priority to published papers: many smart people have already written up many interesting things. Make good use of this valuable resource.
3. Resampling methods

You need to know how well your model really performs. Is your estimate of its performance reliable? Deep learning models are very slow to train, which often rules out the gold-standard methods for estimating model performance, such as k-fold cross-validation.

- Maybe you simply split the data into a training set and a test set. If so, make sure the distribution stays the same after the split; univariate statistics and data visualization are good checks
- Maybe you can scale up the hardware to speed up evaluation
- Maybe you can hold out part of the data for validation during training (very effective for early stopping)
- Maybe you can keep part of the data completely aside for final model validation

Alternatively, make the dataset smaller and use stronger resampling methods.

Maybe you will find that performance on a sampled subset correlates strongly with performance on the full dataset. If so, you can do model selection on the small dataset, then apply the final chosen method to all of the data.

Maybe you can simply cap the dataset size, sample part of the data, and use that sample for all training tasks. What matters is having full confidence in your estimate of the model's performance (a distribution-preserving split is sketched below).
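A minimal sketch of a split that preserves the class distribution, using scikit-learn (the imbalanced synthetic dataset is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced stand-in dataset: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the class distribution identical in both splits,
# i.e. the "distribution stays the same after the split" check above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print("train positive rate:", y_train.mean(), "test positive rate:", y_test.mean())
```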
Three. Improve performance from algorithm tuning
1. Model diagnostics

You can only squeeze out the best performance if you know why the model's performance has stopped improving. Is it overfitting or underfitting? The model is always somewhere between those two states; it is only a matter of degree. A quick way to check is to compute the model's performance on the training set and the validation set at every step, and plot the results.

Plot the model's accuracy on the training set and the validation set:

- If the training set does much better than the validation set, there may be overfitting; try adding a regularization term.
- If accuracy is low on both the training set and the validation set, there may be underfitting; increase the capacity of the model or train for more steps.
- If validation performance turns downward while training performance keeps improving, early stopping may be the right trick.

Plot charts like these often, and use them to study and compare different methods of improving the model; they may become your most valuable diagnostic tool. Another effective diagnostic is to study the samples the model predicts correctly and the ones it predicts incorrectly (a plotting sketch follows this list).

- Maybe you need more of the hard-to-predict samples
- Maybe you can remove the easy-to-learn samples from the training set
- Maybe you can train separate models for different types of input data
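A minimal plotting sketch, assuming a Keras-style History object returned by model.fit() with validation data (adapt the dictionary keys to your framework):

```python
import matplotlib.pyplot as plt

def plot_diagnostics(history):
    """Plot training vs. validation loss from a Keras-style History object."""
    plt.plot(history.history["loss"], label="training loss")
    plt.plot(history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# Usage (illustrative): history = model.fit(X, y, validation_split=0.2, epochs=50)
# plot_diagnostics(history)
```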
2. Weight initialization
Initialize the weights with small random numbers.

Different activation functions call for different initialization strategies; keep the model structure fixed and try several strategies. The weights are the parameters your model has to learn; several different sets of weights can all achieve decent results, but you want better ones.

- Try all the available initialization methods and find the best-performing set of initial values
- Try unsupervised pre-training, such as an autoencoder
- Try taking an existing model's weights and retraining only the input and output layers (transfer learning)

Remember that changing the weight initialization method is closely tied to the choice of activation function and even the objective function (a sketch follows).
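A minimal tf.keras sketch of varying the initializer per layer; pairing He initialization with ReLU and Glorot with tanh is a common heuristic offered here as an assumption, not the article's prescription:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # He initialization is commonly paired with ReLU activations.
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal", input_shape=(20,)),
    # Glorot/Xavier initialization is commonly paired with tanh.
    tf.keras.layers.Dense(64, activation="tanh",
                          kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```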
3. Learning rate
Adjusting the learning rate can also improve performance.

- Try very large and very small learning rates
- Grid-search around the usual values reported in the literature
- Try a learning rate that decays gradually
- Try dropping the learning rate every fixed number of training steps
- Try adding a momentum term, then grid-search learning rate and momentum together

Larger networks need more training steps, and vice versa. If you add more neurons or more layers, increase the learning rate. The learning rate is coupled with the number of training steps, the batch size and the optimization method (a decay-schedule sketch follows).
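A minimal sketch of a step-decay schedule in tf.keras (halving every 10 epochs is an illustrative choice):

```python
import tensorflow as tf

def step_decay(epoch, lr):
    """Halve the learning rate every 10 epochs (illustrative values)."""
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

scheduler = tf.keras.callbacks.LearningRateScheduler(step_decay)
# Usage (illustrative): model.fit(X, y, epochs=50, callbacks=[scheduler])
```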
4. Activation functions

In most cases you should use the ReLU activation function, simply because it works well. Before ReLU became popular, sigmoid and tanh were the usual choices, with a softmax, linear or sigmoid function in the output layer.

Try all three in the hidden layers, remembering to scale the input data into their range, and choose the output transfer function according to the form of the output. For example, to turn a binary classifier into a regression model, replace the output sigmoid with a linear function and post-process the output values; at the same time, you may need to switch to an appropriate loss function.
5. Network structure
Adjusting the topology of the network also helps a great deal. How many nodes do you need, and how many layers? Nobody knows; you have to find a reasonable configuration yourself.

- Try one hidden layer with many nodes (going wider)
- Try a deep network with few nodes per layer (going deeper)
- Try combining the two
- Try imitating recently published papers on problems similar to yours
- Try topology patterns (such as fan-out then fan-in) and the rules of thumb from books and papers

The larger the network, the stronger its expressive power, and more layers give more opportunities to recombine abstract features in structured ways. Larger networks also need more training, so keep adjusting the number of training steps and the learning rate.
6. Batch size and epochs

The batch size determines the gradient estimate and the frequency of weight updates. One epoch means every sample in the training set has taken part in one round of training, batch by batch. You need to try different batch sizes and epoch counts. Deep learning models are often trained with small batches and large numbers of epochs, over and over.

- Try setting the batch size to the size of the whole training set (batch learning)
- Try setting the batch size to 1 (online learning)
- Try different mini-batch sizes (8, 16, 32, ...)
- Try training for just a few epochs, then for a great many epochs

Try setting the number of epochs to something close to infinite, snapshot intermediate results along the way, and pick the best model afterwards (see the sketch below). Some architectures are very sensitive to batch size: multilayer perceptrons are usually insensitive to it, while LSTMs and CNNs are quite sensitive.
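A minimal snapshot setup in tf.keras (the filename and epoch count are illustrative):

```python
import tensorflow as tf

# Save the best intermediate model while training with a very large epoch count.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",    # illustrative filename
    monitor="val_loss",
    save_best_only=True,   # keep only the best snapshot so far
)
# Usage (illustrative):
# model.fit(X, y, validation_split=0.2, epochs=10000,
#           batch_size=32, callbacks=[checkpoint])
```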
7. Regularization

Regularization is a good way to overcome overfitting of the training data. The most popular regularization method is dropout: dropout randomly skips some neurons during training, forcing the other nodes in the same layer to take over. Simple but very effective.

- Use weight decay to penalize large weight values
- Use activation constraints to penalize large activation values
- Experiment with different penalties and penalty terms, such as L1, L2, and the sum of the two (a sketch follows this list)
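A minimal tf.keras sketch combining dropout with an L2 weight penalty (the rates are illustrative, not tuned):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # weight decay (L2 penalty)
    ),
    tf.keras.layers.Dropout(0.5),  # randomly skip half the neurons during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```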
8. Optimization and loss

Stochastic gradient descent used to be the main solver, but there are now many optimizers. Gradient descent is still the default: use it to get a first result, then tune different learning rates and momentum values on top of it.

Many of the more advanced optimization methods take more parameters and are more complicated internally, but converge faster. The two methods below converge quickly and let you rapidly gauge the potential of a network topology:

- ADAM
- RMSprop

You can also use other optimization algorithms, from more traditional ones (Levenberg-Marquardt) to relatively recent ones (genetic algorithms). Other methods can also give SGD a good starting point and make subsequent tuning easier.

The loss function to optimize should closely match the problem you are solving. There are common defaults (for example, MSE and MAE for regression problems), but changing the loss function sometimes brings unexpected gains. This, again, may interact with the scale of your input data and the activation functions you use (an optimizer-comparison sketch follows).
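A minimal sketch of comparing SGD-with-momentum against ADAM and RMSprop in tf.keras (the topology, learning rates and loss are illustrative assumptions):

```python
import tensorflow as tf

def build_model():
    """Same topology every time, so only the optimizer varies."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Learning rates are illustrative defaults, not tuned values.
optimizer_factories = {
    "sgd+momentum": lambda: tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adam": lambda: tf.keras.optimizers.Adam(learning_rate=0.001),
    "rmsprop": lambda: tf.keras.optimizers.RMSprop(learning_rate=0.001),
}

for name, make_optimizer in optimizer_factories.items():
    model = build_model()
    model.compile(optimizer=make_optimizer(), loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X, y, validation_split=0.2, epochs=20)  # then compare the curves
```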
9. Stop training early (early stopping)

You can stop training as soon as the model's performance starts to degrade. This saves a lot of time, and may even allow a more elaborate resampling method for evaluating the model.

Early stopping is also a regularization method for avoiding overfitting: observe the model's performance on the training set and a validation set after each round of training, and stop as soon as performance on the validation set starts to drop. You can also set checkpoints to save the current state, so the model can keep learning later (a sketch follows).
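A minimal early-stopping setup in tf.keras (the patience value is an illustrative choice):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,                # tolerate 10 stagnant epochs before stopping
    restore_best_weights=True,  # roll back to the best weights seen so far
)
# Usage (illustrative):
# model.fit(X, y, validation_split=0.2, epochs=1000, callbacks=[early_stop])
```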
Four. Improve performance from model fusion
After model tuning, combining the predictions of multiple models is the next big area for improvement. In fact, fusing the predictions of several reasonably good models often works better than the prediction of any single heavily tuned model.
1. Model fusion
Don't settle on a single model; fuse several of them. If you have trained multiple deep learning models and each performs well, average their predictions.

The more the models differ, the better the result. For example, you can use very different network topologies and techniques.

If each model is independently skilful, the fused result is more stable.

Conversely, you can also run the experiment the other way around: every time you train the network, initialize it differently, and the final weights will converge to different values. Repeat this process many times to produce multiple networks, then fuse their predictions. The predictions will be highly correlated, but there may be a small lift on the hard-to-predict samples (an averaging sketch follows).
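A minimal mean-fusion sketch; `models` is assumed to hold several trained models exposing a Keras-style predict() method (an assumption about your setup):

```python
import numpy as np

def ensemble_predict(models, X):
    """Average the predicted probabilities of several trained models."""
    predictions = np.stack([m.predict(X) for m in models])  # (n_models, n_samples, ...)
    return predictions.mean(axis=0)                         # simple mean fusion
```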
2. Fusion of perspectives
Use completely different scalings and transformations of the training data for the models you fuse. The more the chosen transformations differ in how they describe the problem, the higher the chance of an improvement. Simply averaging the predictions is again a good default.
3. Stacking

You can also learn how best to combine the predictions of multiple models. This is called stacked generalization, or stacking for short. Typically, the weight given to each model's prediction can be learned with a simple linear regression. Use the mean of the models' predictions as the baseline, and the learned weighted fusion as the experimental condition (a sketch follows).
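A minimal stacking sketch with scikit-learn; the random base-model predictions stand in for real out-of-fold predictions (in practice, fit the combiner on out-of-fold predictions to avoid leakage):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for the out-of-fold predictions of 3 base models on 200 samples.
base_predictions = np.random.rand(200, 3)
y = base_predictions.mean(axis=1) + 0.1 * np.random.randn(200)

baseline = base_predictions.mean(axis=1)               # simple mean-fusion baseline
stacker = LinearRegression().fit(base_predictions, y)  # learn per-model weights
stacked = stacker.predict(base_predictions)
print("learned per-model weights:", stacker.coef_)
```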