Pre-training / transfer learning of models
2022-07-01 01:22:00 【L_ bloomer】
Most of the time, the computational power and data we can use to train a model are very limited, and completing the training of a large neural network from scratch is very difficult. We therefore want to reuse trained neural networks as much as possible to save training and data resources. If we are performing a prediction task and can find a large, well-trained architecture that has performed a similar task, we can use the shallower layers of that large architecture to help build our own network. The technique of building a new architecture from a trained model is called "transfer learning", and also "pre-training" (pre-train). Pre-training is one of the key techniques we use when training large models to reduce data requirements and speed up training.
How do we borrow a trained model's architecture?
The answer is: use the weights of the trained model. In transfer learning, what we reuse is a trained architecture, including both the architecture itself and the weights on each layer. For example, we might take the first three hidden layers of an existing architecture together with their weights, and add two custom layers after them to build a new architecture. There are then two options for training:
1) Use the weights of the migrated layers as an initialization tool: take the weights of the migrated layers as the initial weights of the new architecture, and train all layers on that basis, giving the model a head start. In the strictest literature, borrowing the original architecture's weights and then retraining all layers is what is called "pre-training".
2) Use the migrated layers as a fixed feature extractor: we "freeze" the weights of the migrated layers so that they are not affected by processes such as backpropagation, and let them serve throughout as "fixed knowledge". By contrast, the layers we add or choose to train initialize their parameters and participate in training like an ordinary network, gradually finding their own weights over the iterations. In the strictest literature, this process is what is called "transfer learning".
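The two options can be sketched in PyTorch with a toy network (the layer sizes below are made up for illustration; the first two Linear layers stand in for the migrated layers, the last for our newly added head):

```python
import torch
from torch import nn

# Toy "pretrained" backbone standing in for the migrated layers.
migrated = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                         nn.Linear(32, 16), nn.ReLU())
# Our newly added, custom output layer.
head = nn.Sequential(nn.Linear(16, 2))
model = nn.Sequential(migrated, head)

# Option 1: weights as an initialization tool -- train every layer,
# starting from the migrated weights rather than random ones.
optimizer_all = torch.optim.SGD(model.parameters(), lr=1e-3)

# Option 2: fixed feature extractor -- freeze the migrated layers so
# backpropagation leaves them untouched; only the new head is trained.
for p in migrated.parameters():
    p.requires_grad = False
optimizer_head = torch.optim.SGD(head.parameters(), lr=1e-3)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2: only the head's weight and bias remain trainable
```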
What is the point of doing this? For a neural network, the knowledge it has learned and the judgments it can make are stored in its weights (when we save a model, we are in fact saving its weights), so keeping the weights means retaining what the previous architecture has learned.
The conditions for using transfer learning cannot be ignored. When using transfer learning, we must pay attention to the following three points:
1) The previous task A and the current task B must be similar, with something transferable between them. If the new task is based on tabular data or unique laboratory data, transfer learning is rarely useful. Conversely, the more similar the two tasks are, the more layers we can migrate; when the tasks are highly similar, we may even change only the output layer of the original architecture to complete the new one.
2) The input images of task A and task B should have the same size and, as far as possible, the same number of channels. In transfer learning we keep the input layer of the existing architecture in 99% of cases, because the parameters in the input layer may capture the most basic, shallowest features, and keeping the input layer helps the model learn. The input images fed to the new model must therefore match the input of the existing architecture exactly (unless we give up the input layer). Likewise, in transfer learning we rarely mix architectures. When the task attributes are similar but the image data differ greatly in every respect and the input layer cannot be shared, transfer learning may not work well.
3) The migrated layers do not have to stay completely locked. When the layers are first migrated, we usually lock all of them and train the model to see how the whole model performs. Then we try unlocking one or two migrated layers close to the output layer and train again to see whether performance improves. In that case we use a small learning rate, to avoid iterating the weights of the migrated layers beyond recognition during training. The more training data we have, the more layers we can unlock; the less training data we have, the fewer new layers we can add after the migrated layers.
Let's see below how PyTorch implements pre-training. First, when importing a classic model, we can use the model's existing `pretrained` parameter to load the weights of the pretrained model. Pre-training for all models in PyTorch is done on the ImageNet dataset, so this parameter helps with most natural photographs, but is not very helpful for tabular data or for datasets like MNIST. Look at the following code:

A newly created layer defaults to requires_grad=True. Therefore, after locking the parameters of the model, we simply overwrite an original layer, or add a new layer after the original layers, and the new layer is trainable by default. But a new layer overwrites the trained parameters of the original layer, so we generally do not overwrite conv1.
Implementing transfer learning in code is not difficult, but what we have learned here transfers only the shallowest knowledge. Besides the migratable architectures that PyTorch provides, we can find many other migratable models on GitHub. The weights of these models may be stored at a URL on GitHub, or we may be able to download the model itself directly from GitHub. For any model that PyTorch can call, the trained weights are stored in the parameter dictionary state_dict. We can fetch weights directly from a URL, or transfer the weights from an existing model to the model we want to migrate to.
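Both routes go through state_dict. A sketch of each (the URL below is a placeholder, not a real weight file; substitute the actual URL from the GitHub repository you are borrowing from):

```python
import torch
from torch import nn

# Route 1: fetch a weight file from a URL (cached locally after first use).
# The URL here is a hypothetical placeholder:
# state_dict = torch.hub.load_state_dict_from_url(
#     "https://example.com/weights.pth", map_location="cpu")
# model.load_state_dict(state_dict)

# Route 2: the same mechanism works between existing models -- every
# PyTorch model exposes its trained weights through state_dict(),
# which another model of the same shape can load.
source = nn.Linear(4, 2)   # stands in for the trained model
target = nn.Linear(4, 2)   # stands in for the model we are building
target.load_state_dict(source.state_dict())  # copy the weights over

print(torch.equal(source.weight, target.weight))  # True
```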