Pre-training / transfer learning of models
2022-07-01 01:22:00 【L_ bloomer】
Most of the time, the computational power and data we can use to train a model are very limited, and completing the training of a large neural network from scratch is very difficult. We therefore want to reuse trained neural networks as much as possible to save training and data resources. If we are performing a prediction task and can find a large, well-trained architecture that has performed a similar task, we can use the shallower layers of that large architecture to help build our own network. The technique of building a new architecture from a trained model is called "transfer learning", and also "pre-training" (pre-train). Pre-training is one of the key techniques we use when training large models to reduce data requirements and speed up training.
How do we borrow a trained model's architecture?
The answer is: use the weights of the trained model. In transfer learning, what we reuse is a trained architecture, including both the architecture itself and the weights on each layer. For example, we might take the first three hidden layers of an existing architecture together with their weights, and add two custom layers after them to build a new architecture. There are then two options for training:
1) Use the weights of the migrated layers as an initialization tool: take the weights of the migrated layers as the initial weights of the new architecture, and train all layers on that basis, giving the model a head start. In the strictest literature, borrowing the original architecture's weights and then retraining all layers is what is called "pre-training".
2) Use the migrated layers as a fixed feature extractor: we "freeze" the weights of the migrated layers so that they are not affected by processes such as backpropagation, and let them serve throughout as "fixed knowledge". By contrast, the layers we add or choose to train initialize their parameters and participate in training like an ordinary network, gradually finding their own weights over the iterations. In the strictest literature, this process is what is called "transfer learning".
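The two options can be sketched in PyTorch with a toy network (the layer sizes below are made up for illustration; the first two Linear layers stand in for the migrated layers, the last for our newly added head):

```python
import torch
from torch import nn

# Toy "pretrained" backbone standing in for the migrated layers.
migrated = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                         nn.Linear(32, 16), nn.ReLU())
# Our newly added, custom output layer.
head = nn.Sequential(nn.Linear(16, 2))
model = nn.Sequential(migrated, head)

# Option 1: weights as an initialization tool -- train every layer,
# starting from the migrated weights rather than random ones.
optimizer_all = torch.optim.SGD(model.parameters(), lr=1e-3)

# Option 2: fixed feature extractor -- freeze the migrated layers so
# backpropagation leaves them untouched; only the new head is trained.
for p in migrated.parameters():
    p.requires_grad = False
optimizer_head = torch.optim.SGD(head.parameters(), lr=1e-3)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2: only the head's weight and bias remain trainable
```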
What is the point of doing this? For a neural network, the knowledge it has learned and the judgments it can make are stored in its weights (when we save a model, we are in fact saving its weights), so keeping the weights means retaining what the previous architecture has learned.
The conditions for using transfer learning cannot be ignored. When using transfer learning, we must pay attention to the following three points:
1) The previous task A and the current task B must be similar, with something transferable between them. If the new task is based on tabular data or unique laboratory data, transfer learning is rarely useful. Conversely, the more similar the two tasks are, the more layers we can migrate; when the tasks are highly similar, we may even change only the output layer of the original architecture to complete the new one.
2) The input images of task A and task B should have the same size and, as far as possible, the same number of channels. In transfer learning we keep the input layer of the existing architecture in 99% of cases, because the parameters in the input layer may capture the most basic, shallowest features, and keeping the input layer helps the model learn. The input images fed to the new model must therefore match the input of the existing architecture exactly (unless we give up the input layer). Likewise, in transfer learning we rarely mix architectures. When the task attributes are similar but the image data differ greatly in every respect and the input layer cannot be shared, transfer learning may not work well.
3) The migrated layers do not have to stay completely locked. When the layers are first migrated, we usually lock all of them and train the model to see how the whole model performs. Then we try unlocking one or two migrated layers close to the output layer and train again to see whether performance improves. In that case we use a small learning rate, to avoid iterating the weights of the migrated layers beyond recognition during training. The more training data we have, the more layers we can unlock; the less training data we have, the fewer new layers we can add after the migrated layers.
Let's see below how PyTorch implements pre-training. First, when importing a classic model, we can use the model's existing `pretrained` parameter to load the weights of the pretrained model. Pre-training for all models in PyTorch is done on the ImageNet dataset, so this parameter helps with most natural photographs, but is not very helpful for tabular data or for datasets like MNIST. Look at the following code:

A newly created layer defaults to requires_grad=True. Therefore, after locking the parameters of the model, we simply overwrite an original layer, or add a new layer after the original layers, and the new layer is trainable by default. But a new layer overwrites the trained parameters of the original layer, so we generally do not overwrite conv1.
Implementing transfer learning in code is not difficult, but what we have learned here transfers only the shallowest knowledge. Besides the migratable architectures that PyTorch provides, we can find many other migratable models on GitHub. The weights of these models may be stored at a URL on GitHub, or we may be able to download the model itself directly from GitHub. For any model that PyTorch can call, the trained weights are stored in the parameter dictionary state_dict. We can fetch weights directly from a URL, or transfer the weights from an existing model to the model we want to migrate to.
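Both routes go through state_dict. A sketch of each (the URL below is a placeholder, not a real weight file; substitute the actual URL from the GitHub repository you are borrowing from):

```python
import torch
from torch import nn

# Route 1: fetch a weight file from a URL (cached locally after first use).
# The URL here is a hypothetical placeholder:
# state_dict = torch.hub.load_state_dict_from_url(
#     "https://example.com/weights.pth", map_location="cpu")
# model.load_state_dict(state_dict)

# Route 2: the same mechanism works between existing models -- every
# PyTorch model exposes its trained weights through state_dict(),
# which another model of the same shape can load.
source = nn.Linear(4, 2)   # stands in for the trained model
target = nn.Linear(4, 2)   # stands in for the model we are building
target.load_state_dict(source.state_dict())  # copy the weights over

print(torch.equal(source.weight, target.weight))  # True
```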