PyTorch optimizer settings
2022-07-28 02:28:00 【Mick..】
During deep learning training, the learning rate matters a great deal. Too low a learning rate makes learning slow, while too high a learning rate makes it hard to converge. A common practice is to start with a relatively large learning rate and decrease it gradually as training progresses.
The usual optimizer setup

First, define a model with two fully connected layers:
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(10, 2)
        self.layer2 = nn.Linear(2, 10)

    def forward(self, input):
        return self.layer2(self.layer1(input))

A training step runs in three stages. First comes the forward pass, during which the framework builds the computation graph (it records each operation and the tensors involved, because this information is needed later to compute gradients from the graph). Then comes error backpropagation, loss.backward(), which computes the gradient information. Finally, the parameters are updated according to those gradients.
loss.backward()
optimizer.step()
optimizer.zero_grad()

optimizer.zero_grad() clears the gradients of the current round so that they do not affect the next round of parameter updates. An interview question I have encountered here: when would you not clear the gradients at this step?
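One common answer is gradient accumulation: when the hardware only fits small batches, you can skip zero_grad() for several iterations so the gradients add up, and call step() and zero_grad() only once every few batches. A minimal sketch; model, criterion, data_loader, and accum_steps here are illustrative assumptions, not part of the original example:

accum_steps = 4  # assumed: number of small batches per effective batch
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for i, (x, y) in enumerate(data_loader):
    loss = criterion(model(x), y) / accum_steps  # scale so the accumulated gradient matches one large batch
    loss.backward()                              # gradients accumulate across iterations because we do not clear them
    if (i + 1) % accum_steps == 0:
        optimizer.step()        # one parameter update per accum_steps batches
        optimizer.zero_grad()   # clear only after the update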
The standard setup passes all of the model's parameters to the optimizer:

model = Net()
# Only the parameters passed in are trained; parameters that are not passed in are not updated
optimizer_Adam = torch.optim.Adam(model.parameters(), lr=0.1)

model.parameters() returns all parameters of the model.
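To check exactly which tensors model.parameters() hands to the optimizer, you can list them by name; for the Net defined above this prints the weight and bias of each of the two linear layers:

for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# layer1.weight (2, 10) True
# layer1.bias (2,) True
# layer2.weight (10, 2) True
# layer2.bias (10,) True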
Training only some of the model's parameters

That is, only the parameters to be optimized are passed in; any parameter that is not passed in does not participate in the update.
model = Net()
# Only pass in the parameters to be optimized
optimizer_Adam = torch.optim.Adam(model.layer1.parameters(), lr=0.1)
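Note that parameters left out of the optimizer are never updated, but loss.backward() still computes gradients for them. If you also want to skip that computation, you can freeze the unused layer; this freezing step is my addition, not part of the original example:

# Freeze layer2: no gradients are computed for it, and since it is not
# passed to the optimizer it never changes during training.
for param in model.layer2.parameters():
    param.requires_grad_(False)

optimizer_Adam = torch.optim.Adam(model.layer1.parameters(), lr=0.1)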
Setting different learning rates for different parts of the model

params_dict = [{'params': model.layer1.parameters(), 'lr': 0.01},
               {'params': model.layer2.parameters(), 'lr': 0.001}]
optimizer = torch.optim.Adam(params_dict)

Dynamically modifying the learning rate
The optimizer's param_groups attribute:

-param_groups
    -0 (dict)          # the first group of parameters
        params:        # the parameters to be updated in this group
        lr:            # the learning rate for this group
        betas:         # Adam's coefficients for the running averages of the gradient and its square
        eps:           # a small constant added to the denominator for numerical stability
        weight_decay:  # the weight-decay coefficient for this group
        amsgrad:       # whether to use the AMSGrad variant of Adam
    -1 (dict)          # the second group of parameters
    -2 (dict)          # the third group of parameters

param_groups is a list; each of its elements is a dictionary.
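You can confirm this structure by iterating over optimizer.param_groups directly (the exact set of keys varies with the PyTorch version); for the two-group optimizer above, group 0 shows lr=0.01 and group 1 shows lr=0.001:

optimizer = torch.optim.Adam(params_dict)
for i, group in enumerate(optimizer.param_groups):
    # everything except 'params' is a hyperparameter of this group
    hyper = {k: v for k, v in group.items() if k != 'params'}
    print(i, hyper)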
model = Net()  # build the network
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # build the optimizer

for epoch in range(100):      # suppose we iterate for 100 epochs
    if epoch % 5 == 0:        # every 5 epochs, update the learning rate
        for params in optimizer.param_groups:
            # traverse each parameter group in the optimizer and multiply its learning rate by 0.9
            params['lr'] *= 0.9
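Mutating param_groups by hand works, but torch.optim.lr_scheduler provides ready-made schedulers for the same purpose; the loop above corresponds roughly to StepLR with step_size=5 and gamma=0.9:

from torch.optim.lr_scheduler import StepLR

model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.9)  # lr *= 0.9 every 5 epochs

for epoch in range(100):
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()  # advance the schedule once per epoch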