Use of the torch.optim optimizer in PyTorch
2022-07-29 04:18:00 【ytusdc】
1. Basic usage of the optimizer
- Create an optimizer instance
- Loop:
  - Clear the gradients
  - Forward pass
  - Compute the loss
  - Backward pass
  - Update the parameters
Example:
from torch import optim

input = .....                                          # input data (placeholder)
optimizer = optim.SGD(params=net.parameters(), lr=1)   # create an optimizer instance
optimizer.zero_grad()                                  # clear the gradients
output = net(input)                                    # forward pass
loss = criterion(output, labels)                       # compute the loss
loss.backward()                                        # backward pass
optimizer.step()                                       # update the parameters
2. Optimizer
PyTorch provides torch.optim.lr_scheduler to help users adjust the learning rate. Before looking at the schedulers, let's start with the Optimizer class and see how it works.
We start with Optimizer because every optimizer, whether Adam or SGD, inherits from this class. At the same time, the schedulers serve every Optimizer, so all of the methods they rely on are defined in this base class; we only need to look at its attributes. The code is linked in the official docs.
The first thing to look at is the initialization method def __init__(self, params, defaults). The params argument holds the network parameters we pass in when creating the optimizer, such as Alexnet.parameters(); all of the remaining keyword arguments are collected into a dict and passed as defaults.
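To make this concrete, here is a simplified sketch of how the base class turns params and the remaining keyword arguments into param_groups. This is not the actual PyTorch source, and build_param_groups is a made-up helper name used only for illustration:
def build_param_groups(params, defaults):
    """Sketch of what Optimizer.__init__ / add_param_group do with params."""
    param_groups = list(params)
    if not isinstance(param_groups[0], dict):
        # A plain iterable of tensors becomes a single group.
        param_groups = [{'params': param_groups}]
    for group in param_groups:
        # Any hyperparameter the group does not override is filled in
        # from defaults (lr, momentum, weight_decay, ...).
        for key, value in defaults.items():
            group.setdefault(key, value)
    return param_groups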
Let's take a look at what Alexnet.parameters() contains (assuming Alexnet here is the torchvision AlexNet model):
import torchvision.models as models
Alexnet = models.alexnet()
for alex in Alexnet.parameters():
    print(alex.shape)
As you can see, this iterator yields every parameter tensor of the whole network.
There are two ways to define an optimizer.
The first method:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
With this form of initialization, the parameters are wrapped as [{'params': Alexnet.parameters()}], a list of length 1. This list is then processed further: the entries from defaults are added to each group. Using Alexnet as an example, it looks like this:
optimizer = torch.optim.Adam(Alexnet.parameters(), lr=0.001)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad'])]
The second method: sometimes different layers need different learning rates during training. This can be done by passing parameter groups (the entries of param_groups) to the optimizer:
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Since what is passed in is already a list of dicts, it is processed in the same way: the missing default hyperparameters are added to each group. Let's jump straight to the result:
optimizer = torch.optim.SGD([
    {'params': Alexnet.features.parameters()},
    {'params': Alexnet.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov']),
#  dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])]
This time the list contains two elements, and the keys of each element differ from what we saw with Adam. That is expected: different optimizers need different hyperparameters. (The official documentation also shows how to set different lr values for different layers; see the link.)
Even so, the two cases look alike: every element has params and lr, and that is enough.
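As a quick check with the two-group SGD optimizer defined just above, you can read each group's learning rate directly:
# Group 0 (features) falls back to the default lr=1e-2; group 1 (classifier)
# keeps the lr=1e-3 it was given explicitly.
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])
# 0 0.01
# 1 0.001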
3. LRScheduler
All of the classes that dynamically modify lr inherit from this class, so let's see what methods it contains. Source code link.
The initialization method def __init__(self, optimizer, last_epoch=-1) takes two parameters. The first is any subclass of the Optimizer class described above. The second is the epoch that training has currently reached; if we do not specify it, it defaults to -1, but __init__ calls step() once and sets it to 0.
Note that since PyTorch 1.1.0 you should train (call optimizer.step()) first and call scheduler.step() afterwards.
When the scheduler is initialized, it adds a field to the optimizer's param groups. Take a look:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay',
#  'amsgrad', 'initial_lr'])]
The newly added initial_lr field stores the original lr.
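A quick check (using the Adam optimizer and the scheduler created above) confirms this:
# initial_lr is a copy of the lr each group had when the scheduler was created,
# so schedulers can recompute the schedule from a fixed starting point.
print([group['initial_lr'] for group in optimizer.param_groups])
# For the Adam example above (lr=0.001) this prints: [0.001]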
In the def step(self, epoch=None) method we normally do not need to pass the epoch argument, because each call increments it by 1. Inside this function a method that must be overridden, get_lr(), is called; each call computes the new lr values and writes them back to the optimizer.
I used to wonder about the relationship between scheduler.step() and optimizer.step(). Reading the source code makes it clear: the two functions have nothing to do with each other! scheduler.step() only modifies lr, optimizer.step() updates the parameters, and both must be called.
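A small sanity check along the same lines (again using the optimizer and StepLR instance from above); with step_size=5 the value only actually drops every fifth call:
# scheduler.step() touches only the learning rate; the model parameters are
# updated exclusively by optimizer.step().
lr_before = optimizer.param_groups[0]['lr']
scheduler.step()
lr_after = optimizer.param_groups[0]['lr']
print(lr_before, lr_after)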
Let's compare the get_lr() methods of two schedulers. First, StepLR:
def get_lr(self):
    if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
        return [group['lr'] for group in self.optimizer.param_groups]
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
So whenever last_epoch reaches an integer multiple of the configured step size, each group's lr is multiplied by gamma.
ExponentialLR, on the other hand, multiplies by gamma at the end of every epoch, so the decay is truly exponential:
def get_lr(self):
    if self.last_epoch == 0:
        return self.base_lrs
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
Demo:
from torch.optim.lr_scheduler import StepLR
import torch.utils.data as Data

scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
train_loader = Data.DataLoader(
    dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)
for epoch in range(100):
    for X, y in train_loader:
        ...
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
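For a fully self-contained illustration of this ordering, and of what StepLR(step_size=30, gamma=0.1) does over 100 epochs, here is a toy version in which a dummy one-element parameter stands in for a real model:
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

p = torch.nn.Parameter(torch.zeros(1))       # dummy "model"
opt = SGD([p], lr=0.01)
sched = StepLR(opt, step_size=30, gamma=0.1)
for epoch in range(100):
    opt.step()                               # training would happen here
    sched.step()                             # decays lr at epochs 30, 60, 90
print(opt.param_groups[0]['lr'])             # ~1e-05, i.e. 0.01 * 0.1**3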
4. Adjusting the learning rate dynamically
For a detailed explanation and examples of the torch.optim.lr_scheduler methods for dynamically adjusting the learning rate in PyTorch, see the CSDN blog post "Detailed explanation and examples of the methods for dynamically adjusting the learning rate in PyTorch".
There is also a less commonly used option: adjusting the learning rate manually.
The code below reduces the learning rate to 10% of its previous value every 20 epochs:
base_lr = 0.1
optimizer = optim.SGD(gan.parameters(),
                      lr=base_lr,
                      momentum=0.9,
                      weight_decay=0.0005)

# Run inside the epoch loop. Decaying from the base lr (rather than from the
# current lr) keeps repeated calls from compounding the decay.
lr = base_lr * (0.1 ** (epoch // 20))
for param_group in optimizer.param_groups:
    param_group['lr'] = lr
print(optimizer.param_groups[0]['lr'])
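In practice this manual adjustment is usually wrapped in a small helper that is called once at the start of every epoch. A minimal sketch is shown below; the name adjust_learning_rate and its default arguments are just for illustration:
def adjust_learning_rate(optimizer, epoch, base_lr=0.1, decay=0.1, step=20):
    """Set the lr of every param group to base_lr decayed every `step` epochs."""
    lr = base_lr * (decay ** (epoch // step))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

for epoch in range(100):
    adjust_learning_rate(optimizer, epoch)
    # ... train for one epoch ...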