Use of the torch.optim optimizer in PyTorch
2022-07-29 04:18:00 【ytusdc】
1. Basic usage of the optimizer
- Create an optimizer instance
- Loop:
  - Clear the gradients
  - Forward pass
  - Compute the loss
  - Backward pass
  - Update the parameters
Example:
from torch import optim

input = .....                                          # input data (placeholder)
optimizer = optim.SGD(params=net.parameters(), lr=1)   # create an optimizer instance
optimizer.zero_grad()                                  # clear the gradients
output = net(input)                                    # forward pass
loss = criterion(output, labels)                       # compute the loss
loss.backward()                                        # backward pass
optimizer.step()                                       # update the parameters
2. Optimizer
PyTorch provides torch.optim.lr_scheduler to help users adjust the learning rate. Before looking at the schedulers, let's start with the Optimizer class and see how it works.
We start with Optimizer because every optimizer, whether Adam or SGD, inherits from this class. At the same time, the schedulers serve every Optimizer, so all of the methods they rely on are defined in this base class; we only need to look at its attributes. The code is linked in the official docs.
The first thing to look at is the initialization method def __init__(self, params, defaults). The params argument holds the network parameters we pass in when creating the optimizer, such as Alexnet.parameters(); all of the remaining keyword arguments are collected into a dict and passed as defaults.
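To make this concrete, here is a simplified sketch of how the base class turns params and the remaining keyword arguments into param_groups. This is not the actual PyTorch source, and build_param_groups is a made-up helper name used only for illustration:
def build_param_groups(params, defaults):
    """Sketch of what Optimizer.__init__ / add_param_group do with params."""
    param_groups = list(params)
    if not isinstance(param_groups[0], dict):
        # A plain iterable of tensors becomes a single group.
        param_groups = [{'params': param_groups}]
    for group in param_groups:
        # Any hyperparameter the group does not override is filled in
        # from defaults (lr, momentum, weight_decay, ...).
        for key, value in defaults.items():
            group.setdefault(key, value)
    return param_groups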
Let's take a look at what Alexnet.parameters() contains (assuming Alexnet here is the torchvision AlexNet model):
import torchvision.models as models
Alexnet = models.alexnet()
for alex in Alexnet.parameters():
    print(alex.shape)
As you can see, this iterator yields every parameter tensor of the whole network.
There are two ways to define an optimizer.
The first method:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
With this form of initialization, the parameters are wrapped as [{'params': Alexnet.parameters()}], a list of length 1. This list is then processed further: the entries from defaults are added to each group. Using Alexnet as an example, it looks like this:
optimizer = torch.optim.Adam(Alexnet.parameters(), lr=0.001)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad'])]
The second method: sometimes different layers need different learning rates during training. This can be done by passing parameter groups (the entries of param_groups) to the optimizer:
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Since what is passed in is already a list of dicts, it is processed in the same way: the missing default hyperparameters are added to each group. Let's jump straight to the result:
optimizer = torch.optim.SGD([
    {'params': Alexnet.features.parameters()},
    {'params': Alexnet.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov']),
#  dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])]
This time the list contains two elements, and the keys of each element differ from what we saw with Adam. That is expected: different optimizers need different hyperparameters. (The official documentation also shows how to set different lr values for different layers; see the link.)
Even so, the two cases look alike: every element has params and lr, and that is enough.
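As a quick check with the two-group SGD optimizer defined just above, you can read each group's learning rate directly:
# Group 0 (features) falls back to the default lr=1e-2; group 1 (classifier)
# keeps the lr=1e-3 it was given explicitly.
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])
# 0 0.01
# 1 0.001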
3. LRScheduler
All of the classes that dynamically modify lr inherit from this class, so let's see what methods it contains. Source code link.
The initialization method def __init__(self, optimizer, last_epoch=-1) takes two parameters. The first is any subclass of the Optimizer class described above. The second is the epoch that training has currently reached; if we do not specify it, it defaults to -1, but __init__ calls step() once and sets it to 0.
Note that since PyTorch 1.1.0 you should train (call optimizer.step()) first and call scheduler.step() afterwards.
When the scheduler is initialized, it adds a field to the optimizer's param groups. Take a look:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay',
#  'amsgrad', 'initial_lr'])]
The newly added initial_lr field stores the original lr.
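A quick check (using the Adam optimizer and the scheduler created above) confirms this:
# initial_lr is a copy of the lr each group had when the scheduler was created,
# so schedulers can recompute the schedule from a fixed starting point.
print([group['initial_lr'] for group in optimizer.param_groups])
# For the Adam example above (lr=0.001) this prints: [0.001]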
In the def step(self, epoch=None) method we normally do not need to pass the epoch argument, because each call increments it by 1. Inside this function a method that must be overridden, get_lr(), is called; each call computes the new lr values and writes them back to the optimizer.
I used to wonder about the relationship between scheduler.step() and optimizer.step(). Reading the source code makes it clear: the two functions have nothing to do with each other! scheduler.step() only modifies lr, optimizer.step() updates the parameters, and both must be called.
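A small sanity check along the same lines (again using the optimizer and StepLR instance from above); with step_size=5 the value only actually drops every fifth call:
# scheduler.step() touches only the learning rate; the model parameters are
# updated exclusively by optimizer.step().
lr_before = optimizer.param_groups[0]['lr']
scheduler.step()
lr_after = optimizer.param_groups[0]['lr']
print(lr_before, lr_after)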
Let's compare the get_lr() methods of two schedulers. First, StepLR:
def get_lr(self):
    if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
        return [group['lr'] for group in self.optimizer.param_groups]
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
So whenever last_epoch reaches an integer multiple of the configured step size, each group's lr is multiplied by gamma.
ExponentialLR, on the other hand, multiplies by gamma at the end of every epoch, so the decay is truly exponential:
def get_lr(self):
    if self.last_epoch == 0:
        return self.base_lrs
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
Demo:
from torch.optim.lr_scheduler import StepLR
import torch.utils.data as Data

scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
train_loader = Data.DataLoader(
    dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)
for epoch in range(100):
    for X, y in train_loader:
        ...
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
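For a fully self-contained illustration of this ordering, and of what StepLR(step_size=30, gamma=0.1) does over 100 epochs, here is a toy version in which a dummy one-element parameter stands in for a real model:
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

p = torch.nn.Parameter(torch.zeros(1))       # dummy "model"
opt = SGD([p], lr=0.01)
sched = StepLR(opt, step_size=30, gamma=0.1)
for epoch in range(100):
    opt.step()                               # training would happen here
    sched.step()                             # decays lr at epochs 30, 60, 90
print(opt.param_groups[0]['lr'])             # ~1e-05, i.e. 0.01 * 0.1**3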
4. Adjusting the learning rate dynamically
For a detailed explanation and examples of the torch.optim.lr_scheduler methods for dynamically adjusting the learning rate in PyTorch, see the CSDN blog post "Detailed explanation and examples of the methods for dynamically adjusting the learning rate in PyTorch".
There is also a less commonly used option: adjusting the learning rate manually.
The code below reduces the learning rate to 10% of its previous value every 20 epochs:
base_lr = 0.1
optimizer = optim.SGD(gan.parameters(),
                      lr=base_lr,
                      momentum=0.9,
                      weight_decay=0.0005)

# Run inside the epoch loop. Decaying from the base lr (rather than from the
# current lr) keeps repeated calls from compounding the decay.
lr = base_lr * (0.1 ** (epoch // 20))
for param_group in optimizer.param_groups:
    param_group['lr'] = lr
print(optimizer.param_groups[0]['lr'])
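In practice this manual adjustment is usually wrapped in a small helper that is called once at the start of every epoch. A minimal sketch is shown below; the name adjust_learning_rate and its default arguments are just for illustration:
def adjust_learning_rate(optimizer, epoch, base_lr=0.1, decay=0.1, step=20):
    """Set the lr of every param group to base_lr decayed every `step` epochs."""
    lr = base_lr * (decay ** (epoch // step))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

for epoch in range(100):
    adjust_learning_rate(optimizer, epoch)
    # ... train for one epoch ...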