Using the torch.optim optimizer in PyTorch
2022-07-29 04:18:00 【ytusdc】
One. Basic usage of the optimizer

- Create an optimizer instance
- In the training loop:
  - Clear the gradients
  - Forward pass
  - Compute the loss
  - Backward pass
  - Update the parameters

Example:
from torch import optim

input = .....
optimizer = optim.SGD(params=net.parameters(), lr=1)  # create the optimizer instance
optimizer.zero_grad()              # clear the gradients
output = net(input)                # forward pass
loss = criterion(output, labels)   # compute the loss
loss.backward()                    # backward pass
optimizer.step()                   # update the parameters
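For completeness, here is a self-contained variant of the snippet above (an addition, not from the original post): the toy nn.Linear model, the CrossEntropyLoss criterion and the random tensors are stand-ins chosen only so the example runs end to end.

import torch
from torch import nn, optim

net = nn.Linear(10, 2)                             # toy model standing in for a real network
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)   # optimizer instance

inputs = torch.randn(8, 10)                        # dummy batch of 8 samples
labels = torch.randint(0, 2, (8,))                 # dummy class labels

optimizer.zero_grad()                              # clear the gradients
outputs = net(inputs)                              # forward pass
loss = criterion(outputs, labels)                  # compute the loss
loss.backward()                                    # backward pass
optimizer.step()                                   # update the parameters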
Two. Optimizer

PyTorch provides torch.optim.lr_scheduler to help users adjust the learning rate. Before looking at the schedulers, let's start with the Optimizer class and see how it works.

We start with Optimizer because every optimizer, whether Adam or SGD, inherits from this class. The schedulers also serve every Optimizer, so all the methods they rely on are defined in this base class. Let's look at the attributes of this class; the source code is linked in the official docs.
The first thing to look at is the initialization method def __init__(self, params, defaults). The params argument holds the network parameters we pass in when creating the optimizer, such as Alexnet.parameters(); all of the remaining hyperparameters are collected into a dict and passed as defaults.
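To make the roles of params and defaults concrete, here is a minimal sketch of a custom optimizer (PlainSGD is a hypothetical name, not part of PyTorch): the subclass collects its hyperparameters into a defaults dict and hands both arguments to Optimizer.__init__, which merges them into every param_group.

import torch
from torch.optim import Optimizer

class PlainSGD(Optimizer):
    # hypothetical example: plain gradient descent without momentum
    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)             # hyperparameters become the defaults dict
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:    # defaults were merged into each group
            for p in group['params']:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group['lr'])   # p <- p - lr * grad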
Let's take a look at what Alexnet.parameters() contains:
for alex in Alexnet.parameters():
    print(alex.shape)
As you can see, it holds the parameters of the whole network.
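As a quick sanity check (an addition to the original post, assuming Alexnet = torchvision.models.alexnet()), the generator returned by parameters() can also be counted:

import torchvision

Alexnet = torchvision.models.alexnet()                     # assumed model instance
n_tensors = sum(1 for _ in Alexnet.parameters())           # number of parameter tensors
n_scalars = sum(p.numel() for p in Alexnet.parameters())   # total number of weights
print(n_tensors, n_scalars)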
There are two ways to define an optimizer.

The first way:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

With this form of initialization, the parameters are wrapped into [{'params': Alexnet.parameters()}], a list of length 1. The list is then processed further and the entries from defaults are merged into it. Taking Alexnet as an example, the result looks like this:
optimizer = torch.optim.Adam(Alexnet.parameters(), lr=0.001)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad'])]
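Since param_groups is just a list of dicts, the current learning rate can be read or modified directly; a small sketch, reusing the Adam optimizer created above:

print(optimizer.param_groups[0]['lr'])    # -> 0.001, the lr passed at construction

# this is exactly what lr_scheduler does under the hood: it rewrites 'lr' in place
optimizer.param_groups[0]['lr'] = 5e-4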
The second way: sometimes different layers need different learning rates during training, which can be set up through the optimizer's param_groups:
optimizer = optim.SGD([
{'params': model.base.parameters()},
{'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Since the input here is already a list of dicts, it is processed further and the remaining default parameters are filled in. Jumping straight to the result:
optimizer = torch.optim.SGD([
{'params': Alexnet.features.parameters()},
{'params': Alexnet.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov']),
# dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])]
This time the list has two elements, and the keys in each element differ from those of Adam. That is to be expected, since different optimizers need different hyperparameters (the official docs show how to set different lr values for different layers). Still, the two are alike in that every element contains params and lr, which is what matters here.
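To confirm that each group keeps its own learning rate, the lr of every group can be printed (using the SGD optimizer with two parameter groups built above):

print([group['lr'] for group in optimizer.param_groups])
# [0.01, 0.001] -- the features group falls back to the top-level lr=1e-2,
#                  while the classifier group uses its own lr=1e-3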
Three. LRScheduler

All classes that dynamically modify the lr inherit from this class, so let's see what methods it contains (the source code is linked in the official docs).
The initialization method def __init__(self, optimizer, last_epoch=-1) takes two parameters. The first is an instance of any subclass of the Optimizer class discussed above. The second indicates which epoch execution has reached; when we don't specify it, the default is -1, but __init__ calls step() once and sets it to 0. Note that since PyTorch 1.1.0 you should train first, i.e. call optimizer.step(), and only then call scheduler.step().
When the scheduler is initialized, it adds a field to the optimizer; take a look:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
print([group.keys() for group in optimizer.param_groups])
# [dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay',
# 'amsgrad', 'initial_lr'])]
The newly added initial_lr field stores the original lr.
In the method def step(self, epoch=None), we usually don't need to pass the epoch parameter, because each call increments it by 1. Inside this function a method that subclasses must override, get_lr(), is called; each call computes the updated lr and writes it back into the optimizer.
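Putting that together, here is a minimal sketch of a custom scheduler (HalveEveryEpoch is a hypothetical name, not a PyTorch class): it only has to implement get_lr(), and the base class's step() takes care of writing the returned values back into the optimizer.

from torch.optim.lr_scheduler import _LRScheduler

class HalveEveryEpoch(_LRScheduler):
    # hypothetical scheduler: halve every learning rate on each step()
    def __init__(self, optimizer, last_epoch=-1):
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        if self.last_epoch == 0:
            return self.base_lrs          # first call, made by __init__
        return [group['lr'] * 0.5 for group in self.optimizer.param_groups]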
I used to wonder about the relationship between scheduler.step() and optimizer.step(). The source code makes it clear that the two functions have nothing to do with each other: scheduler.step() only modifies the lr, and both calls are needed.
Let's compare the get_lr() implementations of two schedulers. First, StepLR:
def get_lr(self):
    if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
        return [group['lr'] for group in self.optimizer.param_groups]
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
This multiplies the lr by gamma only when the epoch is an integer multiple of the configured step size. ExponentialLR, on the other hand, multiplies by gamma at the end of every epoch, so the decay really is exponential:
def get_lr(self):
    if self.last_epoch == 0:
        return self.base_lrs
    return [group['lr'] * self.gamma
            for group in self.optimizer.param_groups]
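To see the difference between the two schedules, here is a small sketch (an addition to the post; record_lrs is just an illustrative helper driven by a throwaway nn.Linear model) that records the lr over a few epochs for each scheduler:

import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR, ExponentialLR

def record_lrs(scheduler_cls, epochs=6, **kwargs):
    # build a throwaway model/optimizer just to drive the scheduler
    model = nn.Linear(2, 2)
    opt = optim.SGD(model.parameters(), lr=1.0)
    sched = scheduler_cls(opt, **kwargs)
    lrs = []
    for _ in range(epochs):
        lrs.append(opt.param_groups[0]['lr'])
        opt.step()          # in a real loop this would follow loss.backward()
        sched.step()
    return lrs

print(record_lrs(StepLR, step_size=2, gamma=0.1))
# lr stays at 1.0 for 2 epochs, then ~0.1 for 2 epochs, then ~0.01 (up to float rounding)
print(record_lrs(ExponentialLR, gamma=0.1))
# lr is multiplied by gamma every epoch: 1.0, 0.1, 0.01, ...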
Demo
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
train_loader = Data.DataLoader(
    dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)

for epoch in range(100):
    for X, y in train_loader:
        ...
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()   # once per epoch, after optimizer.step()
Four. Adjusting the learning rate dynamically

For a detailed explanation and examples of the torch.optim.lr_scheduler methods for dynamically adjusting the learning rate, see the separate CSDN blog post on that topic.
There is also a less common approach: adjusting the lr manually. The code below reduces the learning rate to 10% of its previous value every 20 epochs.
base_lr = 0.1
optimizer = optim.SGD(gan.parameters(),
                      lr=base_lr,
                      momentum=0.9,
                      weight_decay=0.0005)

# inside the epoch loop: decay from the initial lr, not the current one,
# otherwise the reduction would compound on every epoch
lr = base_lr * (0.1 ** (epoch // 20))
for param_group in optimizer.param_groups:
    param_group['lr'] = lr
print(optimizer.param_groups[0]['lr'])
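In practice this adjustment is usually wrapped in a small helper that is called once per epoch; a sketch (adjust_learning_rate is a hypothetical helper name, not a PyTorch API):

def adjust_learning_rate(optimizer, epoch, base_lr=0.1, decay=0.1, every=20):
    # set lr to base_lr decayed by `decay` every `every` epochs
    lr = base_lr * (decay ** (epoch // every))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

for epoch in range(100):
    adjust_learning_rate(optimizer, epoch)
    # ... run one epoch of training with optimizer.step() ...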