PyTorch optimizer settings
2022-07-28 02:28:00 【Mick..】
During deep learning training, the learning rate matters a great deal. Too low a learning rate makes learning slow, while too high a learning rate makes it hard to converge. A common practice is to start with a relatively large learning rate and decrease it gradually as training progresses.
The usual optimizer setup

First, define a model with two fully connected layers:
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Linear(10, 2)
        self.layer2 = nn.Linear(2, 10)

    def forward(self, input):
        return self.layer2(self.layer1(input))

A training step runs in three stages. First comes the forward pass, during which the framework builds the computation graph (it records each operation and the tensors involved, because this information is needed later to compute gradients from the graph). Then comes error backpropagation, loss.backward(), which computes the gradient information. Finally, the parameters are updated according to those gradients.
loss.backward()
optimizer.step()
optimizer.zero_grad()

optimizer.zero_grad() clears the gradients of the current round so that they do not affect the next round of parameter updates. An interview question I have encountered here: when would you not clear the gradients at this step?
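One common answer is gradient accumulation: when the hardware only fits small batches, you can skip zero_grad() for several iterations so the gradients add up, and call step() and zero_grad() only once every few batches. A minimal sketch; model, criterion, data_loader, and accum_steps here are illustrative assumptions, not part of the original example:

accum_steps = 4  # assumed: number of small batches per effective batch
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for i, (x, y) in enumerate(data_loader):
    loss = criterion(model(x), y) / accum_steps  # scale so the accumulated gradient matches one large batch
    loss.backward()                              # gradients accumulate across iterations because we do not clear them
    if (i + 1) % accum_steps == 0:
        optimizer.step()        # one parameter update per accum_steps batches
        optimizer.zero_grad()   # clear only after the update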
The standard setup passes all of the model's parameters to the optimizer:

model = Net()
# Only the parameters passed in are trained; parameters that are not passed in are not updated
optimizer_Adam = torch.optim.Adam(model.parameters(), lr=0.1)

model.parameters() returns all parameters of the model.
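To check exactly which tensors model.parameters() hands to the optimizer, you can list them by name; for the Net defined above this prints the weight and bias of each of the two linear layers:

for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# layer1.weight (2, 10) True
# layer1.bias (2,) True
# layer2.weight (10, 2) True
# layer2.bias (10,) True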
Training only some of the model's parameters

That is, only the parameters to be optimized are passed in; any parameter that is not passed in does not participate in the update.
model = Net()
# Only pass in the parameters to be optimized
optimizer_Adam = torch.optim.Adam(model.layer1.parameters(), lr=0.1)
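Note that parameters left out of the optimizer are never updated, but loss.backward() still computes gradients for them. If you also want to skip that computation, you can freeze the unused layer; this freezing step is my addition, not part of the original example:

# Freeze layer2: no gradients are computed for it, and since it is not
# passed to the optimizer it never changes during training.
for param in model.layer2.parameters():
    param.requires_grad_(False)

optimizer_Adam = torch.optim.Adam(model.layer1.parameters(), lr=0.1)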
Setting different learning rates for different parts of the model

params_dict = [{'params': model.layer1.parameters(), 'lr': 0.01},
               {'params': model.layer2.parameters(), 'lr': 0.001}]
optimizer = torch.optim.Adam(params_dict)

Dynamically modifying the learning rate
The optimizer's param_groups attribute:

-param_groups
    -0 (dict)          # the first group of parameters
        params:        # the parameters to be updated in this group
        lr:            # the learning rate for this group
        betas:         # Adam's coefficients for the running averages of the gradient and its square
        eps:           # a small constant added to the denominator for numerical stability
        weight_decay:  # the weight-decay coefficient for this group
        amsgrad:       # whether to use the AMSGrad variant of Adam
    -1 (dict)          # the second group of parameters
    -2 (dict)          # the third group of parameters

param_groups is a list; each of its elements is a dictionary.
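You can confirm this structure by iterating over optimizer.param_groups directly (the exact set of keys varies with the PyTorch version); for the two-group optimizer above, group 0 shows lr=0.01 and group 1 shows lr=0.001:

optimizer = torch.optim.Adam(params_dict)
for i, group in enumerate(optimizer.param_groups):
    # everything except 'params' is a hyperparameter of this group
    hyper = {k: v for k, v in group.items() if k != 'params'}
    print(i, hyper)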
model = Net()  # build the network
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # build the optimizer

for epoch in range(100):      # suppose we iterate for 100 epochs
    if epoch % 5 == 0:        # every 5 epochs, update the learning rate
        for params in optimizer.param_groups:
            # traverse each parameter group in the optimizer and multiply its learning rate by 0.9
            params['lr'] *= 0.9
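Mutating param_groups by hand works, but torch.optim.lr_scheduler provides ready-made schedulers for the same purpose; the loop above corresponds roughly to StepLR with step_size=5 and gamma=0.9:

from torch.optim.lr_scheduler import StepLR

model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.9)  # lr *= 0.9 every 5 epochs

for epoch in range(100):
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()  # advance the schedule once per epoch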