torch.optim.Adam() function usage
2022-07-30 16:54:00 【Mick..】
Reference: Adam: A Method for Stochastic Optimization (Kingma & Ba, 2015)
Adam adaptively scales each parameter's learning rate using exponential moving averages of the gradient's first moment (mean) and second moment (uncentered variance).
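Concretely, for the gradient $g_t$ at step $t$, Adam maintains moment estimates $m_t$ and $v_t$, bias-corrects both, and scales the step by the inverse square root of the second moment (the update rule from the paper):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \mathrm{lr}\cdot\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}
\end{aligned}
$$

Parameters whose recent gradients are large or noisy therefore take smaller effective steps.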
Initialization of Adam (the constructor signature from the PyTorch source):
```python
def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
             weight_decay=0, amsgrad=False):
    """
    Args:
        params (iterable): iterable of parameters to optimize or dicts
            defining parameter groups
        lr (float, optional): learning rate (default: 1e-3)
        betas (Tuple[float, float], optional): coefficients used for
            computing running averages of gradient and its square
            (default: (0.9, 0.999))
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-8)
        weight_decay (float, optional): weight decay (L2 penalty)
            (default: 0)
        amsgrad (boolean, optional): whether to use the AMSGrad variant of
            this algorithm from the paper `On the Convergence of Adam and
            Beyond`_ (default: False)
    """
```
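Before reading step(), here is a minimal usage sketch of the constructor; the nn.Linear model and the per-group values are illustrative stand-ins, not from the original post:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # illustrative model

# Plain construction: all parameters share the same hyperparameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8,
                             weight_decay=0, amsgrad=False)

# Per-parameter options via dicts ("parameter groups"): any key omitted
# in a dict falls back to the keyword defaults given to the constructor.
optimizer = torch.optim.Adam(
    [{'params': [model.weight]},
     {'params': [model.bias], 'lr': 1e-2}],  # larger lr just for the bias
    lr=1e-3,
)
```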
The step() method gathers each parameter's gradient and optimizer state, then delegates the actual update to F.adam:

```python
@torch.no_grad()
def step(self, closure=None):
    """Performs a single optimization step.

    Args:
        closure (callable, optional): A closure that reevaluates the model
            and returns the loss.
    """
    loss = None
    if closure is not None:
        with torch.enable_grad():
            loss = closure()

    for group in self.param_groups:
        params_with_grad = []
        grads = []
        exp_avgs = []
        exp_avg_sqs = []
        max_exp_avg_sqs = []
        state_steps = []
        beta1, beta2 = group['betas']

        for p in group['params']:
            if p.grad is not None:
                params_with_grad.append(p)
                if p.grad.is_sparse:
                    raise RuntimeError('Adam does not support sparse gradients, '
                                       'please consider SparseAdam instead')
                grads.append(p.grad)

                state = self.state[p]
                # Lazy state initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    if group['amsgrad']:
                        # Maintains max of all exp. moving avg. of sq. grad. values
                        state['max_exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)

                exp_avgs.append(state['exp_avg'])
                exp_avg_sqs.append(state['exp_avg_sq'])

                if group['amsgrad']:
                    max_exp_avg_sqs.append(state['max_exp_avg_sq'])

                # update the steps for each param group update
                state['step'] += 1
                # record the step after step update
                state_steps.append(state['step'])

        # F is torch.optim._functional in the PyTorch source; it performs
        # the actual Adam update for the collected tensors.
        F.adam(params_with_grad,
               grads,
               exp_avgs,
               exp_avg_sqs,
               max_exp_avg_sqs,
               state_steps,
               amsgrad=group['amsgrad'],
               beta1=beta1,
               beta2=beta2,
               lr=group['lr'],
               weight_decay=group['weight_decay'],
               eps=group['eps'])
    return loss
```
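Putting it together: a typical training loop calls zero_grad(), backward(), and step() in turn; step() also accepts the optional closure shown in the signature above. A sketch reusing the illustrative model and optimizer from the earlier example (the criterion and random data are stand-ins):

```python
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy data

for _ in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass
    loss.backward()                # populate p.grad for every parameter
    optimizer.step()               # one Adam update, as in step() above

# Alternatively, pass a closure that re-evaluates the model and returns
# the loss, matching the `closure` argument in the source above:
def closure():
    optimizer.zero_grad()
    out = criterion(model(x), y)
    out.backward()
    return out

loss = optimizer.step(closure)
```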