torch.optim.Adam() function usage
2022-07-30 16:54:00 【Mick..】
Reference: Adam: A Method for Stochastic Optimization (Kingma & Ba, 2015)
Adam adaptively scales each parameter's learning rate using exponential moving averages of the gradient's first moment (mean) and second moment (uncentered variance).
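Concretely, for the gradient $g_t$ at step $t$, Adam maintains moment estimates $m_t$ and $v_t$, bias-corrects both, and scales the step by the inverse square root of the second moment (the update rule from the paper):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \mathrm{lr}\cdot\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}
\end{aligned}
$$

Parameters whose recent gradients are large or noisy therefore take smaller effective steps.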
Initialization of Adam (the constructor signature from the PyTorch source):
```python
def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
             weight_decay=0, amsgrad=False):
    """
    Args:
        params (iterable): iterable of parameters to optimize or dicts
            defining parameter groups
        lr (float, optional): learning rate (default: 1e-3)
        betas (Tuple[float, float], optional): coefficients used for
            computing running averages of gradient and its square
            (default: (0.9, 0.999))
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-8)
        weight_decay (float, optional): weight decay (L2 penalty)
            (default: 0)
        amsgrad (boolean, optional): whether to use the AMSGrad variant of
            this algorithm from the paper `On the Convergence of Adam and
            Beyond`_ (default: False)
    """
```
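Before reading step(), here is a minimal usage sketch of the constructor; the nn.Linear model and the per-group values are illustrative stand-ins, not from the original post:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # illustrative model

# Plain construction: all parameters share the same hyperparameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8,
                             weight_decay=0, amsgrad=False)

# Per-parameter options via dicts ("parameter groups"): any key omitted
# in a dict falls back to the keyword defaults given to the constructor.
optimizer = torch.optim.Adam(
    [{'params': [model.weight]},
     {'params': [model.bias], 'lr': 1e-2}],  # larger lr just for the bias
    lr=1e-3,
)
```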
The step() method gathers each parameter's gradient and optimizer state, then delegates the actual update to F.adam:

```python
@torch.no_grad()
def step(self, closure=None):
    """Performs a single optimization step.

    Args:
        closure (callable, optional): A closure that reevaluates the model
            and returns the loss.
    """
    loss = None
    if closure is not None:
        with torch.enable_grad():
            loss = closure()

    for group in self.param_groups:
        params_with_grad = []
        grads = []
        exp_avgs = []
        exp_avg_sqs = []
        max_exp_avg_sqs = []
        state_steps = []
        beta1, beta2 = group['betas']

        for p in group['params']:
            if p.grad is not None:
                params_with_grad.append(p)
                if p.grad.is_sparse:
                    raise RuntimeError('Adam does not support sparse gradients, '
                                       'please consider SparseAdam instead')
                grads.append(p.grad)

                state = self.state[p]
                # Lazy state initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    if group['amsgrad']:
                        # Maintains max of all exp. moving avg. of sq. grad. values
                        state['max_exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)

                exp_avgs.append(state['exp_avg'])
                exp_avg_sqs.append(state['exp_avg_sq'])

                if group['amsgrad']:
                    max_exp_avg_sqs.append(state['max_exp_avg_sq'])

                # update the steps for each param group update
                state['step'] += 1
                # record the step after step update
                state_steps.append(state['step'])

        # F is torch.optim._functional in the PyTorch source; it performs
        # the actual Adam update for the collected tensors.
        F.adam(params_with_grad,
               grads,
               exp_avgs,
               exp_avg_sqs,
               max_exp_avg_sqs,
               state_steps,
               amsgrad=group['amsgrad'],
               beta1=beta1,
               beta2=beta2,
               lr=group['lr'],
               weight_decay=group['weight_decay'],
               eps=group['eps'])
    return loss
```
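Putting it together: a typical training loop calls zero_grad(), backward(), and step() in turn; step() also accepts the optional closure shown in the signature above. A sketch reusing the illustrative model and optimizer from the earlier example (the criterion and random data are stand-ins):

```python
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy data

for _ in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass
    loss.backward()                # populate p.grad for every parameter
    optimizer.step()               # one Adam update, as in step() above

# Alternatively, pass a closure that re-evaluates the model and returns
# the loss, matching the `closure` argument in the source above:
def closure():
    optimizer.zero_grad()
    out = criterion(model(x), y)
    out.backward()
    return out

loss = optimizer.step(closure)
```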