【torch】|torch.nn.utils.clip_grad_norm_

2022-07-06 05:18:00 【rrr2】

clip_grad_norm_ computes total_norm, the norm of all parameter gradients taken together as one vector, and a scaling factor clip_coef = max_norm / (total_norm + 1e-6). When clip_coef is less than 1, every gradient is multiplied by it; otherwise the gradients are left unchanged.
The larger the gradients, the larger total_norm, and therefore the smaller clip_coef and the harder the gradients are clipped. This is the behavior you want.
In my experiments the choice of norm_type did not change total_norm much (the gap between 1 and 2 was somewhat larger), so the default value of 2 can be used directly.
The larger norm_type is, the smaller total_norm becomes. I observed this in experiments and did not prove it myself, but it is consistent with the standard fact that the p-norm of a fixed vector does not increase as p grows.
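The relationship between the gradients, total_norm, clip_coef and norm_type can be checked with a few lines of plain Python. This is only a sketch of the arithmetic described above, not PyTorch's implementation; the helper names and the gradient values are made up for illustration.

```python
import math

def total_norm(grads, norm_type=2.0):
    """p-norm of all gradient values, treated as one flat vector."""
    if math.isinf(norm_type):
        return max(abs(g) for g in grads)
    return sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)

def clip_coef(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Scaling factor max_norm / (total_norm + eps), capped at 1."""
    return min(1.0, max_norm / (total_norm(grads, norm_type) + eps))

grads = [3.0, -4.0, 12.0]  # hypothetical flattened gradient values

# total_norm for several norm_type values: it shrinks as norm_type grows.
norms = {p: total_norm(grads, p) for p in (1.0, 2.0, 4.0, math.inf)}
for p, value in norms.items():
    print(f"norm_type={p}: total_norm={value:.4f}")

# Ten times larger gradients give a ten times larger total_norm,
# hence a smaller clip_coef and harder clipping.
coef_small = clip_coef(grads, max_norm=10.0)
coef_large = clip_coef([10 * g for g in grads], max_norm=10.0)
print(f"clip_coef: {coef_small:.4f} (original), {coef_large:.4f} (10x gradients)")
```

For this vector the 1-norm is 19, the 2-norm is 13 and the infinity norm is 12, so the ordering matches the observation about norm_type.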

...
loss = crit(...)

optimizer.zero_grad()
loss.backward()
# Clip after backward() and before step(), so the optimizer uses the clipped gradients
torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
optimizer.step()
...
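For a version of this step that can be run end to end, here is a minimal sketch. The model, data, crit and optimizer below are hypothetical stand-ins for the parts elided above; only the clip_grad_norm_ call is taken from the snippet.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the elided model, data and loss.
model = nn.Linear(4, 1)
crit = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = 100.0 * torch.randn(8, 1)  # large targets, so the gradients are likely large

loss = crit(model(x), y)

optimizer.zero_grad()
loss.backward()
# The return value is the total norm measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(
    parameters=model.parameters(), max_norm=10, norm_type=2
)
# Recompute the 2-norm over the gradients after clipping.
clipped_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)
optimizer.step()

print(f"before: {total_norm.item():.3f}, after: {clipped_norm.item():.3f}")
```

Whatever the norm was before clipping, the norm after clipping should not exceed max_norm (up to floating-point error).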

The smaller clip_coef is, the harder the gradients are clipped, that is, the more their values are scaled down.
The smaller max_norm is, the smaller clip_coef is. So a larger max_norm handles gradient explosion more gently, and a smaller max_norm handles it more aggressively. max_norm does not have to be an integer; fractional values such as 0.5 are allowed.
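To make the effect of max_norm concrete, the sketch below evaluates the scaling factor for one fixed total_norm and several max_norm values. It assumes the formula clip_coef = max_norm / (total_norm + 1e-6) capped at 1; the function name and the numbers are chosen only for illustration.

```python
def scale_for(total_norm, max_norm, eps=1e-6):
    """Factor the gradients are multiplied by (never larger than 1)."""
    return min(1.0, max_norm / (total_norm + eps))

total_norm = 50.0  # hypothetical gradient norm before clipping

coefs = []
for max_norm in (100.0, 10.0, 0.5):
    coef = scale_for(total_norm, max_norm)
    coefs.append(coef)
    # The norm after clipping is coef * total_norm, i.e. about min(total_norm, max_norm).
    print(f"max_norm={max_norm}: clip_coef={coef:.4f}, "
          f"norm after clipping={coef * total_norm:.4f}")
```

With max_norm=100 the gradients are untouched, with max_norm=10 they are scaled to a fifth, and with the fractional max_norm=0.5 they are scaled to a hundredth.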

ref
https://blog.csdn.net/Mikeyboi/article/details/119522689

Copyright notice
This article was written by [rrr2]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/187/202207060515167743.html
