【torch】|torch.nn.utils.clip_grad_norm_
2022-07-06 05:18:00 【rrr2】
The larger the gradients, the larger total_norm becomes, which makes clip_coef smaller and in turn clips the gradients more heavily. That is reasonable behavior.
In my experiments, the choice of norm_type did not change total_norm very much (the gap between 1 and 2 was somewhat larger than the rest), so the default value of 2 can be used directly.
The larger norm_type is, the smaller total_norm is. I only observed this experimentally and did not prove it, but it agrees with the standard property that the p-norm of a fixed vector does not increase as p grows.
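The observation about norm_type can be checked numerically. The following is a minimal pure-Python sketch, not PyTorch code: `p_norm` is a hypothetical helper and the values in `grads` are made up for illustration. It computes the norm of the same flattened gradient for several values of norm_type.

```python
import math

def p_norm(values, p):
    """Return the p-norm of a flat list of numbers (p may be math.inf)."""
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

# Hypothetical gradient values, flattened across all parameters.
grads = [0.5, -1.2, 3.0, -0.1, 2.4]

# Compute the norm for increasing norm_type values.
norms = {p: p_norm(grads, p) for p in (1, 2, 3, math.inf)}
for p, value in norms.items():
    print(f"norm_type={p}: total_norm={value:.4f}")
```

For this toy vector the printed values do not increase as norm_type goes up, and the step from 1 to 2 is the largest. How big that step is depends on how many gradient entries there are and how evenly they are spread, so it may be worth checking on a real model before relying on it.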
...
loss = crit(...)
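optimizer.zero_grad()  # clear the gradients left over from the previous step
loss.backward()  # compute the gradients
# clip after backward() and before step(), so the optimizer uses the clipped gradients
torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
optimizer.step()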
...
The smaller clip_coef is, the more heavily the gradients are clipped, that is, the more their values are scaled down. The coefficient is computed as max_norm / (total_norm + 1e-6), and the gradients are only rescaled when it is below 1.
The smaller max_norm is, the smaller clip_coef is. So a larger max_norm handles gradient explosion more gently, and a smaller max_norm handles it more aggressively. max_norm can also be a decimal (non-integer) value.
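The relationship between total_norm, max_norm and clip_coef can be sketched in a few lines. This is a simplified pure-Python stand-in operating on a flat list, not the actual PyTorch implementation; the function name `clip_grad_norm_sketch`, the `eps` parameter name, and the example gradient are assumptions made for illustration.

```python
import math

def clip_grad_norm_sketch(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Scale `grads` in place so their norm is at most about `max_norm`.

    Returns (total_norm, clip_coef). Simplified stand-in, not PyTorch code.
    """
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # clip_coef shrinks as total_norm grows or as max_norm shrinks.
    clip_coef = max_norm / (total_norm + eps)
    # Never scale gradients up: cap the coefficient at 1.
    clip_coef = min(clip_coef, 1.0)
    for i in range(len(grads)):
        grads[i] *= clip_coef
    return total_norm, clip_coef

for max_norm in (10.0, 1.0, 0.5):
    grads = [3.0, 4.0]  # hypothetical gradient whose 2-norm is 5
    total_norm, clip_coef = clip_grad_norm_sketch(grads, max_norm)
    print(f"max_norm={max_norm}: total_norm={total_norm:.2f}, "
          f"clip_coef={clip_coef:.4f}, clipped={grads}")
```

With max_norm=10 the gradient norm is already below the threshold, so the coefficient is capped at 1 and nothing changes. With max_norm=1 or 0.5 the gradients are scaled down so that their norm lands at roughly max_norm.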
ref
https://blog.csdn.net/Mikeyboi/article/details/119522689