【torch】|torch.nn.utils.clip_grad_norm_
2022-07-06 05:15:00 【rrr2】
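The larger the gradients, the larger total_norm becomes, which makes clip_coef (roughly max_norm / total_norm) smaller, so the gradients end up being clipped harder. This is the intended behavior.
In experiments, the choice of norm_type did not change total_norm very much (the gap between norm_type=1 and norm_type=2 was somewhat larger than the gaps between higher values), so the default norm_type=2 is a reasonable choice.
The larger norm_type is, the smaller total_norm is. This was first observed experimentally, and it also holds in general: for a fixed vector, the p-norm is non-increasing in p.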
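The relationship between norm_type and total_norm can be checked with a minimal pure-Python sketch. The helper `p_norm` and the sample values in `grads` are hypothetical and only for illustration; they stand in for the gradients of all parameters flattened into one vector.

```python
import math

def p_norm(values, norm_type):
    """Return the p-norm of a flat list of numbers (math.inf gives the max magnitude)."""
    if norm_type == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** norm_type for v in values) ** (1.0 / norm_type)

# Hypothetical gradient values, flattened across all parameters.
grads = [0.5, -1.5, 2.0, -0.25, 3.0]

# Compute total_norm for several norm_type values and print them side by side.
norms = {p: p_norm(grads, p) for p in (1, 2, 3, 4, math.inf)}
for p, n in norms.items():
    print(f"norm_type={p}: total_norm={n:.4f}")
```

For this vector the printed values shrink as norm_type grows, and the step from 1 to 2 is the largest one. How big that first step is depends on how many gradient entries there are and how evenly their magnitudes are spread, so it is worth checking on a real model.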
...
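loss = crit(...)
optimizer.zero_grad()  # clear gradients left over from the previous step
loss.backward()  # compute fresh gradients
# Clip after backward() and before step(): rescales all gradients in place so that their total 2-norm is at most 10
torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
optimizer.step()  # update the parameters using the clipped gradients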
...
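The smaller clip_coef is, the harder the gradients are clipped, i.e. the more the gradient values are scaled down.
The smaller max_norm is, the smaller clip_coef is. So a larger max_norm handles exploding gradients more gently, and a smaller max_norm handles them more aggressively. max_norm may be a fractional value (a float).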
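The effect of max_norm on clip_coef can be sketched in pure Python. This is an illustrative re-implementation under stated assumptions, not PyTorch's code: the function `clip_grad_norm` below is hypothetical, works on a flat list instead of parameter tensors, and assumes the formula clip_coef = max_norm / (total_norm + 1e-6), capped at 1.0 so that gradients are never scaled up.

```python
def clip_grad_norm(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Scale a flat list of gradient values so their p-norm is at most max_norm.

    Returns (clipped_grads, total_norm, clip_coef).
    """
    # Norm of all gradients viewed as one vector.
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # Capped at 1.0: gradients already within max_norm are left unchanged.
    clip_coef = min(max_norm / (total_norm + eps), 1.0)
    return [g * clip_coef for g in grads], total_norm, clip_coef

grads = [3.0, 4.0]  # hypothetical gradients with a 2-norm of 5
for max_norm in (10, 2.5, 0.5):
    clipped, total_norm, clip_coef = clip_grad_norm(grads, max_norm)
    print(f"max_norm={max_norm}: clip_coef={clip_coef:.4f}, clipped={clipped}")
```

With max_norm=10 the gradients are already within the limit and pass through unchanged. Lowering max_norm to 2.5 and then 0.5 shrinks clip_coef and scales the gradients down further. Unlike this sketch, the real torch.nn.utils.clip_grad_norm_ modifies the gradients in place and returns the total norm.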
ref
https://blog.csdn.net/Mikeyboi/article/details/119522689