【torch】|torch.nn.utils.clip_grad_norm_
2022-07-06 05:18:00 【rrr2】

The larger the gradient, the larger total_norm becomes, which makes clip_coef smaller and ultimately clips the gradient more severely. This is exactly the behavior we want.
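
For reference, here is the rule as I understand it from PyTorch's implementation. g_1, …, g_k are the gradients of the parameters being clipped, p = norm_type, and ε is a small constant (1e-6 in the source) that avoids division by zero. For norm_type = inf, total_norm is instead the largest absolute gradient entry. The larger the gradients, the larger total_norm, and therefore the smaller clip_coef:

$$
\mathrm{total\_norm} = \left\| \big( \|g_1\|_p, \ldots, \|g_k\|_p \big) \right\|_p,
\qquad
\mathrm{clip\_coef} = \frac{\mathrm{max\_norm}}{\mathrm{total\_norm} + \varepsilon},
\qquad
g_i \leftarrow g_i \cdot \min\left(1, \mathrm{clip\_coef}\right)
$$

Because the scaling factor is capped at 1, gradients are only shrunk when total_norm exceeds max_norm. Otherwise they are left as they are.
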
In my experiments, the choice of norm_type did not change total_norm very much (the gap between 1 and 2 is somewhat larger), so you can simply keep the default value of 2.
The larger norm_type is, the smaller total_norm is. (This is a conclusion I observed experimentally. My math is not strong enough to prove it, so it is not guaranteed to be correct.)
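
For what it's worth, this observation agrees with a standard property of vector p-norms: for a fixed vector x and 0 < p < q, ‖x‖_q ≤ ‖x‖_p. So total_norm can only shrink or stay equal as norm_type grows. Below is a minimal pure-Python check. The gradient values are made up for illustration, and the helper `p_norm` is hypothetical, not a torch function:

```python
import math


def p_norm(values, p):
    """p-norm of a flat list of numbers; p may be math.inf (max absolute value)."""
    if math.isinf(p):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# made-up gradient values, for illustration only
grad = [0.5, -1.2, 3.0, 0.05, -2.2]
norm_types = [1, 2, 3, 4, math.inf]
norms = [p_norm(grad, p) for p in norm_types]
for p, n in zip(norm_types, norms):
    print(f"norm_type={p}: total_norm={n:.4f}")

# each norm is no larger than the one for the previous (smaller) p
assert all(a >= b for a, b in zip(norms, norms[1:]))
```

The same inequality also allows ‖x‖_1 to be up to √n times ‖x‖_2 for an n-element vector, which is one possible reason the gap between norm_type 1 and 2 looks larger.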
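```python
...
loss = crit(...)
optimizer.zero_grad()
loss.backward()  # gradients are populated here
# clip after backward() and before step(): rescales all grads in place
# so that their combined 2-norm is at most max_norm
torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
optimizer.step()  # the update uses the clipped gradients
...
```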
The smaller clip_coef is, the more severely the gradient is clipped, i.e., the more its values are scaled down.
The smaller max_norm is, the smaller clip_coef is. Therefore, a larger max_norm deals with gradient explosion more gently, and a smaller max_norm deals with it more aggressively. max_norm can also be a decimal value.
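
To see the effect of max_norm numerically, here is a minimal pure-Python sketch of the rescaling step. It assumes the rule used in PyTorch's source, to the best of my understanding: the total norm is taken over all gradient entries together, clip_coef = max_norm / (total_norm + 1e-6), and gradients are scaled only when clip_coef < 1. The function name `clip_by_total_norm` and the list-based gradients are hypothetical stand-ins for tensors. They are not part of torch.

```python
import math
from typing import List, Tuple


def clip_by_total_norm(
    grads: List[List[float]], max_norm: float, norm_type: float = 2.0
) -> Tuple[float, float, List[List[float]]]:
    """Hypothetical pure-Python mirror of the clip_grad_norm_ rescaling.

    grads holds one flat list of gradient values per parameter.
    Returns (total_norm, clip_coef, clipped_grads); the input is not modified.
    """
    flat = [abs(g) for param in grads for g in param]
    if math.isinf(norm_type):
        total_norm = max(flat, default=0.0)
    else:
        total_norm = sum(v ** norm_type for v in flat) ** (1.0 / norm_type)
    # epsilon avoids division by zero when every gradient is 0
    clip_coef = max_norm / (total_norm + 1e-6)
    # coefficient capped at 1: gradients are only ever shrunk, never enlarged
    scale = min(clip_coef, 1.0)
    clipped = [[g * scale for g in param] for param in grads]
    return total_norm, clip_coef, clipped


# made-up "exploding" gradients for two parameters; total 2-norm = 50
grads = [[30.0], [-40.0]]
for max_norm in [100, 10, 1, 0.5]:  # max_norm may be a float
    total, coef, clipped = clip_by_total_norm(grads, max_norm)
    print(f"max_norm={max_norm}: clip_coef={coef:.4f}, clipped={clipped}")
```

With max_norm=100 the gradients pass through unchanged. With 10, 1 and 0.5 they are scaled to roughly 1/5, 1/50 and 1/100 of their original size, which matches the "softer vs. harder" behavior described above.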
Reference:
https://blog.csdn.net/Mikeyboi/article/details/119522689