【torch】|torch.nn.utils.clip_grad_norm_
2022-07-06 05:18:00 【rrr2】

The larger the gradients, the larger total_norm (the norm of all gradients taken together) becomes, and the smaller clip_coef (the factor, roughly max_norm / total_norm, by which the gradients are multiplied when total_norm exceeds max_norm) becomes. Larger gradients are therefore clipped more aggressively, which is the behavior you would want.
In my experiments, the choice of norm_type did not change total_norm very much (the gap between 1 and 2 was somewhat larger), so the default value of 2 can be used directly.
The larger norm_type is, the smaller total_norm is. I observed this experimentally and cannot prove it myself, so treat this post with some caution; it is, however, consistent with the standard result that the p-norm of a fixed vector does not increase as p grows.
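The relationship between norm_type and total_norm described above can be checked with a small pure-Python sketch. The helper `total_norm` and the gradient values are made up for illustration and are not part of PyTorch; the sketch only assumes that total_norm is the p-norm of all gradient values flattened into one vector.

```python
import math

def total_norm(grads, norm_type=2.0):
    """p-norm of a flat list of gradient values (illustrative helper, not a PyTorch API)."""
    if math.isinf(norm_type):
        # The infinity norm is the largest absolute value.
        return max(abs(g) for g in grads)
    return sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)

grads = [3.0, -4.0, 12.0]  # made-up gradient values
norms = {p: total_norm(grads, p) for p in (1.0, 2.0, 4.0, math.inf)}
for p, value in norms.items():
    print(f"norm_type={p}: total_norm={value:.4f}")
```

For this vector the printed norms shrink as norm_type grows, which matches the observation above. With many more gradient values the gap between norm_type 1 and 2 can be considerably wider than it is here.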
...
loss = crit(...)
optimizer.zero_grad()
loss.backward()
# Clip after backward() has filled in the gradients and before step() applies them.
torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
optimizer.step()
...
The smaller clip_coef is, the more aggressively the gradients are clipped, that is, the more their values are scaled down.
The smaller max_norm is, the smaller clip_coef is. So a larger max_norm handles exploding gradients more gently, and a smaller max_norm handles them more harshly. max_norm does not have to be an integer; fractional values such as 0.5 are allowed.
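To make the effect of max_norm on clip_coef concrete, here is a minimal pure-Python sketch of the arithmetic. It assumes, based on my reading of the PyTorch implementation, that clip_coef = max_norm / (total_norm + 1e-6) and that the coefficient is capped at 1.0; the helper `clip_by_total_norm` and the gradient values are hypothetical.

```python
def clip_by_total_norm(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Scale a flat list of gradient values so their p-norm is at most max_norm.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # Assumed formula: the coefficient is capped at 1.0, so small gradients are untouched.
    clip_coef = min(max_norm / (total_norm + eps), 1.0)
    clipped = [g * clip_coef for g in grads]
    return clipped, total_norm, clip_coef

grads = [3.0, -4.0, 12.0]  # made-up gradient values; their 2-norm is 13
results = {}
for max_norm in (0.5, 10.0, 100.0):
    clipped, total_norm, clip_coef = clip_by_total_norm(grads, max_norm)
    results[max_norm] = (clipped, total_norm, clip_coef)
    print(f"max_norm={max_norm}: clip_coef={clip_coef:.4f}, clipped={clipped}")
```

With max_norm=100 the total norm is already below the threshold, so the gradients pass through unchanged. With max_norm=10 and max_norm=0.5 they are scaled down by progressively smaller coefficients, and every component is multiplied by the same factor, so the direction of the gradient is preserved.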
ref
https://blog.csdn.net/Mikeyboi/article/details/119522689