Gradient accumulation in PyTorch [During experiments, limited GPU memory meant batch_size could not be increased any further; gradient accumulation was used to solve this]
2022-06-12 23:00:00 【u013250861】
During experiments, the GPU memory limit meant batch_size could not be increased any further. To work around this, I used gradient accumulation.
A training loop without gradient accumulation looks like this:
import numpy as np
import torch

# model, criterion, optimizer and train_loader are assumed to be defined elsewhere
for i, (images, target) in enumerate(train_loader):
    # 1. input -> output
    images = images.cuda(non_blocking=True)
    target = torch.from_numpy(np.array(target)).float().cuda(non_blocking=True)
    outputs = model(images)
    loss = criterion(outputs, target)
    # 2. backward pass
    optimizer.zero_grad()  # reset gradients
    loss.backward()
    optimizer.step()  # update parameters of net
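The zero_grad() call matters because PyTorch does not overwrite gradients on each backward() call; it adds the new gradient to whatever is already stored in each parameter's .grad. Gradient accumulation relies on exactly this behavior. A minimal illustration with a single scalar tensor (the value of w is arbitrary):

```python
import torch

# A single scalar parameter; d(3 * w)/dw = 3
w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
print(w.grad.item())  # 3.0

# A second backward() adds to the existing .grad instead of overwriting it
(3 * w).backward()
print(w.grad.item())  # 6.0

# Clear the stored gradient; optimizer.zero_grad() does the equivalent
# for every parameter it manages
w.grad = None
```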
With gradient accumulation:
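import numpy as np
import torch

accumulation_steps = 4  # number of mini-batches per parameter update

optimizer.zero_grad()  # start from clean gradients
for i, (images, target) in enumerate(train_loader):
    # 1. input -> output
    images = images.cuda(non_blocking=True)
    target = torch.from_numpy(np.array(target)).float().cuda(non_blocking=True)
    outputs = model(images)
    loss = criterion(outputs, target)
    # 2.1 scale the loss so the summed gradients average over accumulation_steps mini-batches
    loss = loss / accumulation_steps
    # 2.2 backward pass: gradients add up in each parameter's .grad
    loss.backward()
    # 3. update parameters of net once every accumulation_steps mini-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()  # update parameters of net
        optimizer.zero_grad()  # reset gradients for the next group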
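One detail the accumulation loop leaves out: if len(train_loader) is not a multiple of accumulation_steps, the gradients from the last few batches are computed but never applied. The sketch below, assuming the number of batches is known in advance (for example len(train_loader)), shows a step condition that also flushes the final partial group. The helper name should_step is hypothetical, not a PyTorch API.

```python
def should_step(i: int, accumulation_steps: int, num_batches: int) -> bool:
    """Return True if optimizer.step() should run after batch i (0-based).

    Steps after every `accumulation_steps` batches, and once more after the
    last batch so a trailing partial group of gradients is not discarded.
    """
    is_group_end = (i + 1) % accumulation_steps == 0
    is_last_batch = (i + 1) == num_batches
    return is_group_end or is_last_batch


# 10 batches with accumulation_steps=4: steps after batches 3, 7 and 9
steps = [i for i in range(10) if should_step(i, 4, 10)]
print(steps)  # [3, 7, 9]
```

Inside the training loop this would replace the original condition with `if should_step(i, accumulation_steps, len(train_loader)):`. Because every loss was divided by accumulation_steps, the final partial group produces a proportionally smaller update; whether that matters depends on the setup.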
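If the original batch size was 32, you can set accumulation_steps=4 and reduce batch_size to 8: each parameter update then uses gradients from 4 × 8 = 32 samples, which gives roughly the same effect as before. It is not exactly identical when the model contains layers such as BatchNorm, which compute statistics per forward pass and therefore only see 8 samples at a time.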
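Why the loss is divided by accumulation_steps: with a mean-reduced loss, the mean over 32 samples equals the average of the four 8-sample means, so summing the four gradients of (loss / 4) reproduces the full-batch gradient. A small self-contained check, assuming a toy nn.Linear model, nn.MSELoss and random data (none of which come from the original experiment):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
criterion = nn.MSELoss()  # reduction="mean" by default
images = torch.randn(32, 3)  # one "large" batch of 32 samples
target = torch.randn(32, 1)

# Gradient from a single batch of 32
model.zero_grad()
criterion(model(images), target).backward()
full_grad = model.weight.grad.clone()

# Gradient accumulated over 4 micro-batches of 8, each loss divided by accumulation_steps
accumulation_steps = 4
model.zero_grad()
for images_mb, target_mb in zip(images.chunk(accumulation_steps), target.chunk(accumulation_steps)):
    loss = criterion(model(images_mb), target_mb) / accumulation_steps
    loss.backward()  # adds into model.weight.grad
accum_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accum_grad, atol=1e-6))  # True
```

The two gradients agree up to floating-point rounding here because a linear layer has no batch-dependent statistics; with BatchNorm the forward pass itself would differ between the two cases.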