
A summary of the differences between copy_(), detach(), .data, and clone() in PyTorch

2022-06-22 01:40:00 Not late not late

1. clone

b = a.clone()

clone() creates a new tensor with the same shape, dtype, and device as the source tensor. It does not share memory with the source, but gradients flowing through the new tensor (b) are accumulated onto the source tensor (a). Note that after b = a.clone(), b is not a leaf node, so its gradient cannot be accessed.

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.clone()

print('=========================== no shared memory =========================')
print(type(a), a.data_ptr())
print(type(b), b.data_ptr())
print('=========================== values after clone =========================')
print('a: ', a)  # a:  tensor([1., 2., 3.], requires_grad=True)
print('b: ', b)  # b:  tensor([1., 2., 3.], grad_fn=<CloneBackward0>)

c = a ** 2
d = b ** 3
print('=========================== backpropagation =========================')
c.sum().backward()           # dc/da = 2a
print('a.grad: ', a.grad)    # a.grad:  tensor([2., 4., 6.])

d.sum().backward()           # dd/db = 3b**2, accumulated onto a.grad
print('a.grad: ', a.grad)    # a.grad:  tensor([ 5., 16., 33.])
# print('b.grad: ', b.grad)  # b.grad: None -- b is no longer a leaf of the graph, so b.grad is not accessible

Output:

=========================== no shared memory =========================
<class 'torch.Tensor'> 93899916787840
<class 'torch.Tensor'> 93899917014528
=========================== values after clone =========================
a:  tensor([1., 2., 3.], requires_grad=True)
b:  tensor([1., 2., 3.], grad_fn=<CloneBackward0>)
=========================== backpropagation =========================
a.grad:  tensor([2., 4., 6.])
a.grad:  tensor([ 5., 16., 33.])

2. copy_

b = torch.empty_like(a).copy_(a)

copy_() requires a target tensor: you first have to construct b and then copy a into it, whereas clone() needs no such extra step.

copy_() accomplishes much the same thing as clone(), but with some differences. copy_() is called on the target tensor, takes the source tensor as its argument, and returns the target tensor; clone(), by contrast, is called on the source tensor and returns a new tensor. clone() can also be invoked as torch.clone(), with the source tensor passed as the argument.
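As a quick illustration (a minimal sketch; the tensor names here are only for demonstration), a.clone() and torch.clone(a) are interchangeable:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b1 = a.clone()        # method form, called on the source tensor
b2 = torch.clone(a)   # function form, source tensor passed as the argument

# both return new tensors with the same values but separate storage
print(torch.equal(b1, b2))            # True
print(a.data_ptr() != b1.data_ptr())  # True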

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = torch.empty_like(a).copy_(a)

print('====================copy_: different memory ======================')
print(a.data_ptr())
print(b.data_ptr())
print('====================copy_: values ======================')
print(a)
print(b)

c = a ** 2
d = b ** 3
print('===================backpropagation through c =======================')
c.sum().backward()
print(a.grad)   # tensor([2., 4., 6.])
print('===================backpropagation through d =======================')
d.sum().backward()
print(a.grad)   # the gradient from d is accumulated onto the source tensor
# print(b.grad) # None -- b is not a leaf node

Output:

====================copy_: different memory ======================
94358408685568
94358463065088
====================copy_: values ======================
tensor([1., 2., 3.], requires_grad=True)
tensor([1., 2., 3.], grad_fn=<CopyBackwards>)
===================backpropagation through c =======================
tensor([2., 4., 6.])
===================backpropagation through d =======================
tensor([ 5., 16., 33.])

3. detach

detach() returns a tensor derived from the calling tensor. The new tensor shares data memory with the source tensor (so their values are always identical), but its requires_grad is False and it carries none of the source tensor's computation-graph information.

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.detach()
print('========================= shared memory ==============================')
print(a.data_ptr())
print(b.data_ptr())
print('========================= original tensor vs. detached tensor ==============================')
print(a)
print(b)

c = a * 2
d = b * 3  # not tracked by autograd
print('========================= backpropagation through the original ==============================')
c.sum().backward()
print(a.grad)
print('========================= no backpropagation through the detached tensor ==============================')
# d.sum().backward()  # would raise a RuntimeError: d does not require grad and has no grad_fn

Output:

========================= shared memory ==============================
94503766034432
94503766034432
========================= original tensor vs. detached tensor ==============================
tensor([1., 2., 3.], requires_grad=True)
tensor([1., 2., 3.])
========================= backpropagation through the original ==============================
tensor([2., 2., 2.])
========================= no backpropagation through the detached tensor ==============================

Because b has been detached from the computation graph, PyTorch naturally stops tracking its subsequent computations. To put b back into the graph, it suffices to call b.requires_grad_().

PyTorch will then track computations on b again, but gradients still do not flow back from b to a; the gradient is truncated at b. However, since b and a share memory, their values always remain equal.
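A minimal sketch of this behavior (the tensor names below are only for illustration):

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.detach()
b.requires_grad_()      # b becomes a tracked leaf tensor again

e = (b * 3).sum()
e.backward()
print(b.grad)           # tensor([3., 3., 3.]) -- the gradient stops at b
print(a.grad)           # None -- nothing flows back to a

a.data[0] = 10.         # shared storage: changing a also changes b
print(b)                # tensor([10., 2., 3.], requires_grad=True)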

4. data

The .data attribute gives access to a tensor's underlying data. What it returns is essentially the same as what detach() (described above) returns: it shares the same memory and carries no gradient information. However, .data is sometimes unsafe. Because the memory is shared, changing one tensor changes the other, and autograd does not notice the in-place change, so backward() can silently compute a wrong gradient; with detach(), the same in-place modification makes backward() raise an error instead.

import torch

x = torch.FloatTensor([[1., 2.]])    # x.requires_grad defaults to False; only floating-point tensors can be backpropagated through
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])

w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)  # after the matmul, d.requires_grad is True (the same holds for addition)

d_ = d.data              # d and d_ share memory; d_.requires_grad is False
# d_ = d.detach()        # also shares memory with d, but backward() would raise an error after the in-place edit below

f = torch.matmul(d, w2)
d_[:] = 1                # modifying d_ in place also changes the value of d

f.backward()             # with .data the gradient is silently wrong; with detach() backward() raises an error
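For comparison, here is a minimal sketch (based on the snippet above) of the detach() variant; the expectation is that modifying the detached view in place and then calling backward() raises a RuntimeError, because autograd detects that a tensor needed for the backward pass was changed:

import torch

x = torch.FloatTensor([[1., 2.]])
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])
w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)
d_ = d.detach()      # shares memory with d, and in-place edits bump the shared version counter
f = torch.matmul(d, w2)
d_[:] = 1            # in-place modification of d through d_

try:
    f.backward()     # d is needed to compute w2's gradient, so autograd reports the in-place change
except RuntimeError as err:
    print('RuntimeError:', err)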

Reference:

https://zhuanlan.zhihu.com/p/38475183
