
Four kinds of hooks in deep learning

2022-06-13 08:51:00 Human high quality Algorithm Engineer

To save GPU memory, PyTorch does not keep intermediate variables during computation, including the feature maps of intermediate layers and the gradients of non-leaf tensors. When analyzing a network you sometimes need to inspect or modify these intermediate variables, and for that you register a hook to export them. There are many introductions to this online, but many of them are inaccurate or hard to follow, so I summarize the topic here with practical usage and caveats. The short example below illustrates why the gradient of a non-leaf tensor is not available without a hook.
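As a minimal illustration (assuming any recent PyTorch build), the gradient of a non-leaf tensor is simply not stored after backward():

import torch

x = torch.tensor([1., 2.], requires_grad=True)   # leaf tensor
y = x * 3                                         # non-leaf (intermediate) tensor
z = y.sum()
z.backward()
print(x.grad)   # tensor([3., 3.]) -- gradients of leaf tensors are kept
print(y.grad)   # None (with a warning) -- the non-leaf gradient was not saved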
There are four kinds of hooks:
torch.Tensor.register_hook()
torch.nn.Module.register_forward_hook()
torch.nn.Module.register_forward_pre_hook()
torch.nn.Module.register_backward_hook()

1. torch.Tensor.register_hook()

import torch

def grad_hook(grad):
    grad *= 1.6                      # modify the incoming gradient in place

x = torch.tensor([1., 1., 1., 1.], requires_grad=True)
y = torch.pow(x, 2)
z = torch.sum(y)
h = x.register_hook(grad_hook)       # register the hook on the leaf tensor x
z.backward()
print(x.grad)
h.remove()                           # removes the hook

The result is:

tensor([3.2000, 3.2000, 3.2000, 3.2000])
Here is a second example, this time using torch.mean instead of torch.sum:

import torch

def grad_hook(grad):
    grad *= 50                       # scale the incoming gradient in place

x = torch.tensor([2., 2., 2., 2.], requires_grad=True)
y = torch.pow(x, 2)
z = torch.mean(y)
h = x.register_hook(grad_hook)
z.backward()
print(x.grad)
h.remove()                           # removes the hook

The result is:

tensor([50., 50., 50., 50.])

How are these values computed? During back-propagation the hook receives the gradient of x and can read or modify it. In the first example dz/dx = 2x = 2, and the hook multiplies it by 1.6, giving 3.2; in the second example dz/dx = 2x/4 = 1, and the hook multiplies it by 50, giving 50. So the hook not only exposes the gradient of x, it can also change its value.
Note:

The hook can be cancelled with remove(). Be careful to call remove() only after backward(): gradients are only computed when the backward() statement runs, whereas x.register_hook(grad_hook) merely registers a hook on grad without computing anything. If remove() is executed before backward(), the hook is cancelled and has no effect when backward() runs afterwards.
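The hook can also return a new gradient instead of modifying its argument in place, which is the style recommended by the PyTorch documentation, and register_hook works just as well on non-leaf tensors whose gradients would otherwise be discarded. A minimal sketch (the name save_grad and the factor 2 are only for illustration):

import torch

grads = []
def save_grad(grad):
    grads.append(grad.detach().clone())   # keep a copy of the incoming gradient
    return grad * 2                       # the returned tensor replaces the gradient

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x * 4                        # non-leaf tensor: its gradient would normally be discarded
h = y.register_hook(save_grad)
y.sum().backward()

print(grads[0])   # tensor([1., 1., 1.]) -- dL/dy captured by the hook
print(x.grad)     # tensor([8., 8., 8.]) -- the doubled gradient propagated back to x
h.remove()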

2. torch.nn.Module.register_forward_hook()

The registered hook has the signature hook(module, input, output). It is used to export the input and output tensors of a specified submodule (a layer, a block, or any nn.Module), but only the output can be modified. It is often used to export or modify convolutional feature maps.

inps, outs = [], []

def layer_hook(module, inp, out):
    # inp is a tuple of input tensors; out is the output tensor
    inps.append(inp[0].detach().cpu().numpy())
    outs.append(out.detach().cpu().numpy())

hook = net.layer1.register_forward_hook(layer_hook)
output = net(input)      # the hook fires during this forward pass
hook.remove()

Note: (1) Modules can take multiple inputs, so the input inp is a tuple; extract the Tensor from it before operating on it. The output out is a Tensor and can be used directly.
(2) Move the exported tensors off the GPU (as with .cpu() above) so they do not keep occupying video memory, unless you have an A100 with memory to spare.
(3) The hook can modify only the output out; the input inp cannot be modified (it cannot be returned, and local modifications have no effect). To modify the output, return the new value from the hook, for example:

def layer_hook(self, module, inp, out):
    # mix each sample's features with those of a shuffled sample (manifold mixup)
    out = self.lam * out + (1 - self.lam) * out[self.indices]
    return out

This snippet is used in manifold mixup, where it mixes intermediate-layer features as a form of data augmentation: self.lam is a mixing coefficient in [0, 1] and self.indices holds the shuffled sample indices. A sketch of how such a method-based hook can be wired into a module is given below.
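Here is a hypothetical sketch of that wiring (the wrapper class, the alpha parameter, and the assumption that net has a layer1 submodule are illustrative, not taken from the manifold mixup paper):

import numpy as np
import torch
import torch.nn as nn

class ManifoldMixupWrapper(nn.Module):
    # Hypothetical wrapper: registers the mixing hook on an inner layer of net.
    def __init__(self, net, alpha=1.0):
        super().__init__()
        self.net = net
        self.alpha = alpha
        self.lam = 1.0
        self.indices = None
        self.hook = net.layer1.register_forward_hook(self.layer_hook)

    def layer_hook(self, module, inp, out):
        # mix each sample's features with those of a shuffled sample
        return self.lam * out + (1 - self.lam) * out[self.indices]

    def forward(self, x):
        self.lam = float(np.random.beta(self.alpha, self.alpha))      # mixing coefficient in [0, 1]
        self.indices = torch.randperm(x.size(0), device=x.device)     # shuffled sample indices
        return self.net(x)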

3. torch.nn.Module.register_forward_pre_hook()

The registered hook has the signature hook(module, input). It is used to export or modify the input tensor of a specified submodule before its forward() runs.

def pre_hook(module, inp):
    inp0 = inp[0]           # inp is a tuple; take the first input tensor
    inp0 = inp0 * 2         # modify it
    inp = tuple([inp0])     # repack into a tuple before returning
    return inp

hook = net.layer1.register_forward_pre_hook(pre_hook)
output = net(input)         # the hook fires just before layer1's forward pass
hook.remove()

Note: (1) inp is a tuple, so extract the tensor first, apply the operation, and then pack the result back into a tuple before returning it.
(2) The hook is only invoked when output = net(input) executes, so call remove() after that line to cancel the hook. A minimal self-contained sketch follows.
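As a self-contained sketch of the same idea, here is the pre-hook attached to a single nn.Linear layer (the layer, its fixed weights, and the factor 2 are only for illustration):

import torch
import torch.nn as nn

def pre_hook(module, inp):
    return (inp[0] * 2,)            # unpack the tuple, scale the tensor, repack as a tuple

layer = nn.Linear(3, 1, bias=False)
nn.init.ones_(layer.weight)         # all-ones weights so the effect is easy to read off

h = layer.register_forward_pre_hook(pre_hook)
x = torch.ones(1, 3)
print(layer(x))    # 6.0 -- the input was doubled before the layer ran
h.remove()
print(layer(x))    # 3.0 -- hook removed, the original input is used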

4. torch.nn.Module.register_backward_hook()

The registered hook has the signature hook(module, grad_input, grad_output). It is used to export the gradients of a specified submodule's input and output tensors during the backward pass, but only the input gradients can be modified (i.e., only gin can be returned); the output gradients cannot be changed. (In recent PyTorch versions this hook is deprecated in favor of register_full_backward_hook, which uses the same calling convention.)

gouts = []

def backward_hook(module, gin, gout):
    # gin and gout are tuples of gradients w.r.t. the module's inputs and outputs
    print(len(gin), len(gout))
    gouts.append(gout[0].detach().cpu().numpy())
    gin0, gin1, gin2 = gin          # this particular module happens to have three input gradients
    gin1 = gin1 * 2
    gin2 = gin2 * 3
    gin = tuple([gin0, gin1, gin2])
    return gin                      # the returned tuple replaces grad_input

hook = net.layer1.register_backward_hook(backward_hook)
loss.backward()
hook.remove()

Note:
(1) Both grad_in and grad_out are tuples; unpack them first, apply the modifications, then pack the results back into a tuple and return it.
(2) The hook function is called when backward() executes, so put remove() after backward() to cancel the hook. A small self-contained sketch is given below.
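The length and ordering of gin depend on the module and on the PyTorch version, which is why the print(len(gin), len(gout)) above is useful. Below is a minimal self-contained sketch; it uses register_full_backward_hook, the newer replacement with the same (module, grad_input, grad_output) calling convention, where grad_input contains only the gradients with respect to the module's forward inputs (the layer and the factor 2 are just for illustration):

import torch
import torch.nn as nn

gouts = []
def backward_hook(module, gin, gout):
    gouts.append(gout[0].detach().cpu())   # gradient of the loss w.r.t. the layer's output
    return tuple(g * 2 for g in gin)       # double the gradient flowing back to the layer's input

layer = nn.Linear(3, 2)
h = layer.register_full_backward_hook(backward_hook)

x = torch.ones(1, 3, requires_grad=True)
layer(x).sum().backward()
print(gouts[0])   # all ones: d(sum)/d(output)
print(x.grad)     # twice the usual input gradient, because the hook doubled grad_input
h.remove()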
