[PyTorch] Automatic differentiation in PyTorch: Tensor and Autograd

2022-07-31 16:25:00 【Enzo wants to smash the computer】

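In a neural network, parameter learning is a central task, and parameter learning depends on computing derivatives.

Most current deep learning frameworks provide automatic differentiation; in PyTorch, the torch.autograd package serves this purpose.
torch.autograd provides automatic differentiation for all operations on tensors.

These notes record the key points of automatic differentiation.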

I. Computational graph

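Throughout the forward computation, PyTorch organizes operations as a computational graph. The graph is dynamic: it is rebuilt on every forward pass. Other deep learning frameworks, such as TensorFlow 1.x (and Keras running on top of it), have traditionally used static graphs.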
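A minimal sketch of what "dynamic" means in practice, assuming a hypothetical helper forward() that is not part of the original article: because the graph is rebuilt on each forward pass, ordinary Python control flow can change which operators it contains.

```python
import torch

def forward(x, w):
    # Hypothetical helper: plain Python control flow decides
    # which operator is recorded in the graph for this call.
    if x.item() > 0:
        return w * x
    return w + x

w = torch.tensor(3.0, requires_grad=True)

pos = forward(torch.tensor(2.0), w)   # graph contains a mul operator
neg = forward(torch.tensor(-2.0), w)  # graph contains an add operator

pos_op = type(pos.grad_fn).__name__
neg_op = type(neg.grad_fn).__name__
print(pos_op)  # MulBackward0
print(neg_op)  # AddBackward0
```

Each call builds its own graph, so the two results carry different gradient functions even though they come from the same Python function.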

[Figure: computational graph for z = wx + b]

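A computational graph is a directed acyclic graph (DAG) that represents the relationships between operators and variables graphically, which makes it intuitive and efficient.
In the figure, circles represent variables and rectangles represent operators.
The expression z = wx + b can be written as two expressions: y = wx and z = y + b.
- x, w, and b are variables created by the user. They do not depend on other variables, so they are called leaf nodes. To compute the gradient of a leaf node, set the requires_grad attribute of the corresponding tensor to True so that its history is tracked automatically (details below).
- y and z are computed variables, that is, non-leaf nodes; z is the root node.
- mul and add are operators (also called operations or functions).

Together, these variables and operators make up a complete computation (the forward pass).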
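The graph for z = wx + b can be inspected directly from Python. The sketch below is illustrative only: the fixed values 2, 3, and 1 are assumptions chosen for readability, and next_functions is used here simply to list the operators that feed into the root.

```python
import torch

x = torch.tensor([2.0])                      # leaf, no gradient needed
w = torch.tensor([3.0], requires_grad=True)  # leaf, tracked
b = torch.tensor([1.0], requires_grad=True)  # leaf, tracked

y = torch.mul(w, x)  # operator node: mul
z = torch.add(y, b)  # operator node: add (z is the root)

# Each non-leaf tensor points at the operator that produced it.
root_op = type(z.grad_fn).__name__
# next_functions lists the nodes feeding into that operator (the graph edges).
parent_ops = [type(fn).__name__ for fn, _ in z.grad_fn.next_functions]

print(root_op)     # AddBackward0
print(parent_ops)  # ['MulBackward0', 'AccumulateGrad']
```

AccumulateGrad is the node autograd attaches to a leaf that requires a gradient (here b); it is where the computed gradient is written into the leaf's .grad.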

II. Key points of automatic differentiation

To differentiate a Tensor automatically, keep the following points in mind:

1) Create the leaf-node Tensors and use the requires_grad parameter to specify whether operations on them should be recorded, so that gradients can later be computed with backward(). requires_grad defaults to False; set it to True to differentiate with respect to the tensor. Every node that depends on such a tensor then automatically has requires_grad=True.

2) The requires_grad_() method changes a Tensor's requires_grad attribute in place (for example, True during training and False during testing). Alternatively, call .detach() or wrap the code in with torch.no_grad(): to stop computing the tensor's gradient and tracking its history. This is common when evaluating or testing a model.

3) A Tensor created by an operation (a non-leaf node) is automatically assigned a grad_fn attribute, which references the function used to compute its gradient. The grad_fn of a leaf node is None.

4) Call backward() on the final Tensor (the root node); the gradients of the variables are then computed automatically.

After each backward pass, the buffers of the computational graph are freed by default. To call backward() more than once on the same graph, pass retain_graph=True; the gradients of leaf nodes then accumulate across calls.
The gradients of non-leaf nodes are not kept after backward() returns.

5) backward() accepts a gradient argument, which should have the same shape as the Tensor on which backward() is called. If that Tensor is a scalar (a single number), the argument can be omitted.
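Points 1), 2), 3), and 5) can be sketched in a few lines. This is a minimal illustration with made-up tensors of ones, not code from the original article; the comment numbers refer to the list above.

```python
import torch

# 1) requires_grad defaults to False and propagates to dependent tensors.
a = torch.ones(3)
w = torch.ones(3, requires_grad=True)
out = a * w
print(a.requires_grad, out.requires_grad)  # False True

# 3) Tensors produced by an operation carry a grad_fn; leaves do not.
print(w.grad_fn is None, out.grad_fn is not None)  # True True

# 2) Ways to stop tracking history.
frozen = out.detach()   # same data, cut off from the graph
with torch.no_grad():
    untracked = w * 2   # computed without recording history
print(frozen.requires_grad, untracked.requires_grad)  # False False

# 5) A non-scalar root needs a gradient argument of the same shape.
out.backward(gradient=torch.ones_like(out))
print(w.grad)  # tensor([1., 1., 1.])

# 2) requires_grad_() flips the flag in place on a leaf tensor.
w.requires_grad_(False)
print(w.requires_grad)  # False
```

Passing a tensor of ones as gradient weights every element of out equally, which gives the same result as calling out.sum().backward().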

III. Backpropagation for scalars

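Suppose x, w, and b are all scalars; the result z = wx + b is then a scalar as well.
When backward() is called on the root node z, no argument needs to be passed.
* A note in advance, covered in more detail later: if backward() is called on a non-scalar tensor, a gradient argument must be passed. That argument is itself a tensor and must have the same shape as the tensor on which backward() is called.

The main steps of automatic differentiation are as follows: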

import torch

# Input tensor x
x = torch.tensor([2.0])

# Initialize the weight w and the bias b, setting requires_grad=True so that autograd tracks them
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# Forward pass
y = torch.mul(w, x)
z = torch.add(y, b)

# Check the requires_grad attribute of the leaf nodes x, w, b and the non-leaf nodes y, z
print(x.requires_grad, w.requires_grad, b.requires_grad)  # False True True
print(y.requires_grad, z.requires_grad)  # True True

# Check whether each node is a leaf node
print(x.is_leaf, w.is_leaf, b.is_leaf, y.is_leaf, z.is_leaf)  # True True True False False

# Check the grad_fn attribute of the leaf nodes and of the non-leaf nodes
print(x.grad_fn, w.grad_fn, b.grad_fn)   # None None None
print(y.grad_fn, z.grad_fn)   # <MulBackward0 object at 0x7f8ac1303910> <AddBackward0 object at 0x7f8ac1303070>

z.backward()  # the graph's buffers are freed after this call
# z.backward(retain_graph=True)  # use this instead if backward() will be called again on the same graph; leaf gradients then accumulate

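# Check the gradients of the leaf nodes; x is a leaf but does not require a gradient, so its gradient is None
print(w.grad, b.grad, x.grad)  # tensor([2.]) tensor([1.]) None

# Gradients of non-leaf nodes are not retained after backward(), so both are None
print(y.grad, z.grad)  # None None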
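As a follow-up to the listing above, the sketch below shows what happens when backward() is called twice on the same graph. It is an illustration under stated assumptions: fixed values replace torch.randn so the numbers are reproducible, and retain_grad() is used to keep the gradient of the non-leaf node y, which the original listing does not do.

```python
import torch

x = torch.tensor([2.0])
w = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

y = w * x
y.retain_grad()  # ask autograd to keep the gradient of this non-leaf node
z = y + b

z.backward(retain_graph=True)  # keep the graph so backward() can run again
first_w = w.grad.clone()
first_y = y.grad.clone()
print(first_w, first_y)        # tensor([2.]) tensor([1.])

z.backward()                   # second pass: leaf gradients are summed
second_w = w.grad.clone()
second_b = b.grad.clone()
print(second_w, second_b)      # tensor([4.]) tensor([2.])

w.grad.zero_()                 # reset manually before an independent pass
b.grad.zero_()
print(w.grad, b.grad)          # tensor([0.]) tensor([0.])
```

Since dz/dw = x = 2 and dz/db = 1, the doubled values after the second call confirm that gradients accumulate in .grad until they are zeroed explicitly.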