PyTorch Learning Notes 4 - Automatic Gradient Computation with autograd
2022-07-28 06:28:00 【I have two candies】
1. Tensors, Functions and Computational graph
torch.autograd can automatically compute the gradient of every tensor in a computational graph. For example, for the simple one-layer network below, the gradients of the loss with respect to the parameters w and b are computed automatically.

(figure: computational graph of the one-layer network)

This can be implemented as follows:
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.randn(3, requires_grad=True)     # bias, tracked by autograd
z = torch.matmul(x, w) + b  # or equivalently: z = x.matmul(w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
The parameter requires_grad=True tells autograd to track operations on these tensors during the forward pass. After z and loss have been computed, each of them automatically carries a grad_fn attribute: a Function object that knows how to evaluate the operation in the forward direction and how to compute its derivative during back propagation.
print(f'Gradient function for z = {z.grad_fn}')
print(f'Gradient function for loss = {loss.grad_fn}')
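Note that gradient tracking can also be enabled after a tensor has been created, using the in-place method requires_grad_(). A minimal sketch, using a fresh tensor w2 so the w above is left untouched:

w2 = torch.randn(5, 3)   # created without gradient tracking
w2.requires_grad_(True)  # enable tracking in place
print(w2.requires_grad)  # True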
2. Computing Gradients
Calling loss.backward() performs one backward pass (BP), which automatically computes $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$. The gradients of loss with respect to w and b can then be read from w.grad and b.grad:
loss.backward()
print(w.grad)
print(b.grad)
# tensor([[0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266]])
# tensor([0.1814, 0.0460, 0.3266])
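A quick sanity check on these numbers: with the default mean reduction, $\frac{\partial loss}{\partial z_j} = \frac{1}{3}(\sigma(z_j) - y_j)$, so b.grad equals this vector and w.grad is its outer product with x. Every row of w.grad is identical precisely because x is all ones. A minimal verification sketch:

manual = (torch.sigmoid(z) - y) / 3                    # d loss / d z for mean-reduced BCE-with-logits
print(torch.allclose(b.grad, manual))                  # True
print(torch.allclose(w.grad, torch.outer(x, manual)))  # True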
Note:
1. We can only obtain the grad properties for the leaf nodes of the computational graph, i.e. the tensors created with requires_grad=True. For all other nodes in the graph, gradients will not be available.
2. For performance reasons, we can only perform gradient calculation with backward once on a given graph. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call, as the sketch below shows.
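A minimal sketch of point 2, rebuilding the loss on the graph above; without retain_graph=True the second backward call would raise a RuntimeError:

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward(retain_graph=True)  # keep the graph alive for another pass
loss.backward()                   # works; w.grad now holds the sum of both passes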
3. Disabling Gradient Tracking
All tensors with requires_grad=True record their computation history and support gradient computation. Sometimes, however, we do not need gradients, for example when we only want a trained model to make predictions on some samples. In that case, we can wrap the computation in torch.no_grad() to stop all gradient tracking:
z = torch.matmul(x, w) + b
print(z.requires_grad)  # True

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)  # False
Alternatively, you can use z.detach():
z = torch.matmul(x, w) + b
print(z.detach().requires_grad) # False
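Note that detach() returns a new tensor that is excluded from the graph; the original tensor is unchanged and still tracked:

z = torch.matmul(x, w) + b
z_det = z.detach()
print(z.requires_grad)      # True: the original tensor is still tracked
print(z_det.requires_grad)  # False: the detached tensor is outside the graph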
When evaluating a model's performance, remember to run it inside with torch.no_grad()!
Other scenarios where disabling gradient tracking is useful: freezing part of a network, fine-tuning a pretrained model, and speeding up the forward pass when only inference is needed. A sketch of freezing parameters follows.
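Freezing layers just means switching off requires_grad on their parameters. A sketch under the assumption that torchvision is installed; the ResNet here is only an illustration, not part of the original post:

import torch
from torchvision import models

model = models.resnet18(pretrained=True)  # older torchvision API; newer versions use weights=...
for param in model.parameters():
    param.requires_grad = False           # freeze the pretrained backbone
model.fc = torch.nn.Linear(512, 10)       # new head; its parameters require grad by default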
4. forward & backward
Forward propagation
1. Compute the result of the requested operation.
2. Record each operation's gradient function (grad_fn) in the computational graph.
Back propagation (triggered by .backward())
1. Compute the gradient from each .grad_fn.
2. Accumulate the gradient into the corresponding tensor's .grad attribute.
3. Use the chain rule to propagate gradients all the way to the leaf tensors.
Note that step 2 accumulates gradients rather than overwriting them, so they must be zeroed between training iterations; see the sketch below.
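A minimal sketch of the accumulation behavior, reusing the tensors defined earlier; running backward twice doubles w.grad unless it is zeroed in between:

w.grad.zero_()  # start from a clean slate
loss = torch.nn.functional.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)
loss.backward()
first = w.grad.clone()
loss = torch.nn.functional.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)
loss.backward()
print(torch.allclose(w.grad, 2 * first))  # True: the second pass accumulated
w.grad.zero_()  # in a training loop, optimizer.zero_grad() does this for you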
REFERENCE:
1. https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html#disabling-gradient-tracking
For more information, please refer to: PyTorch Learning Notes