PyTorch Study Notes 4: AUTOGRAD, Automatic Gradient Computation
2022-07-28 05:24:00 【我有两颗糖】
1. Tensors, Functions and Computational graph
torch.autograd can automatically compute the derivative of every node in a computational graph. Take, for example, the derivatives of w and b in the computational graph below:

(figure: the computational graph, with x, w and b feeding the multiply and add that produce z, followed by the cross-entropy loss)

This can be implemented as follows:
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b  # or: z = x.matmul(w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
The requires_grad=True argument tells autograd to track these tensors during the forward computation. Once z and loss have been computed, each of them gets a grad_fn attribute: a Function object that knows both how to evaluate the operation in the forward direction and how to compute its derivative during back propagation.
print(f'Gradient function for z = {z.grad_fn}')
print(f'Gradient function for loss = {loss.grad_fn}')
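Incidentally, requires_grad does not have to be set at creation time; it can also be enabled afterwards with the in-place method requires_grad_(). A minimal sketch (w2 is only an illustrative tensor, not part of the original example):

# Turn on gradient tracking for an existing tensor, in place:
w2 = torch.randn(5, 3)
print(w2.requires_grad)  # False
w2.requires_grad_(True)
print(w2.requires_grad)  # True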
2. Computing Gradients
Calling loss.backward() performs one backward pass (BP), which automatically computes $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$; the derivatives of
loss with respect to w and b can then be read from w.grad and b.grad:
loss.backward()
print(w.grad)
print(b.grad)
# tensor([[0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266],
# [0.1814, 0.0460, 0.3266]])
# tensor([0.1814, 0.0460, 0.3266])
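Why are all five rows of w.grad identical? With the default mean reduction, $\frac{\partial loss}{\partial z_j} = \frac{\sigma(z_j) - y_j}{3}$ and $\frac{\partial loss}{\partial w_{ij}} = x_i \cdot \frac{\partial loss}{\partial z_j}$, and here x is all ones, so every row is the same. A quick numerical check (my addition, assuming the default reduction='mean' of binary_cross_entropy_with_logits):

# Analytic gradient: d(loss)/dz = (sigmoid(z) - y) / 3 under mean reduction;
# d(loss)/dw is the outer product of x with d(loss)/dz.
expected = torch.outer(x, (torch.sigmoid(z.detach()) - y) / 3)
print(torch.allclose(w.grad, expected))  # True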
Note
1. We can only obtain the grad properties for the leaf nodes of the computational graph, which have the requires_grad property set to True. For all other nodes in our graph, gradients will not be available.
2. We can only perform gradient calculations using backward once on a given graph, for performance reasons. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call, as shown in the sketch below.
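A minimal sketch of point 2, reusing x, y, w, b from above (without retain_graph=True, the second backward call would raise a RuntimeError):

# Rebuild the graph, then run backward twice on it:
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward(retain_graph=True)  # keep the graph alive for another call
loss.backward()                   # OK thanks to retain_graph above
# Gradients accumulate in .grad, so w.grad now holds the sum of both
# passes; call w.grad.zero_() and b.grad.zero_() before the next step.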
3. Disabling Gradient Tracking
All tensors with requires_grad=True record their computation history and support gradient computation. Sometimes, however, we do not need gradients, for example when we only want a trained model to make predictions on some samples. In such cases we can wrap the computation in torch.no_grad() to stop all gradient tracking:
z = torch.matmul(x, w) + b
print(z.requires_grad) # True
with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad) # False
Alternatively, the same effect can be achieved with z.detach():
z = torch.matmul(x, w) + b
print(z.detach().requires_grad) # False
When evaluating a model's performance, run it under with torch.no_grad()!
Other scenarios for disabling gradient tracking: freezing part of a network (common when fine-tuning a pretrained model) and speeding up computations when only the forward pass is needed. A sketch of the freezing scenario follows.
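A minimal sketch of freezing, using a small illustrative model (not from the original notes): setting requires_grad = False on a layer's parameters excludes them from gradient computation, so only the head is trained.

from torch import nn

model = nn.Sequential(
    nn.Linear(5, 8),   # "backbone" layer: will be frozen
    nn.ReLU(),
    nn.Linear(8, 3),   # "head" layer: stays trainable
)
for param in model[0].parameters():
    param.requires_grad = False   # freeze the backbone

out = model(torch.ones(5))
out.sum().backward()
print(model[0].weight.grad)              # None: frozen
print(model[2].weight.grad is not None)  # True: trainable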
4. Forward & Backward
Forward pass
1. Compute the result of the requested operations, recording a grad_fn for each operation.
Backward pass
1. Compute the gradient from each .grad_fn.
2. Accumulate the gradients into the .grad attribute of the corresponding tensor.
3. Using the chain rule, propagate the gradients all the way to the leaf tensors (a tiny end-to-end example follows).
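A tiny end-to-end illustration of both passes (my own example, not from the tutorial): the forward pass records a grad_fn for each operation, and backward() applies the chain rule down to the leaf tensor.

a = torch.tensor(3.0, requires_grad=True)
out = (2 * a) ** 2    # forward: each op records its grad_fn
print(out.grad_fn)    # <PowBackward0 ...>
out.backward()        # backward: chain rule down to the leaf a
print(a.grad)         # tensor(24.) since d/da (2a)^2 = 8a = 24 at a = 3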
REFERENCE:
1. https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html#disabling-gradient-tracking
For more, see the series: PyTorch Study Notes