A detailed analysis of PyTorch's automatic differentiation mechanism, the core magic of PyTorch
2022-07-04 14:26:00 【Xiaobai learns vision】
Author: Vaibhav Kumar
Compiled by: ronghuaiyang
Introduction
This article analyzes PyTorch's automatic differentiation mechanism in detail and shows you the core magic of PyTorch.
We all agree that, when it comes to large neural networks, none of us are good at calculus. It is impractical to calculate the gradients of such large composite functions by explicitly solving the mathematical equations, especially because these curves live in a huge number of dimensions and are impossible to comprehend.
"To deal with a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it." — Geoffrey Hinton
This is where PyTorch's autograd comes in. It abstracts away the complicated mathematics and helps us calculate gradients of high-dimensional curves "magically" with only a few lines of code. This post attempts to describe the magic of autograd.
PyTorch Basics
Before going further, we need to know some basic PyTorch concepts.
Tensor: In simple words, it is just an n-dimensional array in PyTorch. Tensors support some additional enhancements that make them unique: apart from the CPU, they can be loaded on a GPU for faster computations. On setting .requires_grad = True, they start forming a backward graph that tracks every operation applied on them to calculate the gradients, using something called a dynamic computation graph (DCG), explained further in the post.
In earlier versions of PyTorch, the torch.autograd.Variable class was used to create tensors that support gradient calculations and operation tracking, but as of PyTorch v0.4.0 the Variable class has been deprecated. torch.Tensor and torch.autograd.Variable are now the same class. More precisely, torch.Tensor is capable of tracking history and behaves like the old Variable.
import torch
import numpy as np
x = torch.randn(2, 2, requires_grad=True)

# From numpy
x = np.array([1., 2., 3.])  # Only Tensors of floating point dtype can require gradients
x = torch.from_numpy(x)
# Now enable gradient
x.requires_grad_(True)
# _ above makes the change in-place (it's a common PyTorch thing)
The code above shows various ways of creating gradient-enabled tensors. Note: by PyTorch's design, gradients can only be calculated for floating point tensors, which is why I created a float type numpy array before making it a gradient-enabled PyTorch tensor.
Autograd: This class is an engine for calculating derivatives (Jacobian-vector products, to be more precise). It records a graph of all the operations performed on gradient-enabled tensors and creates an acyclic graph called the dynamic computational graph. The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from the root to the leaves and multiplying every gradient along the way using the chain rule.
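As a quick illustration (a minimal sketch, not from the original article), the recorded graph can be inspected through the grad_fn attribute of each result, and backward() walks it from the root back to the leaves:
import torch

w = torch.tensor(3.0, requires_grad=True)  # leaf (input) node
y = w * w                                  # intermediate node, recorded as MulBackward0
out = 2 * y                                # root (output) node

print(out.grad_fn)                         # <MulBackward0 ...>
print(out.grad_fn.next_functions)          # links pointing back toward the leaf
out.backward()                             # chain rule applied from root to leaf
print(w.grad)                              # tensor(12.) == d(out)/dw = 4*w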
Neural networks and backpropagation
Neural networks are nothing more than composite mathematical functions that are delicately tweaked (trained) to output the required result. The tweaking, or training, is done through a remarkable algorithm called backpropagation. Backpropagation is used to calculate the gradients of the loss with respect to the input weights, so that the weights can later be updated, eventually reducing the loss.
In a way, backpropagation is just a fancy name for the chain rule. — Jeremy Howard
Creating and training a neural network involves the following essential steps:
Define the architecture
Forward-propagate input data through the architecture
Calculate the loss
Backpropagate to calculate the gradient for each weight
Update the weights using a learning rate
The change in the loss for a small change in an input weight is called the gradient of that weight and is calculated using backpropagation. The gradient is then used to update the weight, using a learning rate, to decrease the loss overall and train the neural network.
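Put together, one iteration looks roughly like the following minimal sketch (a hypothetical one-layer linear model, not from the original article):
import torch

x = torch.randn(4, 3)                      # a small batch of inputs
target = torch.randn(4, 1)
w = torch.randn(3, 1, requires_grad=True)  # weights to be tuned

pred = x @ w                               # forward propagate through the architecture
loss = ((pred - target) ** 2).mean()       # calculate the loss
loss.backward()                            # backpropagate: fills w.grad with d(loss)/dw

lr = 0.01
with torch.no_grad():                      # update the weights using the learning rate
    w -= lr * w.grad
    w.grad.zero_()                         # reset gradients before the next iteration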
This is done iteratively. For each iteration, several gradients are calculated, and something called a computational graph is built to store these gradient functions. PyTorch does it by building a dynamic computational graph (DCG). This graph is built from scratch in every iteration, providing maximum flexibility for gradient calculation. For example, for a forward operation (function) Mul, a backward operation (function) called MulBackward is dynamically integrated into the backward graph for computing the gradient.
Dynamic computational graph
Gradient-enabled tensors (variables) along with functions (operations) combine to create the dynamic computational graph. The flow of data and the operations applied to the data are defined at runtime, hence constructing the computational graph dynamically. This graph is made dynamically by the autograd class under the hood. You don't have to encode all possible paths before you launch the training: what you run is what you differentiate.
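As a small illustration (a hypothetical sketch, not from the original article), ordinary Python control flow decides the shape of the graph at run time, and only the operations that actually ran are differentiated:
import torch

x = torch.tensor(2.0, requires_grad=True)

if x.item() > 0:      # plain Python branching chooses the path
    y = x * 3
else:
    y = x ** 2

y.backward()
print(x.grad)         # tensor(3.): only the branch that actually ran is differentiated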
A simple DCG for the multiplication of two tensors would look like this:
Figure: a simple DCG with requires_grad = False. Each dotted-outline box in the graph is a variable and the purple rectangular box is an operation.
Each variable object has several members, some of which are:
Data: It is the data a variable is holding. x holds a 1x1 tensor with the value 1.0, while y holds 2.0. z holds the product of the two, i.e. 2.0.
requires_grad: This member, if true, starts tracking all the operation history and forms a backward graph for gradient calculation. For an arbitrary tensor a, it can be manipulated in-place as follows: a.requires_grad_(True).
grad: grad holds the value of the gradient. If requires_grad is False, it will hold a None value. Even if requires_grad is True, it will still hold a None value unless the .backward() function is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculation, then x.grad will hold ∂out/∂x.
grad_fn: This is the backward function used to calculate the gradient .
is_leaf: A node is a leaf if:
It was initialized explicitly by some function, such as x = torch.tensor(1.0) or x = torch.randn(1, 1) (basically all the tensor-initializing methods discussed at the beginning of this post).
It is created after an operation on tensors which all have requires_grad = False.
It is created by calling the .detach() method on some tensor.
On calling backward(), gradients are populated only for those nodes which have both requires_grad and is_leaf set to True, as the sketch below illustrates.
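A minimal sketch (hypothetical values, not from the original article) showing how is_leaf and grad behave:
import torch

a = torch.randn(2, 2, requires_grad=True)  # leaf: created directly by the user
b = a * 2                                  # not a leaf: result of an operation on a gradient-enabled tensor
c = b.detach()                             # leaf again: detached from the graph, requires_grad = False

print(a.is_leaf, b.is_leaf, c.is_leaf)     # True False True
b.sum().backward()
print(a.grad)                              # populated: a is a leaf with requires_grad = True
print(b.grad)                              # None (with a warning): non-leaf gradients are not retained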
When requires_grad = True is set, PyTorch will start tracking the operations and store the gradient functions at each step, building up the backward graph described above.
The code that produces such a graph in PyTorch is:
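(a minimal reconstruction, assuming the values described earlier: x holds 1.0 with gradients enabled and y holds 2.0)
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0)
z = x * y  # z holds 2.0 and records MulBackward as its grad_fn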
Backward() function
The backward function actually calculates the gradients by passing its argument (a 1x1 unit tensor by default) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in the .grad attribute of every leaf node. Remember, the backward graph is already made dynamically during the forward pass. The backward function only calculates the gradients using the already-made graph and stores them in the leaf nodes.
Let's analyze the following code :
import torch
# Creating the graph
x = torch.tensor(1.0, requires_grad = True)
z = x ** 3
z.backward() #Computes the gradient
print(x.grad.data)  # Prints tensor(3.), which is dz/dx = 3x^2 evaluated at x = 1
An important thing to notice is that when z.backward() is called, a tensor is automatically passed as z.backward(torch.tensor(1.0)). This torch.tensor(1.0) is the external gradient provided to terminate the chain rule gradient multiplications. It is passed as the input to the backward function of the last operation (PowBackward here, MulBackward in the earlier multiplication example) to further calculate the gradient of x. The dimension of the tensor passed into .backward() must be the same as the dimension of the tensor whose gradient is being calculated. For example, if the gradient-enabled tensors x and y are as follows:
x = torch.tensor([0.0, 2.0, 8.0], requires_grad = True)
y = torch.tensor([5.0 , 1.0 , 7.0], requires_grad = True)
z = x * y
then, to calculate the gradients of z with respect to x or y, an external gradient needs to be passed to the z.backward() function as follows:
z.backward(torch.FloatTensor([1.0, 1.0, 1.0]))
Calling z.backward() without an argument would give: RuntimeError: grad can be implicitly created only for scalar outputs
The tensor passed into the backward function acts like weights for a weighted output of gradients. Mathematically, this is the vector multiplied by the Jacobian matrix of the non-scalar tensor (discussed further in this article); hence it should almost always be a unit tensor with the same dimension as the tensor backward is called upon, unless weighted outputs need to be calculated.
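A minimal sketch of such a weighted backward call (hypothetical weights, assuming the same x, y, z as above):
import torch

x = torch.tensor([0.0, 2.0, 8.0], requires_grad=True)
y = torch.tensor([5.0, 1.0, 7.0], requires_grad=True)
z = x * y

v = torch.tensor([0.1, 1.0, 0.0001])  # weights for each output element of z
z.backward(v)
print(x.grad)  # tensor([5.0000e-01, 1.0000e+00, 7.0000e-04]), i.e. y * v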
tl;dr: The backward graph is created automatically and dynamically by the autograd class during the forward pass. backward() simply calculates the gradients by passing its argument to the already-made backward graph.
Mathematics — Jacobians and vectors
Mathematically, the autograd class is just a Jacobian-vector product computing engine. A Jacobian matrix, in very simple words, is a matrix representing all the possible partial derivatives of two vectors. It's the gradient of a vector with respect to another vector.
Note: In the process, PyTorch never explicitly constructs the whole Jacobian. It's usually simpler and more efficient to compute the JVP (Jacobian-vector product) directly.
If a vector X = [x1, x2, ..., xn] is used to calculate some other vector f(X) = [f1, f2, ..., fn] through a function f, then the Jacobian matrix (J) simply contains all the partial-derivative combinations as follows:
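Written out (a standard reconstruction of the matrix the text refers to):
J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}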
The matrix above represents the gradient of f(X) with respect to X.
Suppose a PyTorch gradient-enabled tensor X:
X = [x1, x2, ..., xn] (let's say these are the weights of some machine learning model)
X undergoes some operations to form a vector Y:
Y = f(X) = [y1, y2, ..., ym]
Y is then used to calculate a scalar loss l. Suppose a vector v happens to be the gradient of the scalar loss l with respect to the vector Y, as follows:
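That is (a standard reconstruction of the vector the text refers to):
v = \begin{bmatrix} \frac{\partial l}{\partial y_1} & \frac{\partial l}{\partial y_2} & \cdots & \frac{\partial l}{\partial y_m} \end{bmatrix}^{T}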
The vector v is called the grad_tensor and is passed to the backward() function as an argument.
To get the gradient of the loss l with respect to the weights X, the Jacobian matrix J is vector-multiplied with the vector v:
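In full (a standard reconstruction of the product the text refers to):
J^{T} \cdot v = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix} \begin{bmatrix} \frac{\partial l}{\partial y_1} \\ \vdots \\ \frac{\partial l}{\partial y_m} \end{bmatrix} = \begin{bmatrix} \frac{\partial l}{\partial x_1} \\ \vdots \\ \frac{\partial l}{\partial x_n} \end{bmatrix}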
This way of computing the Jacobian matrix and multiplying it with the vector v makes it possible for PyTorch to feed external gradients with ease, even for non-scalar outputs.
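As a small sanity check (a hypothetical sketch, not from the original article), backpropagating v = ∂l/∂Y through Y populates X.grad with exactly this product:
import torch

X = torch.randn(3, requires_grad=True)
Y = X ** 2                # Y = f(X)
v = torch.ones_like(Y)    # dl/dY when the scalar loss is l = Y.sum()
Y.backward(v)             # Jacobian-vector product J^T . v
print(X.grad)             # equals dl/dX = 2 * X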