Loops in TensorRT
2022-07-02 06:29:00 【Little Heshang sweeping the floor】
NVIDIA TensorRT supports loop-like constructs, which are useful for recurrent networks. TensorRT loops support scanning over input tensors, recurrent definitions of tensors, and both "scan outputs" and "last value" outputs.
1. Defining A Loop
A loop is defined by its loop boundary layers:
- ITripLimitLayer specifies how many times the loop iterates.
- IIteratorLayer enables a loop to iterate over a tensor.
- IRecurrenceLayer specifies a recurrent definition.
- ILoopOutputLayer specifies an output from the loop.
Each boundary layer inherits from the class ILoopBoundaryLayer, which has a method getLoop() for getting its associated ILoop. The ILoop object identifies the loop: all loop boundary layers with the same ILoop belong to that loop.
The following figure depicts the structure of a loop and the data flow at its boundary. Loop-invariant tensors can be used directly inside the loop, as shown for FooLayer.
[Figure: structure of a loop and the data flow at its boundary (loop.png)]
A loop can have multiple IIteratorLayer, IRecurrenceLayer, and ILoopOutputLayer instances, and at most two ITripLimitLayer instances, as explained later. A loop with no ILoopOutputLayer has no output and is optimized away by TensorRT.
The flow-control constructs section of the NVIDIA TensorRT Support Matrix describes which TensorRT layers can be used in the interior of a loop.
Interior layers are free to use tensors defined inside or outside the loop. The interior can contain other loops (see Nested Loops) and other conditional constructs (see the section on nesting of conditionals).
To define a loop, first create an ILoop object with the method INetworkDefinition::addLoop. Then add the boundary and interior layers. The rest of this section describes the features of the boundary layers, using loop to denote the ILoop* returned by INetworkDefinition::addLoop.
ITripLimitLayer supports both counted loops and while-loops.
- loop->addTripLimit(t, TripLimit::kCOUNT) creates an ITripLimitLayer whose input t is a 0D INT32 tensor that specifies the number of loop iterations.
- loop->addTripLimit(t, TripLimit::kWHILE) creates an ITripLimitLayer whose input t is a 0D Bool tensor that specifies whether an iteration should occur. Typically t is either the output of an IRecurrenceLayer or a calculation based on that output.
A loop can have at most one of each kind of limit.
IIteratorLayer supports iterating forward or backward over any axis.
- loop->addIterator(t) adds an IIteratorLayer that iterates over axis 0 of tensor t. For example, if the input is the matrix:
2 3 5
4 6 8
then the first iteration yields the one-dimensional tensor {2, 3, 5} and the second iteration yields {4, 6, 8}. Iterating beyond the bounds of the tensor is invalid.
- loop->addIterator(t, axis) is similar, but the layer iterates over the given axis. For example, if axis=1 and the input is a matrix, each iteration delivers one column of the matrix.
- loop->addIterator(t, axis, reverse) is similar, but the layer produces its output in reverse order if reverse=true.
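The slicing behavior described above can be illustrated with a small model in plain C++. This is only a sketch of the semantics, not TensorRT code: the helper name iterateSlices and the use of nested std::vector for a matrix are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Plain C++ model (not TensorRT code) of what an IIteratorLayer yields:
// one slice of `m` per loop iteration, taken along `axis`.
// `iterateSlices` is a hypothetical helper name used only for illustration.
std::vector<std::vector<int>> iterateSlices(const Matrix& m, int axis, bool reverse = false)
{
    assert(axis == 0 || axis == 1);
    assert(!m.empty());

    std::vector<std::vector<int>> slices;
    if (axis == 0)
    {
        slices = m; // each iteration sees one row
    }
    else
    {
        // each iteration sees one column
        for (std::size_t c = 0; c < m[0].size(); ++c)
        {
            std::vector<int> column;
            for (const auto& row : m)
            {
                column.push_back(row[c]);
            }
            slices.push_back(column);
        }
    }
    if (reverse)
    {
        // reverse=true delivers the same slices in the opposite order
        std::reverse(slices.begin(), slices.end());
    }
    return slices;
}
```

For the matrix used above, iterateSlices(m, 0) gives {2, 3, 5} then {4, 6, 8}, while iterateSlices(m, 1) gives the columns {2, 4}, {3, 6}, {5, 8}.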
ILoopOutputLayer supports three forms of loop output:
- loop->addLoopOutput(t, LoopOutput::kLAST_VALUE) outputs the last value of t, where t must be the output of an IRecurrenceLayer.
- loop->addLoopOutput(t, LoopOutput::kCONCATENATE, axis) outputs the concatenation of the values of t from each iteration. For example, if the input is a one-dimensional tensor with value {a, b, c} on the first iteration and {d, e, f} on the second iteration, and axis=0, the output is the matrix:
a b c
d e f
If axis=1, the output is:
a d
b e
c f
- loop->addLoopOutput(t, LoopOutput::kREVERSE, axis) is similar, but reverses the order.
The kCONCATENATE and kREVERSE forms require a second input, a 0D INT32 shape tensor that specifies the length of the new output dimension. When the length is greater than the number of iterations, the extra elements contain arbitrary values. The second input, for example u, should be set with ILoopOutputLayer::setInput(1, u).
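The effect of the axis argument can be checked with a small model in plain C++. This is a sketch of the described semantics, not TensorRT code. The name concatOutput, the fill parameter standing in for "arbitrary values", and the choice to place padding at the end are all assumptions. Because the text does not say where padding lands for kREVERSE, the sketch only accepts reverse when the length equals the number of iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Plain C++ model (not TensorRT code) of a kCONCATENATE / kREVERSE loop output.
// perIteration[i] is the 1D value produced by iteration i, `length` plays the
// role of the second (0D INT32) input, and `fill` stands in for the arbitrary
// values left in extra elements. All names here are hypothetical.
Matrix concatOutput(const Matrix& perIteration, int axis, std::size_t length,
                    bool reverse = false, int fill = 0)
{
    assert(axis == 0 || axis == 1);
    assert(!perIteration.empty());
    assert(length >= perIteration.size());            // a shorter length is an error
    assert(!reverse || length == perIteration.size()); // padding + reverse not modeled

    const std::size_t n = perIteration.size();
    const std::size_t width = perIteration[0].size();

    // Stack along axis 0 first: row i holds iteration i (or n-1-i if reversed).
    Matrix stacked(length, std::vector<int>(width, fill));
    for (std::size_t i = 0; i < n; ++i)
    {
        stacked[i] = reverse ? perIteration[n - 1 - i] : perIteration[i];
    }
    if (axis == 0)
    {
        return stacked;
    }

    // axis == 1: the new dimension becomes the column index, i.e. a transpose.
    Matrix result(width, std::vector<int>(length, fill));
    for (std::size_t r = 0; r < length; ++r)
    {
        for (std::size_t c = 0; c < width; ++c)
        {
            result[c][r] = stacked[r][c];
        }
    }
    return result;
}
```

With the two iterations {1, 2, 3} and {4, 5, 6}, axis 0 stacks them as rows and axis 1 stacks them as columns, mirroring the a..f example above.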
Finally, there is IRecurrenceLayer. Its first input specifies the initial output value, and its second input specifies the next output value. The first input must come from outside the loop; the second input usually comes from inside the loop. For example, the TensorRT analog of this C++ fragment:
for (int32_t i = j; ...; i += k) ...
can be created as follows, where j and k are ITensor* and n is the network definition:
ILoop* loop = n.addLoop();
// The first input (j) is the initial value and comes from outside the loop.
IRecurrenceLayer* iRec = loop->addRecurrence(*j);
ITensor* i = iRec->getOutput(0);
// Compute the next value inside the loop: iNext = i + k.
ITensor* iNext = n.addElementWise(*i, *k,
    ElementWiseOperation::kADD)->getOutput(0);
// The second input closes the back edge.
iRec->setInput(1, *iNext);
The second input of an IRecurrenceLayer is the only case where TensorRT allows a back edge. If these inputs are removed, the remaining network must be acyclic.
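To see what value such a recurrence produces, the following plain C++ model (not TensorRT code) pairs it with a counted trip limit and a kLAST_VALUE output. The function name lastValueOfCountedLoop and the tripCount parameter are assumptions introduced for illustration; tripCount plays the role of the input to a TripLimit::kCOUNT layer.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ model (not TensorRT code) of the recurrence i = j; i += k,
// combined with a counted trip limit and a kLAST_VALUE output.
// The function name and parameters are hypothetical.
std::int32_t lastValueOfCountedLoop(std::int32_t j, std::int32_t k, std::int32_t tripCount)
{
    assert(tripCount >= 0);

    std::int32_t i = j; // first input of the recurrence: the initial value
    for (std::int32_t t = 0; t < tripCount; ++t)
    {
        const std::int32_t iNext = i + k; // the kADD computed inside the loop
        i = iNext;                        // second input: value for the next iteration
    }
    return i; // kLAST_VALUE: the recurrence value after the final iteration
}
```

For example, lastValueOfCountedLoop(0, 1, 3) returns 3, and a zero-trip loop returns the initial value j unchanged.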
2. Formal Semantics
TensorRT has applicative semantics, meaning there are no visible side effects other than engine inputs and outputs. Because there are no side effects, intuitions about loops from imperative languages do not always hold. This section defines formal semantics for TensorRT's loop constructs.
The formal semantics are based on lazy sequences of tensors. Each iteration of a loop corresponds to an element in the sequence. The sequence for a tensor X inside the loop is denoted ⟨X_0, X_1, X_2, ...⟩. Elements of the sequence are evaluated lazily, that is, as needed.
The output from IIteratorLayer(X) is ⟨X[0], X[1], X[2], ...⟩, where X[i] denotes subscripting on the axis specified for the IIteratorLayer.
The output from IRecurrenceLayer(X, Y) is ⟨X, Y_0, Y_1, Y_2, ...⟩.
The input and output of an ILoopOutputLayer depend on the type of LoopOutput.
- kLAST_VALUE: The input is a single tensor X, and the output is X_n for an n-trip loop.
- kCONCATENATE: The first input is a tensor X, and the second input is a scalar shape tensor Y. The result is the concatenation of X_0, X_1, X_2, ..., X_{n-1}, with post-padding if necessary to reach the length specified by Y. It is a runtime error if Y < n. Y is a build-time constant. Note the inverse relationship with IIteratorLayer: an IIteratorLayer maps a tensor to a sequence of subtensors, and an ILoopOutputLayer with kCONCATENATE maps a sequence of subtensors to a tensor.
- kREVERSE: Similar to kCONCATENATE, but the output is in the reverse direction.
The value of n in the definitions of the ILoopOutputLayer output is determined by the loop's ITripLimitLayer:
- For counted loops, it is the iteration count, meaning the input to the ITripLimitLayer.
- For while-loops, it is the least n such that X_n is false, where X is the sequence for the ITripLimitLayer's input tensor.
The output from a non-loop layer is a sequence-wise application of the layer's function. For example, for a two-input non-loop layer, F(X, Y) = ⟨f(X_0, Y_0), f(X_1, Y_1), f(X_2, Y_2), ...⟩. If a tensor comes from outside the loop, that is, it is loop-invariant, then its sequence is created by replicating the tensor.
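The while-loop case can be traced with a small model in plain C++. This is a sketch of the sequence semantics, not TensorRT code. Here R is a recurrence sequence ⟨init, step(R_0), step(R_1), ...⟩ and the trip limit sees C_t = keepGoing(R_t); the names runWhileLoop, step, keepGoing, and the maxTrips guard are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Plain C++ model (not TensorRT code) of the sequence semantics for a while-loop.
// R is the recurrence sequence <init, step(R_0), step(R_1), ...> and
// C_t = keepGoing(R_t) is the sequence fed to the kWHILE trip limit.
// Returns {n, R_n}, where n is the least index with C_n == false.
// The function and parameter names are hypothetical.
std::pair<int, int> runWhileLoop(int init,
                                 const std::function<int(int)>& step,
                                 const std::function<bool(int)>& keepGoing,
                                 int maxTrips = 1000000)
{
    int n = 0;
    int value = init;        // R_0
    while (keepGoing(value)) // C_n is still true, so iteration n runs
    {
        assert(n < maxTrips); // guard against a model that never terminates
        value = step(value);  // R_{n+1}
        ++n;
    }
    return {n, value};       // a kLAST_VALUE output on R would be R_n
}
```

With init = 0, step adding 3, and keepGoing testing v < 10, the sequence R is 0, 3, 6, 9, 12, so the least n with a false condition is 4 and the last value is 12.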
3. Nested Loops
TensorRT infers the nesting of loops from the data flow. For example, if loop B uses values defined inside loop A, then B is considered to be nested inside A.
TensorRT rejects networks in which the loops are not cleanly nested, for example if loop A uses values defined in the interior of loop B and vice versa.
4. Limitations
Loops that reference multiple dynamic dimensions may consume an unexpected amount of memory .
In a loop, memory is allocated as if all dynamic dimensions take on the maximum value of any of those dimensions. For example, if a loop references two tensors with dimensions [4,x,y] and [6,y], memory for those tensors is allocated as if their dimensions were [4,max(x,y),max(x,y)] and [6,max(x,y)].
The input to an ILoopOutputLayer with kLAST_VALUE must be the output of an IRecurrenceLayer.
The loop API supports only FP32 and FP16 precision.
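The memory sizing rule from the first limitation can be made concrete with a small calculation in plain C++. This is a sketch of the rule as stated, not TensorRT code; the Dim struct and the allocatedShapes helper are hypothetical names, and each dynamic dimension is assumed to carry a single runtime extent.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One dimension of a tensor shape: either a fixed extent or a dynamic one.
struct Dim
{
    bool dynamic; // true for dimensions such as x or y
    int value;    // extent of the dimension
};

// Plain C++ model (not TensorRT code) of the sizing rule: every dynamic
// dimension is treated as the maximum over all dynamic dimensions referenced
// by the loop. Names are hypothetical.
std::vector<std::vector<int>> allocatedShapes(const std::vector<std::vector<Dim>>& shapes)
{
    // Pass 1: find the largest dynamic extent across all tensors.
    int maxDynamic = 0;
    for (const auto& shape : shapes)
    {
        for (const Dim& d : shape)
        {
            assert(d.value >= 0);
            if (d.dynamic)
            {
                maxDynamic = std::max(maxDynamic, d.value);
            }
        }
    }

    // Pass 2: substitute that maximum for every dynamic dimension.
    std::vector<std::vector<int>> result;
    for (const auto& shape : shapes)
    {
        std::vector<int> sized;
        for (const Dim& d : shape)
        {
            sized.push_back(d.dynamic ? maxDynamic : d.value);
        }
        result.push_back(sized);
    }
    return result;
}
```

With x = 3 and y = 7, the tensors [4,x,y] and [6,y] are sized as [4,7,7] and [6,7], so the first tensor is budgeted 196 elements instead of the 84 it actually needs.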
5. Replacing IRNNv2Layer With Loops
IRNNv2Layer was deprecated in TensorRT 7.2.1 and will be removed in TensorRT 9.0. Use the loop API to synthesize a recurrent subnetwork instead. For an example, see the method SampleCharRNNLoop::addLSTMCell in sampleCharRNN. The loop API lets you express general recurrent networks rather than being limited to the prefabricated cells in IRNNLayer and IRNNv2Layer.
For more information, see sampleCharRNN.