当前位置:网站首页>CUDA user object
CUDA user object
2022-07-02 06:27:00 【Little Heshang sweeping the floor】
CUDA User object
CUDA User objects can be used to help manage CUDA The lifecycle of resources used in asynchronous work . especially , This function is for CUDA chart and Stream capture Very useful .
Various resource management schemes and CUDA Diagram incompatible . for example , Consider event based pool or synchronous creation 、 Asynchronous destruction scheme .
// Library API with pool allocation
void libraryWork(cudaStream_t stream) {
auto &resource = pool.claimTemporaryResource();
resource.waitOnReadyEventInStream(stream);
launchWork(stream, resource);
resource.recordReadyEvent(stream);
}
// Library API with asynchronous resource deletion
void libraryWork(cudaStream_t stream) {
Resource *resource = new Resource(...);
launchWork(stream, resource);
cudaStreamAddCallback(
stream,
[](cudaStream_t, cudaError_t, void *resource) {
delete static_cast<Resource *>(resource);
},
resource,
0);
// Error handling considerations not shown
}
Non fixed pointers or handles to resources that require indirect or graph updates , And you need to synchronize every time you submit work CPU Code , These schemes are for CUDA It's difficult to figure . If these considerations are hidden from the caller of the Library , And due to the use of disallowed API, They are also not suitable for stream capture . There are various solutions , For example, expose resources to callers . CUDA User objects provide another way .
CUDA The user object associates the user specified destructor callback with the internal reference count , Be similar to C++ shared_ptr. The quotation may be attributed to CPU User code and CUDA Figure all . Please note that , For user owned references , And C++ Smart pointers are different , No objects representing references ; Users must manually track user owned references . A typical use case is to move the only user owned reference to... Immediately after the user object is created CUDA chart .
When a reference is associated with CUDA Picture time ,CUDA Map operations will be automatically managed . Cloned cudaGraph_t Keep source cudaGraph_t A copy of every reference owned , Have the same multiplicity . Instantiated cudaGraphExec_t Keep source cudaGraph_t A copy of each reference in . When cudaGraphExec_t When it is destroyed without synchronization , The reference will remain until execution is complete .
This is an example usage .
cudaGraph_t graph; // Preexisting graph
Object *object = new Object; // C++ object with possibly nontrivial destructor
cudaUserObject_t cuObject;
cudaUserObjectCreate(
&cuObject,
object, // Here we use a CUDA-provided template wrapper for this API,
// which supplies a callback to delete the C++ object pointer
1, // Initial refcount
cudaUserObjectNoDestructorSync // Acknowledge that the callback cannot be
// waited on via CUDA
);
cudaGraphRetainUserObject(
graph,
cuObject,
1, // Number of references
cudaGraphUserObjectMove // Transfer a reference owned by the caller (do
// not modify the total reference count)
);
// No more references owned by this thread; no need to call release API
cudaGraphExec_t graphExec;
cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0); // Will retain a
// new reference
cudaGraphDestroy(graph); // graphExec still owns a reference
cudaGraphLaunch(graphExec, 0); // Async launch has access to the user objects
cudaGraphExecDestroy(graphExec); // Launch is not synchronized; the release
// will be deferred if needed
cudaStreamSynchronize(0); // After the launch is synchronized, the remaining
// reference is released and the destructor will
// execute. Note this happens asynchronously.
// If the destructor callback had signaled a synchronization object, it would
// be safe to wait on it at this point.
The reference of the graph in the sub graph node is associated with the sub graph , Not associated with the parent graph . If you update or delete the subgraph , The reference will change accordingly . If you use cudaGraphExecUpdate or cudaGraphExecChildGraphNodeSetParams Update executable diagram or subgraph , Will clone the reference in the new source diagram and replace the reference in the target diagram . In any case , If the previous startup is not synchronized , Then any references that will be released will be preserved , Until the startup is completed .
At present, it has not passed CUDA API Mechanism for waiting for user object destructors . The user can manually signal the synchronization object from the destructor code . in addition , Call from the destructor CUDA API It's illegal , It's similar to right cudaLaunchHostFunc The limitation of . This is to avoid blocking CUDA Share threads internally and block progress . If dependency is a way and the thread executing the call cannot prevent CUDA Progress of work , Send execution to another thread API The call signal is legal .
The user object is to use cudaUserObjectCreate Created , This is about browsing API A good starting point for .
边栏推荐
- 浅谈三点建议为所有已经毕业和终将毕业的同学
- Summary of WLAN related knowledge points
- LeetCode 47. Full arrangement II
- Network related knowledge (Hardware Engineer)
- Singleton mode compilation
- LeetCode 83. Delete duplicate elements in the sorting linked list
- Decryption skills of encrypted compressed files
- 一口气说出 6 种实现延时消息的方案
- LeetCode 47. 全排列 II
- Sudo right raising
猜你喜欢

Find the highest value of the current element Z-index of the page
![Data science [9]: SVD (2)](/img/2c/f1a8c3ff34ff3f3cc6e26157a32bfd.png)
Data science [9]: SVD (2)

Sparse array (nonlinear structure)

TensorRT的数据格式定义详解

Introduce two automatic code generators to help improve work efficiency

Invalid operation: Load into table ‘sources_ orderdata‘ failed. Check ‘stl_ load_ errors‘ system table

Hydration failed because the initial UI does not match what was rendered on the server.问题原因之一

Redis——大Key问题

Sentinel rules persist to Nacos

链表(线性结构)
随机推荐
LeetCode 283. 移动零
Golang -- map capacity expansion mechanism (including source code)
最新CUDA环境配置(Win10 + CUDA 11.6 + VS2019)
The Chinese word segmentation task is realized by using traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.) and pre training methods (Bert, etc.)
自学table au
压力测试修改解决方案
标签属性disabled selected checked等布尔类型赋值不生效?
CUDA中的函数执行空间说明符
ShardingSphere-JDBC篇
Mech 3002 explanation
浅谈三点建议为所有已经毕业和终将毕业的同学
【每日一题】写一个函数,判断一个字符串是否为另外一个字符串旋转之后的字符串。
LeetCode 40. Combined sum II
深入了解JUC并发(一)什么是JUC
Redis---1.数据结构特点与操作
LeetCode 40. 组合总和 II
Singleton mode compilation
10 erreurs classiques de MySQL
Sudo right raising
Sentinel规则持久化到Nacos