CUDA User Objects

2022-07-02 06:27:00 【Little Heshang sweeping the floor】

CUDA user objects can be used to help manage the lifetime of resources used by asynchronous work in CUDA. In particular, this feature is useful for CUDA graphs and stream capture.

Various resource management schemes are not compatible with CUDA graphs. For example, consider an event-based pool, or a scheme that creates resources synchronously and destroys them asynchronously.

// Library API with pool allocation
void libraryWork(cudaStream_t stream) {
    auto &resource = pool.claimTemporaryResource();
    resource.waitOnReadyEventInStream(stream);
    launchWork(stream, resource);
    resource.recordReadyEvent(stream);
}

// Library API with asynchronous resource deletion
void libraryWork(cudaStream_t stream) {
    Resource *resource = new Resource(...);
    launchWork(stream, resource);
    cudaStreamAddCallback(
        stream,
        [](cudaStream_t, cudaError_t, void *resource) {
            delete static_cast<Resource *>(resource);
        },
        resource,
        0);
    // Error handling considerations not shown
}

These schemes are difficult to use with CUDA graphs for two reasons: the pointer or handle to the resource is not fixed, so it requires indirection or a graph update, and synchronous CPU code is needed each time the work is submitted. They also do not work with stream capture if these considerations are hidden from the caller of the library, and because they use APIs that are disallowed during capture. Various solutions exist, such as exposing the resource to the caller. CUDA user objects offer another approach.

A CUDA user object associates a user-specified destructor callback with an internal reference count, similar to C++ shared_ptr. References may be owned by user code on the CPU and by CUDA graphs. Note that for user-owned references, unlike C++ smart pointers, there is no object representing the reference; users must track user-owned references manually. A typical use case is to move the sole user-owned reference to a CUDA graph immediately after the user object is created.
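The ownership model can be sketched in plain C++ without any CUDA calls. This is a toy model under stated assumptions, not the CUDA implementation: `ToyUserObject`, `retain`, `release`, and `toy_demo` are hypothetical names that only mirror the two ideas above, namely a destructor callback tied to an internal count, and references that exist only as numbers the caller has to track.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model only: NOT the CUDA implementation. All names are hypothetical.
class ToyUserObject {
public:
    ToyUserObject(std::function<void()> destroy, unsigned initialRefcount)
        : destroy_(std::move(destroy)), refcount_(initialRefcount) {}

    // Add `count` references. No handle object is returned, so the caller
    // must remember how many references it owns.
    void retain(unsigned count = 1) { refcount_ += count; }

    // Drop `count` references; run the callback once the count reaches zero.
    void release(unsigned count = 1) {
        assert(count <= refcount_);
        refcount_ -= count;
        if (refcount_ == 0 && destroy_) {
            auto callback = std::move(destroy_);
            destroy_ = nullptr;  // make sure the callback runs only once
            callback();
        }
    }

    unsigned refcount() const { return refcount_; }

private:
    std::function<void()> destroy_;
    unsigned refcount_;
};

// Returns how many times the destructor callback ran.
int toy_demo() {
    int destroyed = 0;
    ToyUserObject object([&destroyed] { ++destroyed; }, 1);  // caller owns 1
    object.retain(2);        // two more references, tracked only as a number
    object.release(2);       // one reference left, object still alive
    assert(destroyed == 0);
    object.release(1);       // last reference released, callback runs
    return destroyed;
}
```

Calling `toy_demo()` from a `main` function exercises the model. Unlike `shared_ptr`, nothing here releases a reference automatically when a variable goes out of scope, which is why handing the only reference to a graph right after creation is a convenient pattern.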

When a reference is associated with a CUDA graph, CUDA manages the graph operations automatically. A cloned cudaGraph_t retains a copy of every reference owned by the source cudaGraph_t, with the same multiplicity. An instantiated cudaGraphExec_t retains a copy of every reference in the source cudaGraph_t. When a cudaGraphExec_t is destroyed without being synchronized, the references are retained until execution has completed.
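The "same multiplicity" rule for clones can be illustrated with another small plain C++ model. Everything here is hypothetical (`ToyObject`, `ToyGraph`, `adoptFromCaller`, `toy_clone_demo`) and it makes no CUDA calls; it also leaves out the deferred release that applies to unsynchronized launches, so it is only a sketch of the counting behavior.

```cpp
#include <cassert>
#include <map>

// Toy model only: hypothetical names, no CUDA calls.
struct ToyObject {
    int refcount = 0;
    bool destroyed = false;
    void retain(int n) { refcount += n; }
    void release(int n) {
        refcount -= n;
        if (refcount == 0) destroyed = true;  // stands in for the destructor callback
    }
};

// Stands in for a graph that owns references to user objects.
class ToyGraph {
public:
    ToyGraph() = default;

    // Cloning copies every owned reference with the same multiplicity.
    ToyGraph(const ToyGraph &other) : owned_(other.owned_) {
        for (auto &entry : owned_) entry.first->retain(entry.second);
    }
    ToyGraph &operator=(const ToyGraph &) = delete;

    // Destroying the graph releases everything it owns.
    ~ToyGraph() {
        for (auto &entry : owned_) entry.first->release(entry.second);
    }

    // Move `count` references from the caller into the graph.
    // The total reference count does not change.
    void adoptFromCaller(ToyObject *object, int count) { owned_[object] += count; }

private:
    std::map<ToyObject *, int> owned_;
};

// Returns the refcount seen while both the graph and its clone are alive.
int toy_clone_demo(bool *destroyedAtEnd) {
    ToyObject object;
    object.retain(2);                       // caller starts with 2 references
    int whileBothAlive = 0;
    {
        ToyGraph graph;
        graph.adoptFromCaller(&object, 2);  // move both references into the graph
        {
            ToyGraph clone(graph);          // clone retains 2 more references
            whileBothAlive = object.refcount;
        }                                   // clone destroyed, releases 2
        assert(object.refcount == 2);
        assert(!object.destroyed);
    }                                       // graph destroyed, releases the last 2
    *destroyedAtEnd = object.destroyed;
    return whileBothAlive;
}
```

In this model the object is only marked as destroyed after both the graph and its clone have gone away, matching the idea that each owner holds its own copy of the references.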

Here is an example of their use.

cudaGraph_t graph;  // Preexisting graph

Object *object = new Object;  // C++ object with possibly nontrivial destructor
cudaUserObject_t cuObject;
cudaUserObjectCreate(
    &cuObject,
    object,  // Here we use a CUDA-provided template wrapper for this API,
             // which supplies a callback to delete the C++ object pointer
    1,  // Initial refcount
    cudaUserObjectNoDestructorSync  // Acknowledge that the callback cannot be
                                    // waited on via CUDA
);
cudaGraphRetainUserObject(
    graph,
    cuObject,
    1,  // Number of references
    cudaGraphUserObjectMove  // Transfer a reference owned by the caller (do
                             // not modify the total reference count)
);
// No more references owned by this thread; no need to call release API
cudaGraphExec_t graphExec;
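// Note: this is the CUDA 11.x signature. CUDA 12 and later take
// (&graphExec, graph, flags) instead.
cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);  // Will retain a
                                                               // new reference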
cudaGraphDestroy(graph);  // graphExec still owns a reference
cudaGraphLaunch(graphExec, 0);  // Async launch has access to the user objects
cudaGraphExecDestroy(graphExec);  // Launch is not synchronized; the release
                                  // will be deferred if needed
cudaStreamSynchronize(0);  // After the launch is synchronized, the remaining
                           // reference is released and the destructor will
                           // execute. Note this happens asynchronously.
// If the destructor callback had signaled a synchronization object, it would
// be safe to wait on it at this point.

References owned by graphs in child graph nodes are associated with the child graphs, not the parents. If a child graph is updated or deleted, the references change accordingly. If an executable graph or child graph is updated with cudaGraphExecUpdate or cudaGraphExecChildGraphNodeSetParams, the references in the new source graph are cloned and replace the references in the target graph. In either case, if previous launches have not been synchronized, any references that would be released are held until those launches have finished executing.

There is currently no mechanism for waiting on user object destructors via a CUDA API. Users may signal a synchronization object manually from the destructor code. In addition, it is not legal to call CUDA APIs from the destructor, similar to the restriction on cudaLaunchHostFunc. This is to avoid blocking a CUDA internal shared thread and preventing forward progress. It is legal to signal another thread to perform an API call, provided the dependency is one-way and the thread making the call cannot block the forward progress of CUDA work.
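One way to signal a synchronization object from the destructor is with a `std::promise`, sketched below in plain C++. The names `SignalingResource` and `toy_wait_for_destructor` are hypothetical, and an ordinary `std::thread` stands in for the CUDA internal thread that would run the destructor callback; no CUDA API is called anywhere, in keeping with the restriction described above.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical resource type. Its destructor only signals a synchronization
// object; it makes no CUDA API calls.
class SignalingResource {
public:
    explicit SignalingResource(std::promise<void> done) : done_(std::move(done)) {}
    ~SignalingResource() { done_.set_value(); }  // signal, nothing else

private:
    std::promise<void> done_;
};

// Returns true if the destruction signal was observed within the timeout.
bool toy_wait_for_destructor() {
    std::promise<void> done;
    std::future<void> destroyed = done.get_future();
    assert(destroyed.valid());

    auto *resource = new SignalingResource(std::move(done));

    // Stand-in for the CUDA internal thread: it deletes the resource at some
    // point after the last reference has been released.
    std::thread runtimeThread([resource] { delete resource; });

    // The waiting side blocks on the future, not on any CUDA API.
    bool signaled =
        destroyed.wait_for(std::chrono::seconds(5)) == std::future_status::ready;

    runtimeThread.join();
    return signaled;
}
```

With a real user object, the wait would come after the launch has been synchronized and all references released, since the destructor runs asynchronously and only a signal like this one tells the caller that it has finished.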

User objects are created with cudaUserObjectCreate, which is a good starting point for browsing the related APIs.

Copyright notice
This article was written by [Little Heshang sweeping the floor]. Please include a link to the original when reposting.
https://yzsam.com/2022/183/202207020612428105.html
