
Function execution space specifier in CUDA

2022-07-02 06:28:00 【Little Heshang sweeping the floor】

Function execution space specifier

The function execution space specifiers indicate whether a function executes on the host or on the device, and whether it is callable from the host or from the device.

1 __global__

The __global__ execution space specifier declares a function as a kernel. Such a function is:

Executed on the device,
Callable from the host,
Callable from the device on devices of compute capability 3.2 or higher (for more details, see CUDA Dynamic Parallelism).

A __global__ function must have a void return type and cannot be a member of a class.

Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.

A call to a __global__ function is asynchronous, meaning that it returns before the device has completed its execution.
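The rules above can be sketched in a small program. This is only an illustration under stated assumptions: the kernel name scale, the helper run_scale, the array size, and the launch dimensions are all hypothetical and not from the original text. It shows a void kernel launched with an execution configuration in <<<...>>>, followed by an explicit synchronization before the host reads the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: scales each of the n elements of data by factor.
// It returns void and is a free function, as __global__ requires.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Hypothetical host helper: fills n elements with 1.0f, scales them on the
// device, and returns the last element (or -1.0f on a CUDA error).
float run_scale(int n, float factor)
{
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    for (int i = 0; i < n; ++i)
        data[i] = 1.0f;

    // Execution configuration: number of blocks and threads per block.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, factor, n);  // returns before the kernel finishes

    // Wait for the kernel to complete before the host touches the data.
    cudaError_t err = cudaDeviceSynchronize();
    float result = (err == cudaSuccess) ? data[n - 1] : -1.0f;
    cudaFree(data);
    return result;
}

int main()
{
    printf("%f\n", run_scale(1000, 3.0f));
    return 0;
}
```

Managed memory is used here only to keep the sketch short. With cudaMalloc and cudaMemcpy, the copy back to the host would itself wait for the kernel in the default stream.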

2 __device__

The __device__ execution space specifier declares a function that is:

Executed on the device,
Callable from the device only.

The __global__ and __device__ execution space specifiers cannot be used together.
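As a minimal sketch of how a __device__ function is typically used (the names square, square_all, and run_square are hypothetical, chosen for illustration), a kernel can call a device-only helper, while the host reaches it only indirectly by launching the kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-only helper: it runs on the device and can be called
// only from device code (kernels or other __device__ functions).
__device__ int square(int x)
{
    return x * x;
}

// Kernel that calls the __device__ helper once per index.
__global__ void square_all(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = square(i);
}

// Hypothetical host helper: returns square(index) as computed on the device,
// or -1 on a CUDA error. Assumes 0 <= index < n.
int run_square(int n, int index)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return -1;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    square_all<<<blocks, threads>>>(d_out, n);

    // cudaMemcpy waits for the preceding kernel in the default stream
    // to finish before it copies the value back.
    int value = -1;
    if (cudaMemcpy(&value, d_out + index, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        value = -1;
    cudaFree(d_out);
    return value;
}

int main()
{
    // Calling square(7) directly from here would be rejected by the
    // compiler, because main is host code.
    printf("%d\n", run_square(16, 7));
    return 0;
}
```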

3 __host__

The __host__ execution space specifier declares a function that is:

Executed on the host,
Callable from the host only.

Declaring a function with only the __host__ execution space specifier is equivalent to declaring it without any of the __host__, __device__, or __global__ execution space specifiers; in either case, the function is compiled for the host only.

The __global__ and __host__ execution space specifiers cannot be used together.

However, the __device__ and __host__ execution space specifiers can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro introduced in Application Compatibility can be used to differentiate code paths between the host and the device:

__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
   // Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
   // Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
   // Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
   // Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
   // Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__) 
   // Host code path
#endif
}
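To see the two compilation passes in action, the following sketch calls one __host__ __device__ function from both sides. The names compiled_for_device, probe, and run_probe are hypothetical, and the code only checks whether __CUDA_ARCH__ is defined rather than comparing its value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical function compiled for both host and device.
// __CUDA_ARCH__ is defined only while compiling device code.
__host__ __device__ int compiled_for_device()
{
#ifdef __CUDA_ARCH__
    return 1;  // device code path
#else
    return 0;  // host code path
#endif
}

// Kernel that records which path the device version took.
__global__ void probe(int *out)
{
    *out = compiled_for_device();
}

// Hypothetical host helper: returns the value seen on the device,
// or -1 on a CUDA error.
int run_probe()
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
        return -1;

    probe<<<1, 1>>>(d_out);

    // The blocking copy also waits for the kernel to finish.
    int value = -1;
    if (cudaMemcpy(&value, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        value = -1;
    cudaFree(d_out);
    return value;
}

int main()
{
    printf("host call: %d\n", compiled_for_device());
    printf("device call: %d\n", run_probe());
    return 0;
}
```

Sharing one definition this way avoids duplicating small helpers for the host and the device, at the cost of keeping both code paths valid in each compilation pass.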
