Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
The function execution space specifiers indicate whether a function executes on the host or on the device, and whether it can be called from the host or from the device.
1 __global__
The __global__ execution space specifier declares a function as a kernel. Such a function is:
- Executed on the device,
- Callable from the host,
- Callable from the device on devices of compute capability 3.2 or higher (see CUDA Dynamic Parallelism for more details).
A __global__ function must have a void return type and cannot be a member of a class.
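The following is a minimal sketch of a kernel that respects the void return type. It assumes a CUDA toolkit with `nvcc` and a CUDA-capable GPU, and it hands its results back through an output pointer. The names `square_kernel`, `square_on_device`, and `check` are hypothetical and are used only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// A kernel must return void, so it "returns" its results by writing them
// into memory passed in as a pointer argument.
__global__ void square_kernel(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                  // the last block may contain spare threads
        out[i] = in[i] * in[i];
    }
}

// Hypothetical host helper: squares in[0..n) on the device into out[0..n).
void square_on_device(const int* in, int* out, int n)
{
    if (n <= 0) return;           // a launch with zero blocks is an error
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* d_in = nullptr;
    int* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice));

    square_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    check(cudaGetLastError());

    // This blocking copy on the default stream waits for the kernel first.
    check(cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
}
```

For example, calling `square_on_device` on `{1, 2, 3}` should leave `{1, 4, 9}` in the output array.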
Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.
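In CUDA the execution configuration is written as `<<<Dg, Db, Ns, S>>>`. The four arguments are the grid dimension, the block dimension, the bytes of dynamic shared memory (optional, default 0), and the stream (optional, default 0). The sketch below uses all four to compute a per-block sum. The names `block_sum` and `sum_on_device` are hypothetical, and the tree reduction assumes a power-of-two block size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each block sums its slice of `in` into partial[blockIdx.x]. The shared
// buffer is sized at launch time by the Ns argument of <<<...>>>.
__global__ void block_sum(const int* in, int* partial, int n)
{
    extern __shared__ int buf[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0];
}

// Hypothetical host helper: returns the sum of in[0..n).
long long sum_on_device(const int* in, int n)
{
    if (n <= 0) return 0;
    const int threads = 256;                        // Db: threads per block
    const int blocks = (n + threads - 1) / threads; // Dg: blocks in the grid
    const size_t shmem = threads * sizeof(int);     // Ns: dynamic shared bytes

    int* d_in = nullptr;
    int* d_partial = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(int)));
    check(cudaMalloc(&d_partial, blocks * sizeof(int)));

    cudaStream_t stream;                            // S: stream for the launch
    check(cudaStreamCreate(&stream));
    check(cudaMemcpyAsync(d_in, in, n * sizeof(int),
                          cudaMemcpyHostToDevice, stream));
    block_sum<<<blocks, threads, shmem, stream>>>(d_in, d_partial, n);
    check(cudaGetLastError());

    std::vector<int> partial(blocks);
    check(cudaMemcpyAsync(partial.data(), d_partial, blocks * sizeof(int),
                          cudaMemcpyDeviceToHost, stream));
    check(cudaStreamSynchronize(stream));           // wait for copy + kernel

    long long total = 0;
    for (int p : partial) total += p;               // finish the sum on the host
    check(cudaStreamDestroy(stream));
    check(cudaFree(d_in));
    check(cudaFree(d_partial));
    return total;
}
```

Finishing the sum on the host keeps the sketch short. A production reduction would more likely launch a second pass or use atomics.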
A call to a __global__ function is asynchronous, meaning that it returns before the device has finished executing it.
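Because a kernel launch returns immediately, the host can keep working while the kernel runs, and it must synchronize before relying on the results. This sketch uses the hypothetical names `spin_kernel` and `overlap_demo`. It busy-waits on the device with `clock64()` while the host polls a CUDA event. The iteration count it returns depends on the hardware. It is zero if launches are forced to be synchronous, for example with `CUDA_LAUNCH_BLOCKING=1`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Busy-waits on the device for roughly `cycles` clock ticks.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: launches a long-running kernel, keeps the host busy
// until the kernel finishes, and returns how many host iterations ran.
long long overlap_demo(long long cycles)
{
    cudaEvent_t done;
    check(cudaEventCreate(&done));

    spin_kernel<<<1, 1>>>(cycles);   // returns without waiting for the kernel
    check(cudaGetLastError());       // reports launch-configuration errors
    check(cudaEventRecord(done));    // marks the point after the kernel (stream 0)

    long long host_iterations = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        ++host_iterations;           // host work overlapping the kernel
    }
    check(status);                   // anything other than success is an error
    check(cudaDeviceSynchronize());  // surfaces errors raised during execution
    check(cudaEventDestroy(done));
    return host_iterations;
}
```

`cudaGetLastError()` right after the launch reports configuration errors. Errors that occur while the kernel is executing only surface at a later synchronizing call, such as `cudaDeviceSynchronize()`.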
2 __device__
The __device__ execution space specifier declares a function that is:
- Executed on the device,
- Callable from the device only.
The __global__ and __device__ execution space specifiers cannot be used together.
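Here is a minimal sketch of a `__device__` helper called from a kernel, assuming compilation with `nvcc`. The names `clamp01`, `clamp_kernel`, and `clamp_on_device` are hypothetical. Unlike a kernel, a `__device__` function may return a value. However, it can be called only from device code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Runs on the device and is callable only from device code
// (kernels or other __device__ functions). It may return a value.
__device__ float clamp01(float x)
{
    return fminf(fmaxf(x, 0.0f), 1.0f);
}

// The kernel calls the __device__ helper once per element.
__global__ void clamp_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = clamp01(data[i]);
}

// Hypothetical host helper. Calling clamp01() directly from here would not
// compile, because a host function cannot call a __device__ function.
void clamp_on_device(float* data, int n)
{
    if (n <= 0) return;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_data = nullptr;
    check(cudaMalloc(&d_data, bytes));
    check(cudaMemcpy(d_data, data, bytes, cudaMemcpyHostToDevice));
    clamp_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    check(cudaGetLastError());
    check(cudaMemcpy(data, d_data, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_data));
}
```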
3 __host__
The __host__ execution space specifier declares a function that is:
- Executed on the host,
- Callable from the host only.
Declaring a function with only the __host__ execution space specifier is equivalent to declaring it with none of the __host__, __device__, or __global__ execution space specifiers. In either case, the function is compiled for the host only.
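The equivalence can be illustrated with two hypothetical functions, assuming the file is compiled with `nvcc`. Both are host-only, so neither can be called from a kernel.

```cuda
#include <cuda_runtime.h>

// Explicit __host__: compiled for the host only.
__host__ int add_explicit(int a, int b) { return a + b; }

// No execution space specifier: treated exactly as if it were __host__.
int add_implicit(int a, int b) { return a + b; }

// Calling either function from a __global__ or __device__ function would be
// rejected by nvcc, because neither one is compiled for the device.
```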
The __global__ and __host__ execution space specifiers cannot be used together.
However, the __device__ and __host__ execution space specifiers can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro introduced in Application Compatibility can be used to differentiate code paths between host and device:
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
// Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
// Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
// Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
// Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
// Host code path
#endif
}
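In the following sketch, the same `__host__ __device__` functions are called from host code and from a kernel. The names `cube`, `compiled_arch`, `device_side`, and `eval_on_device` are hypothetical. The sketch relies on `__CUDA_ARCH__` being defined only while device code is compiled. As a result, `compiled_arch()` yields 0 on the host and the target architecture on the device, for example 800 for compute capability 8.0.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Compiled twice: once for the host and once for the device.
__host__ __device__ int cube(int x) { return x * x * x; }

// __CUDA_ARCH__ is defined only during device compilation.
__host__ __device__ int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;       // device code path, e.g. 800 for sm_80
#else
    return 0;                   // host code path
#endif
}

// Kernel that calls both __host__ __device__ functions on the device.
__global__ void device_side(int x, int* out)
{
    out[0] = cube(x);
    out[1] = compiled_arch();
}

// Hypothetical helper: evaluates both functions on the device into out[0..1].
void eval_on_device(int x, int out[2])
{
    int* d_out = nullptr;
    check(cudaMalloc(&d_out, 2 * sizeof(int)));
    device_side<<<1, 1>>>(x, d_out);
    check(cudaGetLastError());
    check(cudaMemcpy(out, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_out));
}
```

For example, `eval_on_device(3, out)` should give `out[0] == 27`, matching `cube(3)` evaluated on the host.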