Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
The function execution space specifiers indicate whether a function executes on the host or on the device, and whether it is callable from the host or from the device.
1 __global__
The `__global__` execution space specifier declares a function as being a kernel. Such a function is:
- Executed on the device,
- Callable from the host,
- Callable from the device for devices of compute capability 3.2 or higher (see CUDA Dynamic Parallelism for more details).
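Below is a minimal sketch of a kernel and its host-side launch. The names `scaleKernel` and `scaleOnDevice` are hypothetical, and error handling is reduced to checking the final copy. The kernel runs on the device, while `scaleOnDevice` is ordinary host code that launches it.

```cuda
#include <cuda_runtime.h>

// __global__: compiled for and executed on the device, launched from host code.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // guard the last, partially filled block
        data[i] *= factor;
    }
}

// Host code: copies `host` to the device, launches the kernel, copies the result back.
// Returns true on success.
bool scaleOnDevice(float *host, int n, float factor)
{
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;      // round up to cover all n elements
    scaleKernel<<<blocks, threads>>>(dev, n, factor);    // kernel called from the host

    // This blocking copy on the default stream waits for the kernel to finish first.
    cudaError_t err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```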
A `__global__` function must have a void return type and cannot be a member of a class.
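Because a kernel cannot return a value, results are typically written through a pointer to device memory. The sketch below (hypothetical names `countAboveKernel` and `countAbove`) counts elements above a threshold and accumulates the count with `atomicAdd`. The kernel is a free function declared at namespace scope rather than a class member.

```cuda
#include <cuda_runtime.h>

// A __global__ function must return void, so the result is written through
// `count`, a pointer to device memory, instead of being returned.
__global__ void countAboveKernel(const int *data, int n, int threshold, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) {
        atomicAdd(count, 1);   // many threads may update *count concurrently
    }
}

// Host wrapper: returns the number of elements greater than `threshold`, or -1 on error.
int countAbove(const int *host, int n, int threshold)
{
    int *dData = nullptr, *dCount = nullptr;
    if (cudaMalloc(&dData, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) { cudaFree(dData); return -1; }
    cudaMemcpy(dData, host, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));   // start the counter at zero

    countAboveKernel<<<(n + 127) / 128, 128>>>(dData, n, threshold, dCount);

    int result = -1;
    if (cudaMemcpy(&result, dCount, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1;
    }
    cudaFree(dData);
    cudaFree(dCount);
    return result;
}
```

For example, `countAbove(v, 6, 4)` with `v = {1, 5, 7, 2, 9, 4}` should return 3.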
Any call to a `__global__` function must specify its execution configuration, as described in Execution Configuration.
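The execution configuration has the general form `<<<Dg, Db, Ns, S>>>`: grid dimensions, block dimensions, optional bytes of dynamic shared memory per block, and an optional stream. As an illustrative sketch (hypothetical names `matAdd` and `matAddOnDevice`; the 16 x 16 block shape is an arbitrary choice), a 2D launch that rounds the grid size up so every matrix element gets a thread might look like this:

```cuda
#include <cuda_runtime.h>

// Adds two width x height matrices stored row-major: c = a + b.
__global__ void matAdd(const float *a, const float *b, float *c, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (x < width && y < height) {
        int idx = y * width + x;
        c[idx] = a[idx] + b[idx];
    }
}

// Launches with the full form <<<Dg, Db, Ns, S>>>:
//   Dg = grid dimensions, Db = block dimensions,
//   Ns = dynamic shared memory bytes per block (0 here), S = stream.
bool matAddOnDevice(const float *a, const float *b, float *c, int width, int height)
{
    size_t bytes = size_t(width) * height * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                                  // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,           // round up in each dimension
              (height + block.y - 1) / block.y);
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    matAdd<<<grid, block, 0, stream>>>(dA, dB, dC, width, height);
    cudaStreamSynchronize(stream);                       // wait for the kernel in `stream`

    cudaError_t err = cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
    cudaStreamDestroy(stream);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess;
}
```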
A call to a `__global__` function is asynchronous, meaning it returns before the device has completed its execution.
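One practical consequence of this asynchrony is that the host must synchronize before reading kernel results. The sketch below assumes unified (managed) memory is available and uses the hypothetical names `incrementKernel` and `incrementManaged`. `cudaGetLastError` reports launch errors right away, while errors raised during execution surface at `cudaDeviceSynchronize`.

```cuda
#include <cuda_runtime.h>

__global__ void incrementKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Fills data[i] = i, increments every element on the device, and reports
// the first and last elements. Returns cudaSuccess or the first error seen.
cudaError_t incrementManaged(int n, int *firstOut, int *lastOut)
{
    int *data = nullptr;
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(int));  // visible to host and device
    if (err != cudaSuccess) return err;
    for (int i = 0; i < n; ++i) data[i] = i;

    incrementKernel<<<(n + 255) / 256, 256>>>(data, n);
    err = cudaGetLastError();             // launch/configuration errors are reported immediately
    if (err == cudaSuccess) {
        // The launch has already returned, so the host could do other work here.
        // Reading `data` now would race with the kernel; wait for completion first.
        err = cudaDeviceSynchronize();    // also surfaces errors raised while the kernel ran
    }
    if (err == cudaSuccess) {
        *firstOut = data[0];
        *lastOut = data[n - 1];
    }
    cudaFree(data);
    return err;
}
```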
2 __device__
The `__device__` execution space specifier declares a function that is:
- Executed on the device,
- Callable from the device only.
The `__global__` and `__device__` execution space specifiers cannot be used together.
3 __host__
The `__host__` execution space specifier declares a function that is:
- Executed on the host,
- Callable from the host only.
Declaring a function with only the `__host__` execution space specifier is equivalent to declaring it without any of the `__host__`, `__device__`, or `__global__` execution space specifiers; in either case, the function is compiled for the host only.
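In other words, the two functions in this short sketch (hypothetical names `addExplicitHost` and `addImplicitHost`) are compiled the same way, for the host only:

```cuda
#include <cuda_runtime.h>

// Explicit __host__: compiled for the host only.
__host__ int addExplicitHost(int a, int b)
{
    return a + b;
}

// No execution space specifier: also compiled for the host only.
int addImplicitHost(int a, int b)
{
    return a + b;
}

// Calling either function from a __global__ or __device__ function
// would be rejected by nvcc.
```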
The `__global__` and `__host__` execution space specifiers cannot be used together.
However, the `__device__` and `__host__` execution space specifiers can be used together, in which case the function is compiled for both the host and the device. The `__CUDA_ARCH__` macro introduced in Application Compatibility can be used to differentiate code paths between host and device:
```cuda
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
    // Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
    // Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
    // Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
    // Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
    // Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
    // Host code path
#endif
}
```
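The following runnable sketch calls one `__host__ __device__` function from both sides. The names `squarePlusOne`, `compiledForDevice`, `probeKernel`, and `probeOnDevice` are hypothetical. It uses `#ifdef __CUDA_ARCH__`, relying on the macro being defined only during device compilation, to report which code path was compiled.

```cuda
#include <cuda_runtime.h>

// Compiled twice: once for the host and once for the device.
__host__ __device__ float squarePlusOne(float x)
{
    return x * x + 1.0f;
}

// Returns 1 in the device compilation and 0 in the host compilation.
__host__ __device__ int compiledForDevice()
{
#ifdef __CUDA_ARCH__
    return 1;   // device code path
#else
    return 0;   // host code path
#endif
}

__global__ void probeKernel(float *out, int *onDevice)
{
    out[0] = squarePlusOne(3.0f);       // same function, device version
    onDevice[0] = compiledForDevice();
}

// Runs the functions on the device and copies the results back. Returns true on success.
bool probeOnDevice(float *value, int *flag)
{
    float *dOut = nullptr;
    int *dFlag = nullptr;
    cudaMalloc(&dOut, sizeof(float));
    cudaMalloc(&dFlag, sizeof(int));
    probeKernel<<<1, 1>>>(dOut, dFlag);
    cudaMemcpy(value, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(flag, dFlag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    cudaFree(dFlag);
    return err == cudaSuccess;
}
```

Called from host code, `compiledForDevice()` returns 0. Inside `probeKernel`, the device version returns 1, while `squarePlusOne(3.0f)` yields 10.0f on both sides.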