Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
Function execution space specifiers indicate whether a function executes on the host or on the device, and whether it can be called from the host or from the device.
1 __global__
The __global__ execution space specifier declares a function as a kernel. Such a function is:
- Executed on the device,
- Callable from the host,
- Callable from the device on devices of compute capability 3.2 or higher (see CUDA Dynamic Parallelism for more details).
A __global__ function must have a void return type and cannot be a member of a class.
Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.
A call to a __global__ function is asynchronous: it returns before the device has completed its execution.
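The three rules above can be sketched in a few lines. This is a minimal illustration, not code from the original article: the kernel name addOne, the helper runAddOne, and the block size of 256 are assumptions chosen for the example. It shows a void kernel, a launch with an explicit execution configuration, and an explicit synchronization, because the launch returns before the kernel finishes.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: a __global__ function must return void.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Hypothetical host helper: launches addOne on n zeroed floats.
// Returns 0 if every element ends up equal to 1.0f, nonzero otherwise.
int runAddOne(int n)
{
    float *d = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    cudaMemset(d, 0, bytes);  // all-zero bytes are 0.0f

    // Execution configuration: <<<number of blocks, threads per block>>>.
    int threads = 256;  // assumed block size for this sketch
    int blocks = (n + threads - 1) / threads;
    addOne<<<blocks, threads>>>(d, n);  // asynchronous: control returns at once

    // Check for launch errors, then wait for the device to finish.
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return 2; }
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d); return 3; }

    std::vector<float> h(n);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i) {
        if (h[i] != 1.0f) return 4;
    }
    return 0;
}

int main()
{
    int status = runAddOne(1000);
    printf("runAddOne status: %d\n", status);
    return status;
}
```

The cudaMemcpy call would also wait for the kernel on its own; the explicit cudaDeviceSynchronize is kept here only to make the asynchronous launch visible.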
2 __device__
The __device__ execution space specifier declares a function that is:
- Executed on the device,
- Callable from the device only.
The __global__ and __device__ execution space specifiers cannot be used together.
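As a rough illustration of the device-only rule, the sketch below defines a __device__ helper and calls it from a kernel. The names square, squareAll, and runSquareAll are hypothetical and exist only for this example; calling square directly from host code would be rejected by the compiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-only helper: callable from kernels and other
// __device__ functions, but not from host code.
__device__ float square(float x)
{
    return x * x;
}

// Kernel that applies the helper to each element.
__global__ void squareAll(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = square(data[i]);
    }
}

// Hypothetical host helper: returns 0 if {1, 2, 3, 4} becomes {1, 4, 9, 16}.
int runSquareAll()
{
    const int n = 4;
    float h[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float expected[n] = {1.0f, 4.0f, 9.0f, 16.0f};

    float *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    squareAll<<<1, n>>>(d, n);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return 2; }

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i) {
        if (h[i] != expected[i]) return 3;
    }
    return 0;
}

int main()
{
    int status = runSquareAll();
    printf("runSquareAll status: %d\n", status);
    return status;
}
```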
3 __host__
The __host__ execution space specifier declares a function that is:
- Executed on the host,
- Callable from the host only.
Declaring a function with only the __host__ execution space specifier is equivalent to declaring it without any of the __host__, __device__, or __global__ execution space specifiers; in either case, the function is compiled for the host only.
The __global__ and __host__ execution space specifiers cannot be used together.
However, the __device__ and __host__ execution space specifiers can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro, introduced in Application Compatibility, can be used to differentiate code paths between the host and the device:
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
// Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
// Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
// Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
// Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
// Host code path
#endif
}
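To see the two compilation passes in action, the following sketch calls one __host__ __device__ function from both sides. The names compiledArch, report, and runCompiledArch are assumptions made for this example. Note that __CUDA_ARCH__ reflects the architecture the device code was compiled for, which is not necessarily the exact compute capability of the GPU it runs on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical function compiled for both host and device.
// Returns 0 in the host pass and the value of __CUDA_ARCH__ in the device pass.
__host__ __device__ int compiledArch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;  // device code path, e.g. 800 when built for sm_80
#else
    return 0;              // host code path
#endif
}

// Kernel that records what the device-side version returns.
__global__ void report(int *out)
{
    *out = compiledArch();
}

// Hypothetical host helper: returns 0 if the host call yields 0
// and the device call yields a positive architecture value.
int runCompiledArch()
{
    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d, 0, sizeof(int));

    report<<<1, 1>>>(d);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return 2; }

    int deviceValue = 0;
    cudaMemcpy(&deviceValue, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    int hostValue = compiledArch();  // same function, host version
    printf("host: %d, device: %d\n", hostValue, deviceValue);

    if (hostValue != 0) return 3;
    if (deviceValue <= 0) return 4;
    return 0;
}

int main()
{
    return runCompiledArch();
}
```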