Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
A function execution space specifier indicates whether a function executes on the host or on the device, and whether it can be called from the host or from the device.
1 __global__
The __global__ execution space specifier declares a function as a kernel. Such a function is:
- Executed on the device,
- Callable from the host,
- Callable from the device on devices of compute capability 3.2 or higher (for more details, see CUDA Dynamic Parallelism).
A __global__ function must have a void return type and cannot be a member of a class.
Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.
A call to a __global__ function is asynchronous: it returns before the device has finished executing the kernel.
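As an illustration of the rules above, here is a minimal sketch of a kernel launch. The kernel name addOne, the array size, and the block size are hypothetical choices made for this example; the launch uses the <<<blocks, threads>>> execution configuration, and because the launch returns immediately, the host waits with cudaDeviceSynchronize() before reading the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 1.0f to one array element.
// A __global__ function must return void.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

int main()
{
    const int n = 1024;               // assumed problem size
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));   // all elements start at 0.0f

    // Execution configuration: enough 256-thread blocks to cover n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addOne<<<blocks, threadsPerBlock>>>(d_data, n);

    // The launch is asynchronous: control returns here before the kernel
    // has finished. Check for launch errors, then wait for completion.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    float first = 0.0f;
    cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %f\n", first);

    cudaFree(d_data);
    return first == 1.0f ? 0 : 1;     // nonzero exit code if the result is wrong
}
```

Assuming the file is saved as add_one.cu, it should build with nvcc add_one.cu -o add_one on a machine with a CUDA-capable GPU.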
2 __device__
The __device__ execution space specifier declares a function that is:
- Executed on the device,
- Callable from the device only.
The __global__ and __device__ execution space specifiers cannot be used together.
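A short sketch of a __device__ function may help here. The helper square and the kernel squareAll are hypothetical names used only for this example; the helper is called from a kernel, and calling it directly from main() would not compile because it is device-only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-only helper: can be called from kernels and from
// other __device__ functions, but not from host code.
__device__ float square(float x)
{
    return x * x;
}

// Kernel that applies the device helper to each element.
__global__ void squareAll(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = square(data[i]);
    }
}

int main()
{
    const int n = 4;
    float h_data[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    squareAll<<<1, n>>>(d_data, n);   // one block of n threads

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_data, d_data, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        printf("%f\n", h_data[i]);
    }
    return h_data[3] == 16.0f ? 0 : 1;   // nonzero exit code if the result is wrong
}
```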
3 __host__
The __host__ execution space specifier declares a function that is:
- Executed on the host,
- Callable from the host only.
Declaring a function with only the __host__ execution space specifier is equivalent to declaring it without any of the __host__, __device__, or __global__ execution space specifiers; in either case, the function is compiled for the host only.
The __global__ and __host__ execution space specifiers cannot be used together.
However, the __device__ and __host__ execution space specifiers can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro, introduced in Application Compatibility, can be used to differentiate code paths between the host and the device:
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
// Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
// Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
// Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
// Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
// Host code path
#endif
}
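To see the host/device split in action, the following is a minimal sketch rather than a definitive pattern. The function whereAmI and the kernel probe are hypothetical names introduced for this example; the same __host__ __device__ function is called once from the host and once from a kernel, and __CUDA_ARCH__ selects which branch each compilation pass keeps.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical function compiled for both host and device.
// __CUDA_ARCH__ is defined only while compiling device code.
__host__ __device__ int whereAmI()
{
#ifdef __CUDA_ARCH__
    return 1;   // device code path
#else
    return 0;   // host code path
#endif
}

// Kernel that records which path the device-side copy takes.
__global__ void probe(int *out)
{
    *out = whereAmI();
}

int main()
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    probe<<<1, 1>>>(d_out);

    int fromDevice = -1;
    // Blocking copy on the default stream; it waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&fromDevice, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int fromHost = whereAmI();   // direct host call of the same function
    printf("host call: %d, device call: %d\n", fromHost, fromDevice);
    return (fromHost == 0 && fromDevice == 1) ? 0 : 1;
}
```

Only the function body branches on __CUDA_ARCH__ here; the signature stays identical in both passes, which keeps the host and device declarations consistent.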