Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
Function execution space specifiers indicate whether a function executes on the host or on the device, and whether it is callable from the host or from the device.
1 __global__
The __global__ execution space specifier declares a function as a kernel. Such a function is:
- executed on the device,
- callable from the host,
- callable from the device on devices of compute capability 3.2 or higher (see CUDA Dynamic Parallelism for more details).
A __global__ function must have a void return type and cannot be a member of a class.
Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.
A call to a __global__ function is asynchronous: it returns before the device has completed the kernel's execution.
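The rules above can be illustrated with a minimal sketch. The names scale and scale_on_device and the launch dimensions are assumptions made for this example, not part of the original text. It shows a void kernel, an execution configuration in the <<<...>>> launch syntax, and an explicit wait on the asynchronous launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: executes on the device, is launched from the host, and returns void.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: scales n floats on the device in place
// and reports the first CUDA error encountered.
cudaError_t scale_on_device(float *host_data, float factor, int n)
{
    float *dev_data = nullptr;
    size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&dev_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        // The execution configuration is mandatory; the call returns
        // to the host before the kernel has finished.
        scale<<<blocks, threads>>>(dev_data, factor, n);
        err = cudaGetLastError();                  // catches launch errors
    }
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();             // wait for the kernel to finish
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_data);
    return err;
}

int main()
{
    float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    cudaError_t err = scale_on_device(values, 2.0f, 4);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%.1f %.1f %.1f %.1f\n", values[0], values[1], values[2], values[3]);
    return 0;
}
```

The explicit cudaDeviceSynchronize call is there to make the asynchrony visible. In this sketch the following cudaMemcpy on the default stream would also wait for the kernel.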
2 __device__
The __device__ execution space specifier declares a function that is:
- executed on the device,
- callable from the device only.
The __global__ and __device__ execution space specifiers cannot be used together.
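As a hedged illustration of a device-only function, the sketch below defines a __device__ helper and calls it from a kernel. The names square, square_all, and square_on_device are hypothetical and chosen only for this example. Calling square directly from host code would be rejected by the compiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device-only helper: compiled for the device, callable only from device code.
__device__ int square(int x)
{
    return x * x;
}

// Kernel that calls the __device__ helper for each element.
__global__ void square_all(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = square(data[i]);
    }
}

// Hypothetical host wrapper: squares n ints on the device in place.
cudaError_t square_on_device(int *host_data, int n)
{
    int *dev_data = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&dev_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                         // assumed block size
        int blocks = (n + threads - 1) / threads;
        square_all<<<blocks, threads>>>(dev_data, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This copy waits for the kernel on the default stream.
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_data);
    return err;
}

int main()
{
    int values[4] = {1, 2, 3, 4};
    cudaError_t err = square_on_device(values, 4);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%d %d %d %d\n", values[0], values[1], values[2], values[3]);
    return 0;
}
```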
3 __host__
The __host__ execution space specifier declares a function that is:
- executed on the host,
- callable from the host only.
Declaring a function with only the __host__ execution space specifier is equivalent to declaring it without any of the __host__, __device__, or __global__ execution space specifiers; in either case, the function is compiled for the host only.
The __global__ and __host__ execution space specifiers cannot be used together.
The __device__ and __host__ execution space specifiers, however, can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro introduced in Application Compatibility can be used to differentiate code paths between host and device:
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
// Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
// Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
// Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
// Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
// Host code path
#endif
}
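To show the same function being used from both sides, here is a minimal sketch under stated assumptions. The names where_am_i, query, and where_on_device are hypothetical, and the return values 0 and 1 are arbitrary markers chosen for this example. The function is compiled once for the host and once for the device, and __CUDA_ARCH__ is defined only during the device compilation pass.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled for both host and device. Returns 1 when built for the device
// and 0 when built for the host (arbitrary markers for this sketch).
__host__ __device__ int where_am_i()
{
#ifdef __CUDA_ARCH__
    return 1;   // device compilation pass
#else
    return 0;   // host compilation pass
#endif
}

// Kernel that stores the device-side result.
__global__ void query(int *result)
{
    *result = where_am_i();
}

// Hypothetical helper: returns the value computed on the device,
// or -1 if any CUDA call fails.
int where_on_device()
{
    int *dev_result = nullptr;
    int host_result = -1;

    if (cudaMalloc(&dev_result, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    query<<<1, 1>>>(dev_result);
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&host_result, dev_result, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        host_result = -1;
    }
    cudaFree(dev_result);
    return host_result;
}

int main()
{
    std::printf("host call returns %d\n", where_am_i());
    std::printf("device call returns %d\n", where_on_device());
    return 0;
}
```

Keeping the function signature identical in both passes and branching only inside the body is the safer pattern, since both compilation passes must see the same declarations.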