Function execution space specifier in CUDA
2022-07-02 06:28:00 【Little Heshang sweeping the floor】
Function execution space specifier
Function execution space specifiers denote whether a function executes on the host or on the device, and whether it is callable from the host or from the device.
1 __global__
The __global__ execution space specifier declares a function as a kernel. Such a function is:
- Executed on the device,
- Callable from the host,
- Callable from the device on devices of compute capability 3.2 or higher (see CUDA Dynamic Parallelism for more details).
A __global__ function must have a void return type and cannot be a member of a class.
Any call to a __global__ function must specify its execution configuration, as described in Execution Configuration.
A call to a __global__ function is asynchronous, meaning it returns before the device has completed its execution.
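The rules above can be illustrated with a minimal sketch. The kernel name scale and the host helper scale_on_device are hypothetical names chosen for this example, and the block size of 256 is an arbitrary assumption. The launch supplies an execution configuration inside <<<...>>> and returns immediately, so the host synchronizes before reading the result.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: executed on the device, launched from the host, returns void.
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host helper (not part of CUDA): scales a host vector on the GPU.
// Returns false if any CUDA runtime call reports an error.
bool scale_on_device(std::vector<float>& values, float factor)
{
    const int n = static_cast<int>(values.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, values.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;  // assumed block size
        const int blocks = (n + threads - 1) / threads;
        // Execution configuration: <<<blocks, threads>>>.
        // The launch is asynchronous and returns before the kernel finishes.
        scale<<<blocks, threads>>>(d_data, factor, n);
        ok = cudaGetLastError() == cudaSuccess         // launch errors
          && cudaDeviceSynchronize() == cudaSuccess;   // wait for completion
    }
    ok = ok && cudaMemcpy(values.data(), d_data, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_data);
    return ok;
}
```

Calling scale_on_device(v, 2.0f) on a vector holding 1, 2, 3 should leave it holding 2, 4, 6, provided a CUDA-capable device is present.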
2 __device__
The __device__ execution space specifier declares a function that is:
- Executed on the device,
- Callable from the device only.
The __global__ and __device__ execution space specifiers cannot be used together.
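As a rough sketch of how a __device__ function is typically used, the example below calls a device-only helper from a kernel. The names squared, square_all, and square_on_device are hypothetical, and the block size of 128 is an assumption made for illustration.

```cuda
#include <cuda_runtime.h>

// Device-only helper: callable from kernels and other device functions,
// but not from host code.
__device__ float squared(float x)
{
    return x * x;
}

// Kernel that calls the __device__ helper for each element.
__global__ void square_all(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = squared(data[i]);
    }
}

// Hypothetical host wrapper: squares n floats in place on the GPU.
bool square_on_device(float* host, int n)
{
    if (n <= 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return false;
    }

    square_all<<<(n + 127) / 128, 128>>>(d, n);

    // A blocking cudaMemcpy on the default stream waits for the kernel above.
    cudaError_t err = cudaMemcpy(host, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```

Calling squared directly from host code such as square_on_device would be rejected by the compiler, which is the restriction the list above describes.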
3 __host__
The __host__ execution space specifier declares a function that is:
- Executed on the host,
- Callable from the host only.
Declaring a function with only the __host__ execution space specifier is equivalent to declaring it without any of the __host__, __device__, or __global__ execution space specifiers; in either case the function is compiled for the host only.
The __global__ and __host__ execution space specifiers cannot be used together.
The __device__ and __host__ execution space specifiers, however, can be used together, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro introduced in Application Compatibility can be used to differentiate code paths between the host and the device:
__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
// Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
// Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
// Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
// Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__)
// Host code path
#endif
}
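To see the two compilation passes in action, here is a small sketch in which the same __host__ __device__ function is called once from host code and once from a kernel. The names where_am_i, report, and where_on_device are hypothetical, and the return codes 0 and 1 are arbitrary values picked for this example.

```cuda
#include <cuda_runtime.h>

// Compiled for the host and for the device.
// __CUDA_ARCH__ is defined only during device compilation.
__host__ __device__ int where_am_i()
{
#ifdef __CUDA_ARCH__
    return 1;  // device code path
#else
    return 0;  // host code path
#endif
}

// Kernel that stores the device-side result.
__global__ void report(int* out)
{
    *out = where_am_i();
}

// Hypothetical helper: returns what where_am_i() yields inside a kernel,
// or -1 if a CUDA runtime call fails.
int where_on_device()
{
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;

    report<<<1, 1>>>(d_out);

    int result = -1;
    // A blocking cudaMemcpy on the default stream waits for the kernel above.
    if (cudaMemcpy(&result, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1;
    }
    cudaFree(d_out);
    return result;
}
```

On a machine with a CUDA-capable device, where_am_i() called from host code takes the host path, while where_on_device() reports the device path, even though both run the same source function.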