2022 CUDA Summer Training Camp: Day 1 Practice
2022-07-29 10:27:00 【Hua Weiyun】

Zhang Xiaobai once managed to run a first CUDA program, but only knew what it did, not why. The CUDA training camp is there to explain the why.
The CPU and its memory are called the “host” (HOST); the GPU and its video memory are called the “device” (DEVICE).
In short, executing CUDA code follows three steps: host_to_device (copy the input data from host memory to device memory) → parallel computation on the device → device_to_host (copy the results back to the host).
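The three steps above can be sketched as a minimal CUDA program. This is an illustrative example, not code from the training camp: the kernel `double_kernel` and the helper `double_it` are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element operation; __host__ __device__ lets both CPU and GPU call it.
__host__ __device__ float double_it(float x) { return 2.0f * x; }

// Kernel: each thread doubles one element.
__global__ void double_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = double_it(in[i]);
}

int main(void) {
    const int n = 8;
    const size_t bytes = n * sizeof(float);
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    // Allocate device (video) memory.
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);

    // 1. host_to_device: copy the input into device memory.
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // 2. Parallel computation on the device: 1 block of n threads.
    double_kernel<<<1, n>>>(d_in, d_out, n);

    // 3. device_to_host: this cudaMemcpy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%.1f ", h_out[i]);
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Compiled with nvcc and run on a CUDA-capable GPU, it should print the numbers 0 to 7 doubled.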
A CUDA program is essentially an extension of C. Its source files use the suffix .cu, and its header files use .cuh.
Besides ordinary C syntax, a .cu program contains some CUDA-specific parts. One example is the qualifier placed in front of a function, which is one of __global__, __host__ or __device__.
About __global__, the training camp explains that this qualifier declares a C function as a kernel function. A kernel runs on the device and is normally launched from the host with an execution configuration, which, as we will see shortly, is the part between <<< >>>.
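About __host__: it marks a function that runs on the host and can only be called from the host. A function without any qualifier is treated as __host__.

About __device__: it marks a function that runs on the device and can only be called from device code, such as a kernel or another __device__ function.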
In my understanding, these qualifiers specify where each piece of code runs, so the program can decide, function by function, whether it executes on the host or on the device.
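A small sketch that puts the three qualifiers (plus the __host__ __device__ combination) side by side may make this concrete. The function names are made up for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: runs on the GPU and can only be called from device code.
__device__ int square_on_device(int x) { return x * x; }

// __host__ (the default): runs on the CPU and can only be called from host code.
__host__ void report(const char *label, int value) { printf("%s: %d\n", label, value); }

// __host__ __device__: compiled for both sides, so either can call it.
__host__ __device__ int add_one(int x) { return x + 1; }

// __global__: a kernel; runs on the GPU, is launched from the host, returns void.
__global__ void compute_kernel(int *out) {
    *out = add_one(square_on_device(3));  // 3 * 3 + 1 = 10
}

int main(void) {
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(int));

    compute_kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    report("computed on the device", h_out);
    report("add_one called on the host", add_one(h_out));

    cudaFree(d_out);
    return 0;
}
```

Calling `square_on_device` directly from `main` would be rejected by nvcc, because a __device__ function cannot be called from host code.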
Take a simple Hello World program:
```c
#include <stdio.h>

void hello_from_cpu()
{
    printf("Hello World from the CPU!\n");
}

int main(void)
{
    hello_from_cpu();
    return 0;
}
```
To run it on the GPU, only two steps are needed:
(1) Rename the function hello_from_cpu to hello_from_gpu and prefix it with __global__, which makes it a kernel function.
(2) When calling it from main, add the execution configuration <<< >>>. With <<<1,1>>> the kernel runs once; with <<<2,4>>> it runs 2 × 4 = 8 times (2 thread blocks of 4 threads each).
Here is the modified code:
```cuda
#include <stdio.h>

__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU!\n");
}

int main(void)
{
    hello_from_gpu<<<1,1>>>();
    // Wait for the kernel to finish so its printf output is flushed before the program exits.
    cudaDeviceSynchronize();
    return 0;
}
```
A .cu file must be compiled with nvcc, and the flags to pass depend on the GPU architecture.

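Here, -arch takes a virtual architecture of the form compute_XX, and -code takes a real GPU architecture of the form sm_XX, where XX is the GPU's compute capability (for example 61 for compute capability 6.1).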
As a simple example, the graphics card in Zhang Xiaobai's laptop is a Quadro P1000, which uses the Pascal architecture (compute capability 6.1), so the parameters are compute_61 and sm_61.
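If you are not sure which compute capability your GPU has, the CUDA runtime can report it through `cudaGetDeviceProperties`. The sketch below is not part of the training camp material. `nvcc_flags` is a hypothetical helper that turns the major/minor version numbers into flags of the same form as compute_61 and sm_61 above.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: turn a compute capability (major, minor) into nvcc flags,
// e.g. (6, 1) -> "-arch=compute_61 -code=sm_61".
std::string nvcc_flags(int major, int minor) {
    std::string cc = std::to_string(major) + std::to_string(minor);
    return "-arch=compute_" + cc + " -code=sm_" + cc;
}

int main(void) {
    cudaDeviceProp prop;
    // Query device 0; this fails if no CUDA-capable GPU or driver is available.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("Suggested flags: %s\n", nvcc_flags(prop.major, prop.minor).c_str());
    return 0;
}
```

On the Quadro P1000 mentioned above, it should report compute capability 6.1.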
We run the following commands:
```shell
/usr/local/cuda/bin/nvcc -arch=compute_61 -code=sm_61 hello_cuda.cu -o hello_cuda
./hello_cuda
```

If the execution configuration is changed to <<<2,4>>>, the output shows that the kernel function is executed 8 times.
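To see which of the 8 threads printed each line, the kernel can read the built-in variables `blockIdx`, `blockDim` and `threadIdx`. This is a sketch rather than the training camp's code, and `global_thread_id` is a hypothetical helper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global index of a thread in a 1-D launch; __host__ __device__ so the CPU can use it too.
__host__ __device__ int global_thread_id(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

__global__ void hello_from_gpu() {
    int b = blockIdx.x;   // which block this thread belongs to
    int t = threadIdx.x;  // index of the thread inside its block
    int id = global_thread_id(b, blockDim.x, t);
    printf("Hello World from block %d, thread %d (global id %d)\n", b, t, id);
}

int main(void) {
    hello_from_gpu<<<2, 4>>>();  // 2 blocks x 4 threads = 8 printed lines
    cudaDeviceSynchronize();     // wait for the kernel so its output is flushed
    return 0;
}
```

The global ids run from 0 to 7. The lines may not appear in block order, because CUDA does not guarantee the order in which blocks execute.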
Next, let's see whether this code can also be run in JupyterLab.
First, unzip the Jupyter exercise pack provided by the training camp, CUDA_on_ARM.zip. This requires the unzip tool, so install it first if it is missing:
```shell
sudo apt install unzip
```
Then unzip the pack:
```shell
unzip CUDA_on_ARM.zip
```

Install pip:
```shell
sudo apt install python3-pip
```



It seems python3 and pip are both ready. Next, install JupyterLab (here from the Tsinghua University PyPI mirror):
```shell
pip install jupyterlab -i https://pypi.tuna.tsinghua.edu.cn/simple
```


Also install jupyter via apt:
```shell
sudo apt install jupyter
```


Start JupyterLab without opening a browser and take a look:
```shell
jupyter lab --no-browser
```

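It reports an error. No problem: a quick Baidu search turns up a fix (https://blog.csdn.net/weixin_45438997/article/details/124261720). First check the installed version of markupsafe:
```shell
pip show markupsafe
```
then downgrade it to 2.0.1:
```shell
python -m pip install markupsafe==2.0.1
```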

Start it again:
```shell
jupyter lab --no-browser
```

You can also set a password before starting:
```shell
jupyter server password
```
and type it twice (123456 in this example).

Then open http://127.0.0.1:8888/lab in a browser.


So CUDA code can be tried out in JupyterLab as well.
(To be continued)