2022 CUDA Summer Training Camp: Day 1 Practice
2022-07-29 10:27:00 【Hua Weiyun】
Zhang Xiaobai once managed to run his first CUDA program, but he only knew that it worked, not why. The CUDA training camp is there to explain the why.
We call the CPU and its memory the "host" (HOST), and the GPU and its video memory the "device" (DEVICE).
Executing CUDA code involves the following steps. In short: copy data from the host to the device (host_to_device), run the parallel computation on the device, then copy the results back from the device to the host (device_to_host).
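The three steps can be sketched as follows. This is my own minimal example, not training camp code. The kernel double_elements and the helper double_on_gpu are hypothetical names chosen for illustration. The program copies an array to the GPU, doubles each element in parallel, and copies the result back:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel (runs on the device): each thread doubles one element of the array.
__global__ void double_elements(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard: the last block may have spare threads
        data[i] *= 2;
    }
}

// Hypothetical helper: host_to_device -> compute on device -> device_to_host.
// Returns 0 on success, -1 if any CUDA call fails.
int double_on_gpu(int *h_data, int n)
{
    int *d_data = NULL;
    size_t bytes = n * sizeof(int);

    // 1. host_to_device: allocate device memory and copy the input there
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return -1;
    if (cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    // 2. Parallel computation on the device: enough 256-thread blocks to cover n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    double_elements<<<blocks, threads>>>(d_data, n);
    if (cudaGetLastError() != cudaSuccess) {  // catches launch errors
        cudaFree(d_data);
        return -1;
    }

    // 3. device_to_host: this cudaMemcpy waits for the kernel to finish before copying
    cudaError_t err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    int h_data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    if (double_on_gpu(h_data, 8) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 8; ++i) printf("%d ", h_data[i]);
    printf("\n");
    return 0;
}
```

On a machine with a working GPU this should print 2 4 6 8 10 12 14 16. The kernel launch and the plain cudaMemcpy calls both go to the default stream, so the copy back only starts after the kernel finishes. For that reason no explicit synchronization is needed before reading the results.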
A CUDA program is essentially an extension of a C program. Its source files use the .cu suffix, and its header files use .cuh.
Besides ordinary C syntax, a .cu program contains some CUDA-specific parts. One example is the qualifier placed in front of a function, which is one of three: __global__, __host__, or __device__.
About __global__, the training camp says:
The so-called "execution configuration", as we will see, is the part written between <<< and >>>.
This qualifier declares a C function as a kernel function. A kernel can only execute on the device, and it is launched from the host.
About __host__, it says:
About __device__, it says:
My personal understanding: these qualifiers specify which processor a piece of code runs on, and therefore where it can be called from. This lets the program decide which device each function runs on.
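Here is a small sketch of my own that uses all three qualifiers. The function names are hypothetical and chosen for illustration. It shows where each kind of function runs and where it can be called from:

```cuda
#include <stdio.h>

// __device__: runs on the GPU and can only be called from device code
// (a kernel or another __device__ function).
__device__ int square_on_device(int x)
{
    return x * x;
}

// __host__ __device__: compiled for both sides, so CPU code and GPU code can both call it.
__host__ __device__ int add_one(int x)
{
    return x + 1;
}

// __global__: a kernel. It runs on the GPU, is launched from the host with <<< >>>,
// and must return void.
__global__ void kernel(void)
{
    printf("GPU: square(3) = %d, add_one(3) = %d\n", square_on_device(3), add_one(3));
}

// __host__: an ordinary CPU function (also the default when no qualifier is written).
__host__ int host_only(int x)
{
    return add_one(x) * 10;  // calling a __host__ __device__ function from the CPU is allowed
}

int main(void)
{
    printf("CPU: host_only(3) = %d\n", host_only(3));
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
    return 0;
}
```

The expected output is the CPU line "host_only(3) = 40", followed by the GPU line "square(3) = 9, add_one(3) = 4". Calling square_on_device directly from main would be rejected by nvcc, because a __device__ function cannot be called from host code.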
Take a simple Hello World program:
#include <stdio.h>

void hello_from_cpu()
{
    printf("Hello World from the CPU!\n");
}

int main(void)
{
    hello_from_cpu();
    return 0;
}
To run it on the GPU, only two steps are needed:
(1) Rename the function hello_from_cpu to hello_from_gpu, and add the __global__ qualifier to declare it as a kernel function.
(2) When calling it in main, add the execution configuration <<< >>>. The first number is the number of thread blocks and the second is the number of threads per block. <<<1,1>>> runs the kernel once; <<<2,4>>> runs it 2 × 4 = 8 times.
Here is the modified code:
#include <stdio.h>

__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU!\n");
}

int main(void)
{
    hello_from_gpu<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish so its printf output is flushed
    return 0;
}
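.cu code must be compiled with nvcc, and the parameters passed depend on the GPU architecture.
Here, -arch specifies the virtual architecture (compute_XX) for which nvcc generates PTX intermediate code. Its values are as follows:
-code specifies the real architecture (sm_XX) for which nvcc generates binary code. Its values are as follows: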
As a simple example, the graphics card in Zhang Xiaobai's laptop is a Quadro P1000. It uses the Pascal architecture with compute capability 6.1, so the parameters are compute_61 and sm_61.
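If you are not sure which architecture your card uses, the following sketch reads the compute capability through the CUDA runtime API (cudaGetDeviceCount and cudaGetDeviceProperties) and prints the matching flags. The helper arch_number is hypothetical. For the Quadro P1000 above it should report compute capability 6.1:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: maps compute capability major.minor to the number used in
// compute_XY / sm_XY (e.g. 6.1 -> 61). Assumes a single-digit minor version.
int arch_number(int major, int minor)
{
    return major * 10 + minor;
}

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        int n = arch_number(prop.major, prop.minor);
        printf("Device %d: %s, compute capability %d.%d -> -arch=compute_%d -code=sm_%d\n",
               dev, prop.name, prop.major, prop.minor, n, n);
    }
    return 0;
}
```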
Run the following commands:
/usr/local/cuda/bin/nvcc -arch=compute_61 -code=sm_61 hello_cuda.cu -o hello_cuda
./hello_cuda
If the execution configuration is changed to <<<2,4>>>,
the output shows that the kernel function has been executed 8 times.
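To see which 8 threads these are, here is a variant of hello_from_gpu. It is a sketch of my own, and global_thread_id is a hypothetical helper. Each thread prints its built-in blockIdx.x and threadIdx.x:

```cuda
#include <stdio.h>

// Hypothetical helper: global index of a thread in a 1-D grid.
__host__ __device__ int global_thread_id(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

__global__ void hello_from_gpu(void)
{
    // blockIdx.x: index of this block (0 .. gridDim.x - 1)
    // threadIdx.x: index of this thread within its block (0 .. blockDim.x - 1)
    int id = global_thread_id(blockIdx.x, blockDim.x, threadIdx.x);
    printf("Hello World from block %d, thread %d (global id %d)\n",
           (int)blockIdx.x, (int)threadIdx.x, id);
}

int main(void)
{
    const int grid_size = 2;   // number of thread blocks
    const int block_size = 4;  // threads per block
    hello_from_gpu<<<grid_size, block_size>>>();  // 2 x 4 = 8 threads in total
    cudaDeviceSynchronize();   // flush device-side printf output before exiting
    return 0;
}
```

With <<<2,4>>> the program prints 8 lines: blocks 0 and 1, each with threads 0 to 3, for global ids 0 through 7. The order in which the lines appear is not guaranteed.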
Next, let's see whether this code can also be run in JupyterLab.
First, decompress the Jupyter exercise package provided by the training camp: CUDA_on_ARM.zip
Decompressing requires unzip. If it is not installed, install it first: sudo apt install unzip
Decompress the package: unzip CUDA_on_ARM.zip
Install pip:
sudo apt install python3-pip
It turns out that python3 and pip are both already installed. Next, install JupyterLab (here from the Tsinghua PyPI mirror) and Jupyter:
pip install jupyterlab -i https://pypi.tuna.tsinghua.edu.cn/simple
sudo apt install jupyter
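Start it with jupyter lab --no-browser and take a look:
It reports an error. No problem: a Baidu search turns up a fix (https://blog.csdn.net/weixin_45438997/article/details/124261720). Check the installed markupsafe version, then downgrade it to 2.0.1: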
pip show markupsafe
python -m pip install markupsafe==2.0.1
Restart: jupyter lab --no-browser
You can also set a password before starting:
jupyter server password
Enter 123456 twice.
Open in a browser: http://127.0.0.1:8888/lab
As you can see, CUDA code can also be tried out in JupyterLab.
( To be continued )