Latest CUDA environment configuration (Win10 + CUDA 11.6 + VS2019)
2022-07-02 06:29:00 【Little Heshang sweeping the floor】
This post is based on NVIDIA's official documentation and on my own hands-on experience. I hope it helps anyone who needs it.
1. Preface
The software environment used in this article is:
- Windows 10
- CUDA 11.6
- VS2019
CUDA is now an essential toolkit for artificial intelligence, deep learning, and related fields. Many acceleration tools are built on top of CUDA, such as cuDNN, TensorRT, cuBLAS, and other HPC acceleration libraries, as well as Omniverse, which is tied to the recent metaverse concept.
The underlying acceleration in many of NVIDIA's libraries is CUDA. Most of the time we may not write CUDA code directly, but understanding how CUDA works and knowing its basic concepts will definitely make you stronger.
If you are interested, you can also read my translation of the official CUDA programming guide. I hope it helps.
https://blog.csdn.net/kunhe0512/category_11774233.html
2. VS 2019
I actually use VS very little. From the very beginning my boss got me used to a Vim + Makefile environment, so I will only mention VS briefly here.
I currently use VS2019. You can download whichever version you need from the link below.
https://visualstudio.microsoft.com/zh-hans/vs/
One thing to note: when installing, select the "Desktop development with C++" workload. It makes things more convenient later when you use CMake.
3. CUDA download
Official CUDA installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
CUDA Toolkit download: https://developer.nvidia.com/cuda-downloads
Open the download page above and you will see:
You need to select your operating system, architecture, version, and installer type.
Once the selection is complete, click the Download (2.5 GB) button to start the download.
If you want to install an earlier version, click Archive of Previous CUDA Releases further down the page; the steps are the same as above.
Once the download finishes, you will see the following icon:
4. Installation configuration
Double-click the CUDA Toolkit installer you downloaded. You will see the path where the installer unpacks its files (the default is recommended).
After extraction, the installation begins. Click Agree and Continue:
Next, choose the installation options. Custom is recommended (especially for a first installation):
Then select every component you can. Some of them (such as Nsight Systems) may not seem useful at first, but as you do more and go deeper you may well need them.
Then choose the installation path. The default is recommended here too, since these are low-level libraries that other software calls.
After that, no further action is needed until the CUDA Toolkit installation completes.
5. Environment variables
Right-click My Computer (This PC) --> Properties --> Advanced system settings --> Environment Variables, and check whether the CUDA path is already among the system variables. If it is not, remember to add it.
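The same check can also be done from code. Below is a minimal sketch, assuming the installer created a CUDA_PATH variable (the name the Windows installer normally uses) and that the toolkit's executables sit in a bin folder under it. The helper names pathListContains, readEnv, and reportCudaEnvironment are made up for this illustration. The comparison is an exact string match, so a different letter case or a trailing backslash in PATH will be reported as missing.

```cpp
#define _CRT_SECURE_NO_WARNINGS  // lets MSVC accept std::getenv without C4996
#include <cstdlib>
#include <iostream>
#include <string>

// Returns true if `dir` appears as one whole entry of a separator-delimited
// list such as the PATH variable (';' on Windows). Exact match only.
bool pathListContains(const std::string& pathList, const std::string& dir,
                      char sep = ';') {
    std::size_t start = 0;
    while (start <= pathList.size()) {
        std::size_t end = pathList.find(sep, start);
        if (end == std::string::npos) end = pathList.size();
        if (pathList.compare(start, end - start, dir) == 0) return true;
        start = end + 1;
    }
    return false;
}

// Reads an environment variable, returning an empty string if it is unset.
std::string readEnv(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Prints whether CUDA_PATH is set and whether its bin folder is on PATH.
// Returns true only when both checks pass.
bool reportCudaEnvironment() {
    const std::string cudaPath = readEnv("CUDA_PATH");  // assumed variable name
    if (cudaPath.empty()) {
        std::cout << "CUDA_PATH is not set\n";
        return false;
    }
    const std::string binDir = cudaPath + "\\bin";      // assumed layout
    const bool onPath = pathListContains(readEnv("PATH"), binDir);
    std::cout << "CUDA_PATH = " << cudaPath << "\n"
              << binDir << (onPath ? " is on PATH\n" : " is NOT on PATH\n");
    return onPath;
}
```

Calling reportCudaEnvironment() from a small main() prints the result; if it reports a missing entry, add it through the Environment Variables dialog described above.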
6. Test whether CUDA was installed successfully
Press Win + R and run cmd to open a command prompt, then enter:
nvcc -V
If you see output like the following, CUDA is installed.
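nvcc -V only confirms that the compiler can be found. Once you can build a CUDA program (for example with the VS project described in section 7), a small runtime query is a useful second check that the driver and GPU are visible as well. The following is a minimal sketch using standard CUDA runtime calls; the file name device_query.cu is just an example.

```cuda
// device_query.cu : prints the CUDA runtime version and the visible GPUs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtimeVersion = 0;
    cudaError_t err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // The version is encoded as 1000 * major + 10 * minor.
    printf("CUDA runtime version: %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    int deviceCount = 0;
    err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices found: %d\n", deviceCount);

    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("  [%d] %s, compute capability %d.%d, %.1f GiB global memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

If the program lists at least one device, both the toolkit and the driver are working. An error from cudaGetDeviceCount usually points to a driver problem rather than a toolkit problem.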
7. Using Visual Studio 2019 for CUDA application development
Open the installed VS 2019 and choose to create a new project:
Choose CUDA 11.xx Runtime, where xx stands for your version.
Name your CUDA project: Matrix_transpose
The name is arbitrary. I chose Matrix_transpose because I will write a matrix transpose example below.
After the project is created, you will find that it already contains some code: a vector addition example. You can ignore it. Delete the code in kernel.cu and you can start your own development.
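The example in this post includes a header named error.cuh and wraps every runtime call in a CHECK macro, but the header itself is not shown. Without it the project will not compile. The following is a hypothetical stand-in, not the author's original file: a minimal sketch that assumes CHECK only needs to print the failing location and the error string, then exit. Save it as error.cuh next to kernel.cu.

```cuda
// error.cuh : hypothetical stand-in for the header used by the example.
#pragma once
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Evaluates a CUDA runtime call once; on failure prints where it happened
// and the runtime's description of the error, then terminates the program.
#define CHECK(call)                                                   \
    do {                                                              \
        const cudaError_t error_code = (call);                        \
        if (error_code != cudaSuccess) {                              \
            fprintf(stderr, "CUDA error at %s:%d\n",                  \
                    __FILE__, __LINE__);                              \
            fprintf(stderr, "  code: %d, reason: %s\n",               \
                    (int)error_code, cudaGetErrorString(error_code)); \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so CHECK(...); is safe inside an if/else without braces.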
You can try entering the following code to complete a matrix transpose example:
#include <stdio.h>
#include <stdlib.h>
#include "error.cuh" // provides the CHECK macro (header not shown in this post)

#define TILE_DIM 32 // Don't ask me why I don't merge these two values into one
#define BLOCK_SIZE 32
#define N 3001 // for huanhuan, you know that!

__managed__ int input_M[N * N]; // input matrix & GPU result
int cpu_result[N * N];          // CPU result

// In-place matrix transpose
__global__ void ip_transpose(int* data)
{
    // The +1 padding avoids shared-memory bank conflicts on column reads
    __shared__ int tile_s[TILE_DIM][TILE_DIM + 1];
    __shared__ int tile_d[TILE_DIM][TILE_DIM + 1];
    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Blocks in the lower triangle: swap this tile with its mirror tile
    if (blockIdx.y > blockIdx.x) {
        int dx = blockIdx.y * TILE_DIM + threadIdx.x;
        int dy = blockIdx.x * TILE_DIM + threadIdx.y;
        if (x < N && y < N)
        {
            tile_s[threadIdx.y][threadIdx.x] = data[(y)*N + x];
        }
        if (dx < N && dy < N)
        {
            tile_d[threadIdx.y][threadIdx.x] = data[(dy)*N + dx];
        }
        __syncthreads();
        if (dx < N && dy < N)
        {
            data[(dy)*N + dx] = tile_s[threadIdx.x][threadIdx.y];
        }
        if (x < N && y < N)
        {
            data[(y)*N + x] = tile_d[threadIdx.x][threadIdx.y];
        }
    }
    else if (blockIdx.y == blockIdx.x) // Blocks on the diagonal: transpose within the tile
    {
        if (x < N && y < N)
        {
            tile_s[threadIdx.y][threadIdx.x] = data[(y)*N + x];
        }
        __syncthreads();
        if (x < N && y < N)
        {
            data[(y)*N + x] = tile_s[threadIdx.x][threadIdx.y];
        }
    }
    // Blocks above the diagonal do nothing; their tiles are written by the mirror block
}

// CPU reference: B = transpose(A)
void cpu_transpose(int* A, int* B)
{
    for (int j = 0; j < N; j++)
    {
        for (int i = 0; i < N; i++)
        {
            B[i * N + j] = A[j * N + i];
        }
    }
}

int main(int argc, char const* argv[])
{
    cudaEvent_t start, stop_gpu;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop_gpu));

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            input_M[i * N + j] = rand() % 1000;
        }
    }

    // Compute the reference first, because the GPU overwrites input_M in place
    cpu_transpose(input_M, cpu_result);

    CHECK(cudaEventRecord(start));
    unsigned int grid_rows = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    unsigned int grid_cols = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    dim3 dimGrid(grid_cols, grid_rows);
    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    ip_transpose<<<dimGrid, dimBlock>>>(input_M);
    CHECK(cudaGetLastError()); // catch launch errors
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaEventRecord(stop_gpu));
    CHECK(cudaEventSynchronize(stop_gpu));

    float elapsed_time_gpu;
    CHECK(cudaEventElapsedTime(&elapsed_time_gpu, start, stop_gpu));
    printf("Time_GPU = %g ms.\n", elapsed_time_gpu);
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop_gpu));

    // The data are integers, so compare them exactly
    int ok = 1;
    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < N; ++j)
        {
            if (input_M[i * N + j] != cpu_result[i * N + j])
            {
                ok = 0;
            }
        }
    }

    if (ok)
    {
        printf("Pass!!!\n");
    }
    else
    {
        printf("Error!!!\n");
    }
    return 0;
}
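To see why the kernel only does work in blocks on or below the diagonal, it can help to run the same tile-pair logic on the CPU. The sketch below is a plain C++ analogue, not the GPU code: it swaps elements directly instead of staging them through shared-memory tiles, and the function names tilesFor, tiledTransposeInPlace, and referenceTranspose are made up for this illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Number of tiles needed to cover n elements (ceiling division). This is the
// same formula the host code uses for grid_rows and grid_cols.
int tilesFor(int n, int tile) { return (n + tile - 1) / tile; }

// In-place transpose of an n x n row-major matrix, visiting tiles the way the
// kernel visits blocks: tiles above the diagonal are skipped, each tile below
// the diagonal is swapped with its mirror tile, and diagonal tiles are
// transposed within themselves.
void tiledTransposeInPlace(std::vector<int>& m, int n, int tile) {
    const int tiles = tilesFor(n, tile);
    for (int by = 0; by < tiles; ++by) {
        for (int bx = 0; bx <= by; ++bx) {  // on or below the diagonal only
            // Clamp the tile to the matrix edge, like the x < N && y < N guards.
            const int rowEnd = std::min((by + 1) * tile, n);
            const int colEnd = std::min((bx + 1) * tile, n);
            for (int r = by * tile; r < rowEnd; ++r) {
                for (int c = bx * tile; c < colEnd; ++c) {
                    // Inside a diagonal tile, swap each off-diagonal pair once.
                    if (bx == by && c >= r) continue;
                    std::swap(m[r * n + c], m[c * n + r]);
                }
            }
        }
    }
}

// Out-of-place reference, equivalent to cpu_transpose in the listing above.
std::vector<int> referenceTranspose(const std::vector<int>& a, int n) {
    std::vector<int> b(a.size());
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            b[i * n + j] = a[j * n + i];
    return b;
}
```

Each pair of mirrored elements is touched exactly once, which is why roughly half of the launched blocks in the kernel can return immediately. For N = 3001 and 32-wide tiles, tilesFor gives 94 tiles per side; the last tile in each row and column is only partially filled, hence the bounds checks.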
After clicking Run, you can see the following result:
OK, at this point you have set up the CUDA environment and written your first CUDA program.