Latest CUDA Environment Configuration (Win10 + CUDA 11.6 + VS2019)

2022-07-02 06:29:00 【Little Heshang sweeping the floor】

This post is based on NVIDIA's official documentation and on my own hands-on experience. I hope it helps anyone who needs it.

1. Preface

The software environment used in this article is:

Windows 10
CUDA 11.6
VS2019

CUDA is now an essential toolkit for artificial intelligence, deep learning, and related fields. Many acceleration tools are built on top of it, such as cuDNN, TensorRT, cuBLAS, and other HPC libraries, as well as Omniverse, which is tied to the recent metaverse concept.

Many of NVIDIA's acceleration libraries use CUDA as their underlying layer. Most of the time we may not write CUDA code directly, but understanding how CUDA works and its basic concepts will definitely make you stronger.

If you are interested, you can also read my translation of the official CUDA programming guide. I hope it helps.
https://blog.csdn.net/kunhe0512/category_11774233.html

2. VS 2019

I actually use VS very little. From the start, my boss had me working in a Vim + Makefile environment, so I will only cover VS briefly here.

I currently use VS2019. You can download whichever version you need from the link below.
https://visualstudio.microsoft.com/zh-hans/vs/

One thing to note: when installing, select the "Desktop development with C++" workload. It makes using CMake later more convenient.

3. CUDA download

Official CUDA installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

CUDA Toolkit download: https://developer.nvidia.com/cuda-downloads

Open the download page linked above.

Select your operating system, architecture, operating system version, and installer type.

Once you have made your selections, click the Download (2.5 GB) button to start the download.

If you want to install an earlier version, click Archive of Previous CUDA Releases on the same page. The steps are the same as above.

When the download finishes, you will have the CUDA Toolkit installer file.

4. Installation and configuration

Double-click the CUDA Toolkit installer you downloaded. It first asks for an extraction path (the default is recommended).

Once extraction finishes, the installer starts. Click Agree and Continue.

Next, choose the installation options. Custom is recommended, especially for a first installation.

Then select every component you can. You may not need some of them at first (Nsight Systems, for example), but as your work grows more involved you will probably end up using them.

Then choose the installation path. The default is recommended here too; after all, this is a low-level library that other software calls into.

After that, no further action is needed until the CUDA Toolkit installation completes.

5. Environment variables

Right-click My Computer (This PC) --> Properties --> Advanced system settings --> Environment Variables, and check whether the CUDA path is already present. If it is not, remember to add it.
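The same check can be done from a small program instead of the settings dialog. The sketch below is host-side code only and assumes the installer registered a variable named CUDA_PATH, which is what the Windows installer normally does. The helper names env_or and report are made up for this illustration.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or `fallback` if it is not set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Builds a one-line report of the form "<name> = <value>".
// The variable name to query (for example "CUDA_PATH") is an assumption
// about what the installer registered on your machine.
std::string report(const char* name) {
    return std::string(name) + " = " + env_or(name, "<not set>");
}
```

Printing report("CUDA_PATH") from main shows either the toolkit directory or "<not set>". In the latter case, add the variable by hand as described above.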

6. Test whether CUDA is installed successfully

Press Win + R and run cmd to open a command prompt, then enter:

nvcc -V

If nvcc prints its version information, CUDA is installed.

7. Developing CUDA applications with Visual Studio 2019

Open VS 2019 and choose to create a new project.

Select CUDA 11.xx Runtime, where xx is your version.

Name your CUDA project: Matrix_transpose
The name is arbitrary. I chose Matrix_transpose because the example below is a matrix transpose.

After the project is created, you will find that it already contains some code: a vector addition example. You can ignore it. Delete the code in kernel.cu and start your own development.

Try entering the following code, which implements a matrix transpose:

#include <stdio.h>
#include <stdlib.h>
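// The original post included a helper header, "error.cuh", that is not shown.
// The CHECK macro below is a stand-in (an assumption about that header):
// it prints the location and error string of a failing CUDA call, then exits.
#define CHECK(call)                                                   \
do {                                                                  \
    const cudaError_t error_code = (call);                            \
    if (error_code != cudaSuccess) {                                  \
        printf("CUDA error at %s:%d: %s\n", __FILE__, __LINE__,       \
               cudaGetErrorString(error_code));                       \
        exit(1);                                                      \
    }                                                                 \
} while (0)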

#define TILE_DIM 32   // Don't ask me why I don't merge these two values into one
#define BLOCK_SIZE 32 // must equal TILE_DIM: the kernel indexes its tiles with threadIdx
#define N 3001 // for huanhuan, you know that!

__managed__ int input_M[N * N];      //input matrix & GPU result
int cpu_result[N * N];   //CPU result


//in-place matrix transpose
__global__ void ip_transpose(int* data)
{
    __shared__ int tile_s[TILE_DIM ][TILE_DIM + 1];
    __shared__ int tile_d[TILE_DIM ][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

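    // Blocks below the diagonal: swap this tile with its mirror tile above the
    // diagonal, transposing both. Blocks above the diagonal do nothing.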
    if (blockIdx.y > blockIdx.x) {
        int dx = blockIdx.y * TILE_DIM + threadIdx.x;
        int dy = blockIdx.x * TILE_DIM + threadIdx.y;
        if (x < N && y < N)
        {
            tile_s[threadIdx.y][threadIdx.x] = data[(y)*N + x];
        }
        if (dx < N && dy < N)
        {
            tile_d[threadIdx.y][threadIdx.x] = data[(dy)*N + dx];
        }

        __syncthreads();
        if (dx < N && dy < N)
        {
            data[(dy)*N + dx] = tile_s[threadIdx.x][threadIdx.y];
        }
        if (x < N && y < N)
        {
            data[(y)*N + x] = tile_d[threadIdx.x][threadIdx.y];
        }
    }
    else if (blockIdx.y == blockIdx.x) // Blocks on the diagonal: transpose the tile in place
    {
        if (x < N && y < N)
        {
            tile_s[threadIdx.y][threadIdx.x] = data[(y)*N + x];
        }
        __syncthreads();
        if (x < N && y < N)
        {
            data[(y)*N + x] = tile_s[threadIdx.x][threadIdx.y];
        }
    }
}

void cpu_transpose(int* A, int* B)
{
    for (int j = 0; j < N; j++)
    {
        for (int i = 0; i < N; i++)
        {
            B[i * N + j] = A[j * N + i];
        }
    }
}

int main(int argc, char const* argv[])
{

    cudaEvent_t start, stop_gpu;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop_gpu));


    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            input_M[i * N + j] = rand() % 1000;
        }
    }
    cpu_transpose(input_M, cpu_result);

    CHECK(cudaEventRecord(start));
    unsigned int grid_rows = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    unsigned int grid_cols = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    dim3 dimGrid(grid_cols, grid_rows);
    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    ip_transpose<<<dimGrid, dimBlock>>>(input_M);
    CHECK(cudaGetLastError());   // catch launch-configuration errors
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaEventRecord(stop_gpu));
    CHECK(cudaEventSynchronize(stop_gpu));

    float elapsed_time_gpu;
    CHECK(cudaEventElapsedTime(&elapsed_time_gpu, start, stop_gpu));
    printf("Time_GPU = %g ms.\n", elapsed_time_gpu);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop_gpu));

    int ok = 1;
    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < N; ++j)
        {
            if (input_M[i * N + j] != cpu_result[i * N + j])   // both arrays hold ints, so compare exactly
            {
                ok = 0;
            }
        }
    }


    if (ok)
    {
        printf("Pass!!!\n");
    }
    else
    {
        printf("Error!!!\n");
    }

    return 0;
}
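To see why the kernel only does work in blocks on or below the diagonal, it can help to replay its index arithmetic on the CPU. The following is a minimal host-side sketch, not the kernel itself. The names blocks_needed and emulate_ip_transpose are invented for this illustration, the tile parameter plays the role of TILE_DIM, and two ordinary arrays stand in for the shared-memory tiles.

```cpp
#include <vector>

// Blocks needed to cover n elements with `block` threads each (ceiling division),
// the same arithmetic main() uses for grid_rows and grid_cols.
int blocks_needed(int n, int block) {
    return (n + block - 1) / block;
}

// Host-side replay of the in-place tiled transpose on an n x n row-major matrix.
// Each (by, bx) pair stands for one thread block, each (ty, tx) pair for one thread.
void emulate_ip_transpose(std::vector<int>& data, int n, int tile) {
    const int grid = blocks_needed(n, tile);
    std::vector<int> tile_s(tile * tile), tile_d(tile * tile);  // stand-ins for shared memory

    for (int by = 0; by < grid; ++by) {
        for (int bx = 0; bx <= by; ++bx) {  // blocks with bx > by stay idle
            // Phase 1: load this block's tile and its mirror tile.
            for (int ty = 0; ty < tile; ++ty) {
                for (int tx = 0; tx < tile; ++tx) {
                    int x = bx * tile + tx, y = by * tile + ty;    // own tile
                    int dx = by * tile + tx, dy = bx * tile + ty;  // mirror tile
                    if (x < n && y < n)   tile_s[ty * tile + tx] = data[y * n + x];
                    if (dx < n && dy < n) tile_d[ty * tile + tx] = data[dy * n + dx];
                }
            }
            // Phase 2 (after __syncthreads() in the kernel): write each tile,
            // transposed, into the other tile's position. On the diagonal the two
            // tiles coincide, so both writes store the same value.
            for (int ty = 0; ty < tile; ++ty) {
                for (int tx = 0; tx < tile; ++tx) {
                    int x = bx * tile + tx, y = by * tile + ty;
                    int dx = by * tile + tx, dy = bx * tile + ty;
                    if (dx < n && dy < n) data[dy * n + dx] = tile_s[tx * tile + ty];
                    if (x < n && y < n)   data[y * n + x] = tile_d[tx * tile + ty];
                }
            }
        }
    }
}
```

With N = 3001 and BLOCK_SIZE = 32, blocks_needed gives 94 blocks per dimension, which covers 3008 positions. The last block in each dimension therefore has 7 threads past the edge of the matrix, which is why every load and store in the kernel is guarded by a bounds check.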

Click Run. The program prints the GPU time and, if the GPU result matches the CPU result, Pass!!!
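One detail of the kernel worth unpacking is the + 1 in the tile declarations, [TILE_DIM][TILE_DIM + 1]. The sketch below works through the arithmetic on the host, assuming the common shared-memory layout of 32 banks that are each 4 bytes wide. The function names are made up for this illustration.

```cpp
// Bank holding element [row][col] of a tile of 4-byte ints whose rows are
// `row_stride` elements long, assuming 32 banks of 4-byte words.
int bank_of(int row, int col, int row_stride) {
    return (row * row_stride + col) % 32;
}

// Number of distinct banks touched when 32 threads each read one element of
// the same column (rows 0..31), as in tile_s[threadIdx.x][threadIdx.y].
int distinct_banks_in_column(int col, int row_stride) {
    bool seen[32] = {false};
    int count = 0;
    for (int row = 0; row < 32; ++row) {
        int b = bank_of(row, col, row_stride);
        if (!seen[b]) {
            seen[b] = true;
            ++count;
        }
    }
    return count;
}
```

With a row stride of 32, every element of a column falls in the same bank, so a warp reading a column is serialized. With a stride of 33 the same read spreads over all 32 banks, which is what the padding column buys.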