Introduction to constant memory
Constant memory is really just a specially mapped region of global memory; there is no separate block of physical memory reserved for it. It has two distinguishing features:
- reads are served through a dedicated constant cache;
- a single value can be broadcast to every thread in a warp.
Constant memory is limited to 64 KB in total, and each SM has an 8 KB constant cache. A block of constant memory is declared at compile time with the __constant__ qualifier, for example:
__constant__ float my_array[1024] = { 0.0F, 1.0F, 1.3F, ...};
Unlike a const in C/C++, CUDA constant memory can be modified after it is declared. To change its contents at run time, the host simply calls cudaMemcpyToSymbol() before launching the kernel. If constant memory is initialized neither at compile time nor by the host at run time, its contents are undefined.
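For instance, here is a sketch of updating constant memory from the host at run time; the kernel and symbol names are illustrative, not part of the original example:

```cuda
// Hypothetical example: a scaling kernel whose coefficients live in constant memory.
__constant__ float d_coeffs[2];

__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // Every thread in the warp reads the same d_coeffs element,
        // so the value is broadcast from the constant cache.
        out[i] = in[i] * d_coeffs[0] + d_coeffs[1];
}

void update_coeffs(const float h_coeffs[2])
{
    // Overwrite the constant memory contents before the next kernel launch.
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, 2 * sizeof(float));
}
```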
Host and device constant memory
The predefined macro __CUDA_ARCH__ makes it possible to mirror a constant between host and device, so the same function can read the value on both the CPU and the GPU:
__constant__ double dc_vals[2] = {0.0, 1000.0};
const double hc_vals[2] = {0.0, 1000.0};
__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
return dc_vals[i];
#else
return hc_vals[i];
#endif
}
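Because f() is compiled twice, once for the host and once for the device, the same call site works in both contexts. A hypothetical usage sketch (error handling omitted):

```cuda
__global__ void read_vals(double *out)
{
    // Compiled with __CUDA_ARCH__ defined, so f() reads dc_vals.
    out[threadIdx.x] = f(threadIdx.x);
}

int main()
{
    // Compiled without __CUDA_ARCH__, so f() reads hc_vals.
    double v = f(1); // 1000.0

    double *d_out;
    cudaMalloc((void **)&d_out, 2 * sizeof(double));
    read_vals<<<1, 2>>>(d_out);
    cudaFree(d_out);
    return (int)(v != 1000.0);
}
```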
Accessing constant memory
CUDA runtime API
A CUDA runtime application copies data to and from constant memory with cudaMemcpyToSymbol() and cudaMemcpyFromSymbol(), respectively. The device pointer behind a constant symbol can be queried with cudaGetSymbolAddress().
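A minimal sketch of the runtime calls just described; the symbol d_table is illustrative:

```cuda
__constant__ float d_table[16];

void roundtrip(void)
{
    float h_in[16], h_out[16];
    for (int i = 0; i < 16; ++i)
        h_in[i] = (float)i;

    // Host -> constant memory.
    cudaMemcpyToSymbol(d_table, h_in, sizeof(h_in));

    // Constant memory -> host.
    cudaMemcpyFromSymbol(h_out, d_table, sizeof(h_out));

    // Query the device address backing the symbol
    // (e.g. to pass to APIs that take a raw pointer).
    void *dev_ptr = NULL;
    cudaGetSymbolAddress(&dev_ptr, d_table);
}
```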
The driver API
A driver API application queries the device pointer of a constant symbol with cuModuleGetGlobal(). Because the driver API lacks the runtime's language integration features, it has no special memory-copy function like cudaMemcpyToSymbol(); instead you look up the address with cuModuleGetGlobal() and then use cuMemcpyHtoD() or cuMemcpyDtoH().
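A sketch of the equivalent driver API sequence; the module file and symbol names are assumptions, and error checking is omitted:

```cuda
// Assumes the module was built from a .cu file that declares
// "__constant__ float d_table[16];".
CUmodule module;
CUdeviceptr dptr;
size_t bytes;
float h_table[16];

cuModuleLoad(&module, "constant.cubin");

// Look up the device address and size of the constant symbol.
cuModuleGetGlobal(&dptr, &bytes, module, "d_table");

// Copy host data into constant memory, and back out.
cuMemcpyHtoD(dptr, h_table, bytes);
cuMemcpyDtoH(h_table, dptr, bytes);
```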
Constant memory use cases
__constant__ char p_HelloCUDA[11];
__constant__ int t_HelloCUDA[11] = {0,1,2,3,4,5,6,7,8,9,10};
__constant__ int num = 11;
__global__ static void HelloCUDA(char * result)
{
int i = 0;
for(i=0; i<num; i++)
{
result[i] = p_HelloCUDA[i] + t_HelloCUDA[i];
}
}
int main(int argc, char* argv[])
{
if(!InitCUDA())
return 0;
char helloCUDA[] = "hdjik CUDA!";
char *device_result = 0;
char host_result[12] = {0};
CUDA_SAFE_CALL(cudaMalloc((void**)&device_result, sizeof(char) * 11));
CUDA_SAFE_CALL(cudaMemcpyToSymbol(p_HelloCUDA, helloCUDA, sizeof(char) * 11));
HelloCUDA<<<1, 1, 0>>>(device_result);
CUT_CHECK_ERROR("Kernel execution failed\n");
CUDA_SAFE_CALL(cudaMemcpy(host_result, device_result, sizeof(char) * 11, cudaMemcpyDeviceToHost));
printf("%s\n", host_result);
CUDA_SAFE_CALL(cudaFree(device_result));
CUT_EXIT(argc, argv);
return 0;
}
Notice that constant memory must be defined outside of all functions: it has file scope and is visible to both host-side and device-side functions. The code above also demonstrates two ways of using constant memory.
- Initialize the constant memory directly at definition time, then use it directly inside the kernel:
__constant__ int t_HelloCUDA[11] = {0,1,2,3,4,5,6,7,8,9,10};
__constant__ int num = 11;
- Declare a constant array first, then assign to it with a runtime function:
__constant__ char p_HelloCUDA[11]; // declaration
...
// assign a value with cudaMemcpyToSymbol
CUDA_SAFE_CALL(cudaMemcpyToSymbol(p_HelloCUDA, helloCUDA, sizeof(char) * 11));
...