Introduction to constant memory
Constant memory is really just a specially mapped region of global memory; there is no separate block of physical memory reserved for it. It has two distinguishing features:
- reads are served through a dedicated constant cache;
- a single value can be broadcast to every thread in a warp.
Constant memory is limited to 64 KB in total, and each SM has an 8 KB constant cache. A block of constant memory is declared at compile time with the __constant__ qualifier, for example:
__constant__ float my_array[1024] = { 0.0F, 1.0F, 1.3F, ...};
Unlike a const in C/C++, CUDA constant memory can be modified after it is declared. To change its contents at run time, the host simply calls cudaMemcpyToSymbol() before launching the kernel. If constant memory is initialized neither at compile time nor by the host at run time, its contents are undefined.
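For instance, here is a sketch of updating constant memory from the host at run time; the kernel and symbol names are illustrative, not part of the original example:

```cuda
// Hypothetical example: a scaling kernel whose coefficients live in constant memory.
__constant__ float d_coeffs[2];

__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // Every thread in the warp reads the same d_coeffs element,
        // so the value is broadcast from the constant cache.
        out[i] = in[i] * d_coeffs[0] + d_coeffs[1];
}

void update_coeffs(const float h_coeffs[2])
{
    // Overwrite the constant memory contents before the next kernel launch.
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, 2 * sizeof(float));
}
```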
Host and device constant memory
The predefined macro __CUDA_ARCH__ makes it possible to mirror a constant between host and device, so the same function can read the value on both the CPU and the GPU:
__constant__ double dc_vals[2] = {0.0, 1000.0};
const double hc_vals[2] = {0.0, 1000.0};
__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
return dc_vals[i];
#else
return hc_vals[i];
#endif
}
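Because f() is compiled twice, once for the host and once for the device, the same call site works in both contexts. A hypothetical usage sketch (error handling omitted):

```cuda
__global__ void read_vals(double *out)
{
    // Compiled with __CUDA_ARCH__ defined, so f() reads dc_vals.
    out[threadIdx.x] = f(threadIdx.x);
}

int main()
{
    // Compiled without __CUDA_ARCH__, so f() reads hc_vals.
    double v = f(1); // 1000.0

    double *d_out;
    cudaMalloc((void **)&d_out, 2 * sizeof(double));
    read_vals<<<1, 2>>>(d_out);
    cudaFree(d_out);
    return (int)(v != 1000.0);
}
```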
Accessing constant memory
CUDA runtime API
A CUDA runtime application copies data to and from constant memory with cudaMemcpyToSymbol() and cudaMemcpyFromSymbol(), respectively. The device pointer behind a constant symbol can be queried with cudaGetSymbolAddress().
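A minimal sketch of the runtime calls just described; the symbol d_table is illustrative:

```cuda
__constant__ float d_table[16];

void roundtrip(void)
{
    float h_in[16], h_out[16];
    for (int i = 0; i < 16; ++i)
        h_in[i] = (float)i;

    // Host -> constant memory.
    cudaMemcpyToSymbol(d_table, h_in, sizeof(h_in));

    // Constant memory -> host.
    cudaMemcpyFromSymbol(h_out, d_table, sizeof(h_out));

    // Query the device address backing the symbol
    // (e.g. to pass to APIs that take a raw pointer).
    void *dev_ptr = NULL;
    cudaGetSymbolAddress(&dev_ptr, d_table);
}
```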
The driver API
A driver API application queries the device pointer of a constant symbol with cuModuleGetGlobal(). Because the driver API lacks the runtime's language integration features, it has no special memory-copy function like cudaMemcpyToSymbol(); instead you look up the address with cuModuleGetGlobal() and then use cuMemcpyHtoD() or cuMemcpyDtoH().
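A sketch of the equivalent driver API sequence; the module file and symbol names are assumptions, and error checking is omitted:

```cuda
// Assumes the module was built from a .cu file that declares
// "__constant__ float d_table[16];".
CUmodule module;
CUdeviceptr dptr;
size_t bytes;
float h_table[16];

cuModuleLoad(&module, "constant.cubin");

// Look up the device address and size of the constant symbol.
cuModuleGetGlobal(&dptr, &bytes, module, "d_table");

// Copy host data into constant memory, and back out.
cuMemcpyHtoD(dptr, h_table, bytes);
cuMemcpyDtoH(h_table, dptr, bytes);
```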
Constant memory use cases
__constant__ char p_HelloCUDA[11];
__constant__ int t_HelloCUDA[11] = {0,1,2,3,4,5,6,7,8,9,10};
__constant__ int num = 11;
__global__ static void HelloCUDA(char * result)
{
int i = 0;
for(i=0; i<num; i++)
{
result[i] = p_HelloCUDA[i] + t_HelloCUDA[i];
}
}
int main(int argc, char* argv[])
{
if(!InitCUDA())
return 0;
char helloCUDA[] = "hdjik CUDA!";
char *device_result = 0;
char host_result[12] = {0};
CUDA_SAFE_CALL(cudaMalloc((void**)&device_result, sizeof(char) * 11));
CUDA_SAFE_CALL(cudaMemcpyToSymbol(p_HelloCUDA, helloCUDA, sizeof(char) * 11));
HelloCUDA<<<1, 1, 0>>>(device_result);
CUT_CHECK_ERROR("Kernel execution failed\n");
CUDA_SAFE_CALL(cudaMemcpy(host_result, device_result, sizeof(char) * 11, cudaMemcpyDeviceToHost));
printf("%s\n", host_result);
CUDA_SAFE_CALL(cudaFree(device_result));
CUT_EXIT(argc, argv);
return 0;
}
Notice that constant memory must be defined outside of all functions: it has file scope and is visible to both host-side and device-side functions. The code above also demonstrates two ways of using constant memory.
- Initialize the constant memory directly at definition time, then use it directly inside the kernel:
__constant__ int t_HelloCUDA[11] = {0,1,2,3,4,5,6,7,8,9,10};
__constant__ int num = 11;
- Declare a constant array first, then assign to it with a runtime function:
__constant__ char p_HelloCUDA[11]; // declaration
...
// assign a value with cudaMemcpyToSymbol
CUDA_SAFE_CALL(cudaMemcpyToSymbol(p_HelloCUDA, helloCUDA, sizeof(char) * 11));
...