Registers
Registers are high-speed on-chip memory on the GPU. They are organized as a register file, and each register is 32 bits wide.
On hardware of compute capability 1.0/1.1, each SM has 8192 registers; on 1.2/1.3 hardware, each SM has 16384 registers.
In general, simple local variables in a kernel are held in registers.
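The register counts above put a ceiling on how many registers each thread can use, because all threads resident on an SM share the same register file. The following is a minimal host-side sketch of that arithmetic. The function names are hypothetical, and it assumes registers are divided evenly among resident threads, ignoring the hardware's allocation granularity.

```cpp
#include <cassert>

// Hypothetical helper: upper bound on registers per thread when
// `threadsPerSM` threads are resident on one SM at the same time.
// Assumption: registers are shared evenly; allocation granularity is ignored.
int maxRegistersPerThread(int registersPerSM, int threadsPerSM)
{
    assert(registersPerSM > 0 && threadsPerSM > 0);
    return registersPerSM / threadsPerSM;  // integer division rounds down
}

// Hypothetical helper: size of the register file of one SM in bytes,
// given that each register is 32 bits (4 bytes) wide.
int registerFileBytes(int registersPerSM)
{
    assert(registersPerSM > 0);
    return registersPerSM * 4;
}
```

For example, with 8192 registers per SM and 256 resident threads, maxRegistersPerThread(8192, 256) returns 32. A kernel that needs more registers per thread than this bound either runs with fewer resident threads or has some of its data placed in local memory.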
Local memory
Local memory is also private to each thread. When registers run out, data is placed in local memory: if a thread uses too many registers, declares a large structure or array, or declares an array whose indexing the compiler cannot resolve at compile time, its private data may be allocated in local memory. A thread's input and intermediate variables are therefore held either in registers or in local memory. Local memory resides in device (video) memory, so access to it is very slow.
In the following kernel, mt is likely to be placed in local memory:
__global__ void localmemDemo(float* A)
{
    // Uninitialized thread-private array: a candidate for local memory.
    unsigned int mt[3];
}
If a thread-private array is initialized at the point where it is defined, and the array is small, it may still be allocated in registers:
__global__ void localmemDemo(float* A)
{
    // Small array initialized at definition: may be kept in registers.
    unsigned int mt[3] = {1, 2, 3};
}
To check where a variable ends up, emit the PTX (Parallel Thread Execution) assembly at compile time by adding the -ptx or -keep option. The PTX shows whether a variable was assigned to local memory in the first stage of compilation: if a variable is declared with the .local mnemonic in the PTX and accessed with the ld.local and st.local mnemonics, it is stored in local memory. However, even if a variable is not in local memory after the first stage, the second stage of compilation may still move it there. In that case, add the --ptxas-options=-v compile option to observe lmem usage.
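The two checks described above can be tried on a small test program. The following is a sketch only: the kernel name dynamicIndexDemo and the file name demo.cu are hypothetical, and whether mt really lands in local memory depends on the compiler version and target architecture. The array is indexed with a runtime value, which is a common reason for the compiler to give up on registers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Inspect the result with either of these commands (demo.cu is a hypothetical file name):
//   nvcc -ptx demo.cu                 then look for .local, ld.local, st.local in demo.ptx
//   nvcc --ptxas-options=-v demo.cu   then read the per-kernel resource report
// The exact wording of the ptxas report varies between toolkit versions.

// Hypothetical kernel: mt is indexed with a value known only at run time,
// so the compiler may place it in local memory instead of registers.
__global__ void dynamicIndexDemo(float* A, unsigned int k)
{
    unsigned int mt[3] = {1, 2, 3};
    mt[k % 3] += threadIdx.x;                   // write through a runtime index
    A[threadIdx.x] = (float)mt[(k + 1) % 3];    // store a result so mt is not optimized away
}

int main()
{
    const int n = 32;
    float* dA = nullptr;
    float hA[n];

    cudaMalloc(&dA, n * sizeof(float));
    dynamicIndexDemo<<<1, n>>>(dA, 1u);
    cudaMemcpy(hA, dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);

    printf("A[0] = %f\n", hA[0]);
    return 0;
}
```

Writing the result to A matters: an array that is never read can be removed entirely by the optimizer, in which case neither check will show any local memory use.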
If the array is small and must be kept in registers, replace it with separate scalar variables:
__global__ void localmemDemo(float* A)
{
    // One scalar per former array element; scalars are normally kept in registers.
    unsigned int mt0, mt1, mt2;
}




