CUDA has its own memory model: during execution, threads access data in several different memory spaces. CUDA application development involves 8 kinds of memory, whose hierarchy is shown in the figure:

Each thread has its own private registers and local memory; each thread block has a shared memory (shared memory) visible to all threads in that block; finally, all threads in the grid can access the same global memory (global memory). In addition, there are two kinds of read-only memory that all threads can access: constant memory (constant memory) and texture memory (texture memory), each optimized for different uses. The values in global, constant, and texture memory persist after a kernel function finishes, so other kernel functions in the same program can use them.
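The memory spaces described above can be sketched in a few lines of CUDA. This is a minimal illustration rather than code from the original article: the names `kScale`, `scaleKernel`, `addOneKernel`, and `runTwoKernels` are made up for the example, and error checking is omitted for brevity. The second kernel reads what the first one left in global memory, which shows that global memory outlives a single kernel launch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Constant memory: read-only inside kernels, written by the host.
__constant__ float kScale;

// First kernel: stage each element in shared memory, scale it, write it back.
__global__ void scaleKernel(float *data, int n) {
    __shared__ float tile[256];                     // shared memory: one copy per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // private to the thread, normally a register
    if (i < n) {
        tile[threadIdx.x] = data[i];                // global -> shared
        data[i] = tile[threadIdx.x] * kScale;       // shared and constant -> global
    }
}

// Second kernel: reads the values the first kernel left in global memory.
__global__ void addOneKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Hypothetical host helper: fills n elements with 1.0f and runs both kernels.
std::vector<float> runTwoKernels(int n, float scale) {
    std::vector<float> host(n, 1.0f);
    if (n <= 0) return host;

    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));   // global memory
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(kScale, &scale, sizeof(float));  // host writes constant memory

    const int threads = 256;                        // must not exceed the size of tile[]
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dev, n);
    addOneKernel<<<blocks, threads>>>(dev, n);      // sees the results of scaleKernel

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling `runTwoKernels(1000, 3.0f)` should leave every element equal to `1.0f * 3.0f + 1.0f`. Local memory does not appear explicitly: the compiler decides when a per-thread variable is placed there instead of in a register.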
Comparison of the memory types:
| Memory | Location | Cached | Access | Variable lifetime |
|---|---|---|---|---|
| register | On-chip (GPU) | N/A | Device: read/write | Same as the thread |
| local_memory | Off-chip device memory | No (cached in L1/L2 on compute capability 2.x and later) | Device: read/write | Same as the thread |
| shared_memory | On-chip (GPU) | N/A | Device: read/write | Same as the block |
| constant_memory | Off-chip device memory | Yes | Device: read; host: read/write | Persists within the program |
| texture_memory | Off-chip device memory | Yes | Device: read; host: read/write | Persists within the program |
| global_memory | Off-chip device memory | No (cached in L1/L2 on compute capability 2.x and later) | Device: read/write; host: read/write | Persists within the program |
| host_memory | Host memory | No | Host: read/write | Persists within the program |
| pinned_memory | Host memory | No | Host: read/write | Persists within the program |
Notes:
(1) Registers and shared_memory are both high-speed memory on the GPU chip;
(2) With the zero-copy feature, implemented through mapped memory, GPUs that support it can access page-locked memory directly from within a kernel.
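Note (2) can be illustrated with a short zero-copy sketch. It assumes device 0 and uses the hypothetical names `incrementKernel` and `zeroCopySum`; the runtime calls (`cudaHostAlloc` with `cudaHostAllocMapped`, `cudaHostGetDevicePointer`) are the standard ones for mapped page-locked memory. No `cudaMemcpy` appears: the kernel reads and writes host memory through a device-side alias.

```cuda
#include <cuda_runtime.h>

// Adds 1 to every element; here data points at mapped page-locked host memory.
__global__ void incrementKernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Hypothetical helper: fills a mapped buffer with 0..n-1, increments it on the
// GPU, and returns the sum. Returns -1 if zero-copy is unavailable.
int zeroCopySum(int n) {
    // Enables host-memory mapping on older setups; on current toolkits mapping
    // is typically enabled already, so the return value is ignored here.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory) {
        return -1;
    }

    int *hostPtr = nullptr;
    if (cudaHostAlloc((void **)&hostPtr, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess) {
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        hostPtr[i] = i;
    }

    // Device-side alias of the same page-locked host buffer.
    int *devPtr = nullptr;
    cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    incrementKernel<<<blocks, threads>>>(devPtr, n);
    cudaDeviceSynchronize();   // wait until the GPU's writes are visible to the host

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += hostPtr[i];
    }
    cudaFreeHost(hostPtr);
    return sum;
}
```

Every access from the kernel travels over the bus to host memory, so this pattern tends to suit data that is read or written only once.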
The GPU follows a load/store memory model (load-store model): every operation is executed only after its operands have been loaded into registers. Understanding the GPU memory model is therefore of great importance for optimizing the performance of GPU code.
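A block-level sum makes the load/store pattern visible. The following is a sketch under assumptions, not the article's own code: `blockSumKernel`, `sumOnGpu`, and `kThreads` are illustrative names, and error checking is left out. Each thread loads one value from global memory into a register, the block combines the values in shared memory, and a single result per block is stored back to global memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // threads per block; a power of two for the reduction

__global__ void blockSumKernel(const float *in, float *blockSums, int n) {
    __shared__ float partial[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    float x = (i < n) ? in[i] : 0.0f;  // load: global memory -> register
    partial[tid] = x;                  // store: register -> shared memory
    __syncthreads();

    // Tree reduction in shared memory; each step halves the active threads.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        blockSums[blockIdx.x] = partial[0];  // store: shared -> global memory
    }
}

// Hypothetical host helper: sums a vector using one partial sum per block.
float sumOnGpu(const std::vector<float> &host) {
    const int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;
    const int blocks = (n + kThreads - 1) / kThreads;

    float *devIn = nullptr;
    float *devSums = nullptr;
    cudaMalloc((void **)&devIn, n * sizeof(float));
    cudaMalloc((void **)&devSums, blocks * sizeof(float));
    cudaMemcpy(devIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSumKernel<<<blocks, kThreads>>>(devIn, devSums, n);

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), devSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devSums);

    float total = 0.0f;
    for (float s : sums) total += s;   // finish the reduction on the host
    return total;
}
```

Each input element is read from global memory exactly once; all further additions work on shared memory and registers, which is the kind of saving the memory model rewards.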







