Why Volatile GPU-Util (GPU utilization) is very low while Memory-Usage (memory occupancy) is very high during PyTorch training

2022-06-12 07:53:00 【Wait for Godot.】

Low GPU utilization with high memory usage during PyTorch training

Preface
GPU Memory-Usage (GPU memory occupancy)
Volatile GPU-Util (GPU utilization)

Preface

When a model starts training, it is common to run watch -n 0.1 nvidia-smi to monitor GPU memory usage, as in the figure below. Usually both GPU memory usage and GPU utilization are high, but for some models the GPU utilization (Volatile GPU-Util) changes constantly, cycling between 0% and 100%.

If GPU utilization fluctuates like this, run the top command to check CPU utilization; that usually reveals the problem.
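To record the same two numbers that watch -n 0.1 nvidia-smi displays, one option is to poll nvidia-smi in query mode from a script. The following is only a sketch: the helper names parse_gpu_stats and query_gpu_stats are invented for illustration, and it assumes nvidia-smi is on the PATH and reports numeric values for every field.

```python
import subprocess

# Fields requested from nvidia-smi: utilization in %, memory in MiB.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse 'util, used, total' lines (one line per GPU) into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({
            "util_pct": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
            "mem_pct": 100.0 * used / total,
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)


# Hypothetical sample line: high memory usage, low utilization.
sample = "3, 9830, 12288\n"
print(parse_gpu_stats(sample))
```

Calling query_gpu_stats() in a loop with a short sleep gives a log of utilization over time, which makes the 0% to 100% cycling easy to see next to the CPU load shown by top.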

GPU Memory-Usage (GPU memory occupancy)

The factors that most directly affect GPU Memory-Usage are the size of the model and the batch size. The model's contribution comes from its network parameters (depth, width, and so on). The model structure is generally fixed during training and cannot easily be changed, so Memory-Usage is mainly controlled through the batch size. For example, with the batch size set to 12, Memory-Usage is 40%; with it set to 24, Memory-Usage is 80%. That is close to a factor of 2, with only a small deviation. So once the model structure is fixed, set the batch size as large as possible to make full use of GPU memory. (The GPU quickly processes whatever data it is given; the main bottleneck in training time is the CPU's data throughput.)
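The roughly linear relationship between batch size and Memory-Usage can be turned into a quick estimate of the largest batch size that fits. This is a minimal sketch under the assumption that memory grows linearly with batch size; the function name max_batch_size and the 95% target are illustrative choices, not part of the original measurements.

```python
def max_batch_size(bs_a, mem_a, bs_b, mem_b, target_pct=95.0):
    """Extrapolate from two (batch size, memory %) measurements.

    Assumes memory usage is a straight line in the batch size.
    """
    slope = (mem_b - mem_a) / (bs_b - bs_a)  # percentage points per sample
    fixed = mem_a - slope * bs_a             # share that does not depend on batch size
    return int((target_pct - fixed) / slope)


# Measurements quoted in the text: batch size 12 -> 40%, batch size 24 -> 80%.
print(max_batch_size(12, 40.0, 24, 80.0))  # 28
```

In practice the estimate is only a starting point: leave some headroom, since peak memory during training can exceed the steady-state figure shown by nvidia-smi.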

Volatile GPU-Util (GPU utilization)

When the number of CPU workers for data loading is not set, Volatile GPU-Util jumps around repeatedly: 0%, 20%, 70%, 95%, 0%. It stalls for 1-2 seconds and then the pattern repeats. What is actually happening is that the GPU is waiting for data from the CPU. Once a batch arrives over the bus, the GPU starts computing and utilization rises sharply; but the GPU is so fast that it finishes the batch in about 0.5 seconds, so utilization drops again until the next batch arrives. The bottleneck for GPU utilization therefore lies in memory bandwidth, the memory hardware, and CPU performance. The ideal fix is of course a hardware upgrade: fourth-generation (DDR4) or faster memory modules, paired with a better CPU.
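One way to confirm that the GPU is waiting on data is to time, per iteration, how long the loop blocks fetching the next batch versus how long the training step takes. The sketch below is framework-agnostic and uses a simulated slow loader; data_wait_fraction, slow_loader, and fast_step are hypothetical names, and the sleep durations are arbitrary stand-ins for real work.

```python
import time


def data_wait_fraction(loader, train_step):
    """Return the fraction of loop time spent waiting for the next batch."""
    wait = 0.0
    compute = 0.0
    batches = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # blocks while the CPU prepares the data
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)          # forward/backward pass would go here
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    total = wait + compute
    return wait / total if total > 0 else 0.0


def slow_loader(num_batches=5, delay=0.02):
    """Simulated loader: each batch takes `delay` seconds of CPU work."""
    for i in range(num_batches):
        time.sleep(delay)
        yield i


def fast_step(batch, delay=0.002):
    """Simulated GPU step that is much faster than data loading."""
    time.sleep(delay)


fraction = data_wait_fraction(slow_loader(), fast_step)
print(f"time spent waiting for data: {fraction:.0%}")
```

With a real PyTorch model, CUDA kernels run asynchronously, so the training step should end with torch.cuda.synchronize() for the compute timing to be meaningful. A wait fraction well above one half points to the data pipeline, not the GPU, as the bottleneck.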

The other approach, within PyTorch, is to tune the DataLoader used for data loading: setting num_workers (the number of worker processes that load data) and pin_memory=True speeds things up, relieving the data-transfer bandwidth bottleneck and the low GPU utilization that results from it. TensorFlow has equivalent settings for data loading.
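The DataLoader settings mentioned here might be wired up as follows. This is a sketch, not a recommended configuration: loader_kwargs and build_loader are hypothetical helpers, the default of 4 workers is an arbitrary starting value, and persistent_workers is an extra DataLoader option not discussed in the text that keeps worker processes alive between epochs.

```python
def loader_kwargs(num_workers=4, use_cuda=True):
    """Collect DataLoader keyword arguments for faster data loading."""
    kwargs = {
        "num_workers": num_workers,  # worker processes preparing batches
        "pin_memory": use_cuda,      # page-locked host memory for faster GPU copies
    }
    if num_workers > 0:
        # Only valid when there is at least one worker process.
        kwargs["persistent_workers"] = True
    return kwargs


def build_loader(dataset, batch_size, num_workers=4, use_cuda=True):
    """Build a torch DataLoader; torch is imported lazily so the helper
    above can be inspected without PyTorch installed."""
    from torch.utils.data import DataLoader
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        **loader_kwargs(num_workers, use_cuda),
    )


print(loader_kwargs(num_workers=8))
```

In the training loop, batches from a pinned-memory loader can then be moved with batch.to("cuda", non_blocking=True), which lets the copy overlap with computation.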

To improve utilization, first set num_workers appropriately; 4, 8, and 16 are commonly chosen values. In my tests, making num_workers very large (for example 24 or 32) reduced efficiency, because the data has to be split evenly among the workers for preprocessing and distribution, and too many workers adds overhead. At the other extreme, setting it to 1 leaves a single CPU worker to preprocess the data and feed the GPU, which is also inefficient. Second, if your server or computer has plenty of RAM and good performance, enable pin_memory. With pin_memory=True, batches are placed in page-locked (pinned) host memory, so the copy to the GPU can skip the extra staging copy through pageable RAM, saving a little transfer time.
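Since the best num_workers depends on the machine, a simple way to choose it is to time a fixed number of batches for each candidate value. The sketch below does this for any loader factory; pick_num_workers, time_one_pass, and the simulated fake_loader (whose cost model is invented to mimic the "too many workers is slower" effect) are all illustrative assumptions.

```python
import time


def time_one_pass(loader, max_batches=50):
    """Time how long it takes to pull up to `max_batches` batches."""
    start = time.perf_counter()
    for i, _ in enumerate(loader):
        if i + 1 >= max_batches:
            break
    return time.perf_counter() - start


def pick_num_workers(make_loader, candidates=(1, 4, 8, 16), max_batches=50):
    """Return the fastest candidate and the timing of every candidate.

    `make_loader(n)` must return an iterable built with n workers.
    """
    timings = {n: time_one_pass(make_loader(n), max_batches) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def fake_loader(n, num_batches=5):
    """Simulated loader: work is split across n workers, plus per-worker overhead."""
    per_batch = 0.02 / n + 0.002 * n
    for i in range(num_batches):
        time.sleep(per_batch)
        yield i


best, timings = pick_num_workers(fake_loader, candidates=(1, 4, 16), max_batches=5)
print(best, timings)
```

With PyTorch, make_loader could be something like lambda n: DataLoader(dataset, batch_size=64, num_workers=n, pin_memory=True), where the batch size of 64 is just a placeholder. Note that the measured time includes worker start-up, so use enough batches for the steady-state cost to dominate.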

Copyright notice
This article was written by [Wait for Godot.]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203010554184172.html
