Why Volatile GPU-Util (GPU utilization) is very low while Memory-Usage (GPU memory usage) is very high during PyTorch training
2022-06-12 07:53:00 【Wait for Godot.】
Low GPU utilization with high GPU memory usage during PyTorch training
Preface
When a model starts training, a common way to watch GPU memory usage is watch -n 0.1 nvidia-smi, as shown in the figure below. Usually both GPU memory usage and GPU utilization are high, but for some models the GPU utilization (Volatile GPU-Util) keeps changing, cycling continuously between 0% and 100%.
If GPU utilization fluctuates like this, run the top command on the command line to check CPU utilization; that usually reveals where the problem is.
GPU Memory-Usage (GPU memory usage)
The most direct factors behind GPU Memory-Usage are the size of the model and the batch size. The model's contribution comes from its network parameters (depth, width, and so on). The model structure is generally fixed by the time training starts and is rarely easy to change, so in practice Memory-Usage is controlled mainly through the batch size. For example, with the batch size set to 12, Memory-Usage is 40%; with it set to 24, Memory-Usage is 80%, close to a 2x relationship with only a small deviation. So once the model structure is fixed, set the batch size as large as possible to make full use of GPU memory. (The GPU quickly processes whatever data it is given; the main bottleneck in training time is the CPU's data throughput.)
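The scaling described above can be turned into a rough estimate. The sketch below assumes memory usage grows linearly with batch size and uses the two figures from the text (batch size 12 at 40%, batch size 24 at 80%). The function names and the fixed_percent parameter are hypothetical, introduced only for illustration. Real usage also depends on allocator caching and framework workspace, so treat the result as a starting point to check against nvidia-smi.

```python
def estimate_memory_percent(batch_size, ref_batch=12, ref_percent=40.0,
                            fixed_percent=0.0):
    """Linear estimate of GPU memory usage (in percent) for a batch size.

    ref_batch / ref_percent: one measured point (12 -> 40% in the text).
    fixed_percent: assumed share that does not depend on batch size
    (weights, optimizer state); 0 matches the near-2x scaling in the text.
    """
    return fixed_percent + (ref_percent - fixed_percent) * batch_size / ref_batch


def largest_batch_size(limit_percent, ref_batch=12, ref_percent=40.0,
                       fixed_percent=0.0):
    """Largest batch size whose linear estimate stays within limit_percent."""
    scaled = (limit_percent - fixed_percent) * ref_batch
    return int(scaled // (ref_percent - fixed_percent))


if __name__ == "__main__":
    print(estimate_memory_percent(24))   # estimate for the second point in the text
    print(largest_batch_size(90.0))      # batch size that keeps the estimate under 90%
```

Under this assumption, leaving about 10% headroom suggests a batch size of 27 for the configuration in the text.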
Volatile GPU-Util (GPU utilization)
This is the Volatile GPU-Util field. When the number of CPU data-loading workers is not set properly, the value jumps around repeatedly: 0%, 20%, 70%, 95%, 0%. It stalls for 1-2 seconds and then the cycle repeats. What is actually happening is that the GPU is waiting for data from the CPU. Once the data arrives over the bus, the GPU starts computing and utilization rises sharply, but the GPU is fast enough to process the batch in about 0.5 seconds, so utilization drops again while it waits for the next batch. The bottleneck for GPU utilization therefore lies in memory bandwidth, the memory hardware, and CPU performance. The most direct fix is to upgrade to fourth-generation (DDR4) or faster memory modules and pair them with a better CPU.
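The stall-and-burst pattern above can be sketched with a very simple timing model. This is an idealized illustration, not a measurement: it assumes a fixed per-batch loading time and compute time, perfect overlap between workers and the GPU, and no cost for coordinating workers. The function name gpu_busy_fraction is hypothetical.

```python
def gpu_busy_fraction(load_s, compute_s, workers=0):
    """Fraction of each training step during which the GPU is computing.

    load_s:    CPU time to load and preprocess one batch (seconds).
    compute_s: GPU time to process one batch (seconds).
    workers:   0 means batches are loaded in the training process itself,
               so loading and compute alternate; N > 0 assumes N workers
               prepare batches in parallel while the GPU computes.
    """
    if workers <= 0:
        step_s = load_s + compute_s               # GPU idles while loading
    else:
        step_s = max(compute_s, load_s / workers)  # slower side sets the pace
    return compute_s / step_s


if __name__ == "__main__":
    # 1.5 s of loading per batch and 0.5 s of GPU compute, as in the
    # "waits 1-2 seconds, computes for 0.5 seconds" pattern described above.
    for n in (0, 1, 2, 4):
        print(n, round(gpu_busy_fraction(1.5, 0.5, workers=n), 2))
```

With these numbers the model gives 25% busy time when loading is synchronous and 100% once four workers keep up with the GPU. It deliberately ignores worker overhead, so it cannot show the slowdown that very large worker counts cause in practice.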
Another approach, within PyTorch, is to tune data loading in the DataLoader: setting num_workers (the number of worker subprocesses that load data) and pin_memory=True both speed things up. This addresses the data-transfer bandwidth bottleneck and the resulting low GPU computing efficiency. TensorFlow has corresponding settings for data loading as well.
To improve utilization, first set num_workers appropriately; 4, 8, and 16 are commonly chosen values. In my tests, setting num_workers very high (for example 24 or 32) reduced efficiency, because the data has to be distributed evenly across the workers for preprocessing and dispatch, and too many workers make that overhead dominate. On the other hand, with num_workers set to 1, a single CPU worker preprocesses the data and feeds it to the GPU, which is also inefficient. Second, if your server or computer has plenty of RAM and good performance, enabling pin_memory is recommended. With pin_memory=True, batches are placed in page-locked (pinned) host memory, so they can be copied to the GPU directly instead of first passing through ordinary pageable RAM, which saves a little transfer time.
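As a minimal sketch of these settings, the snippet below builds a DataLoader with num_workers and pin_memory and moves each batch to the device. The dataset is random placeholder data, and make_loader and train_one_epoch are hypothetical helper names, not part of PyTorch. The non_blocking=True argument is an addition not mentioned in the text: it lets the copy from pinned memory overlap with GPU work. The values shown are starting points to tune on your own machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=24, num_workers=4, pin_memory=True):
    """Build a DataLoader using the two settings discussed in the text."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # worker subprocesses that load batches
        pin_memory=pin_memory,    # place batches in page-locked host memory
    )


def train_one_epoch(loader, device):
    """Iterate once over the loader and return the number of samples seen."""
    seen = 0
    for inputs, targets in loader:
        # non_blocking=True only helps when the source tensor is pinned.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # The forward pass, loss, and optimizer step would go here.
        seen += inputs.size(0)
    return seen


if __name__ == "__main__":
    # Random stand-in data: 256 images of shape 3x32x32 with 10 classes.
    dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                            torch.randint(0, 10, (256,)))
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    loader = make_loader(dataset, num_workers=4, pin_memory=use_cuda)
    print(train_one_epoch(loader, device))
```

To pick num_workers, time one epoch for a few values such as 0, 4, 8, and 16, and keep the smallest value beyond which the epoch time stops improving.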