Libtorch GPU memory management example
2022-06-21 21:17:00 【Ten thousand miles' journey to】
When deploying with libtorch, you can run out of GPU memory, so it is necessary to manage how memory is used. This post studies the relevant libtorch APIs and experiments with GPU memory management. When a libtorch program runs, its GPU memory usage falls into three blocks: memory occupied by model parameters, memory occupied by input and output tensors, and memory occupied by temporary variables created during the model's forward pass.
Calling cudaFree(tensor.data_ptr()) releases the memory occupied by a tensor, and the same call can also release the memory occupied by the model parameters. The CUDACachingAllocator::emptyCache function releases some of the memory used during the model's forward pass.
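As a minimal sketch of these two calls (assuming a CUDA + libtorch environment configured as in section 1 below), mirroring the sequence tested later in this post:

#include <torch/torch.h>
#include <c10/cuda/CUDACachingAllocator.h>
#include <cuda_runtime_api.h>

int main() {
    // Allocate a tensor on the GPU through libtorch's caching allocator.
    at::Tensor t = torch::ones({ 1, 3, 512, 512 }).to(at::kCUDA);

    // Release the raw device buffer behind the tensor. The tensor object
    // itself still exists and must not be touched afterwards.
    cudaFree(t.data_ptr());

    // Return cached but currently unused blocks to the driver, so that
    // cudaMemGetInfo sees them as free again.
    c10::cuda::CUDACachingAllocator::emptyCache();
    return 0;
}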
To explore GPU memory usage and management at each stage of model deployment, the following code tests were performed.
1、 Environment configuration
GPU memory management requires CUDA to be configured; refer to Figure 1 for the CUDA setup. In addition, under Linker → Input → Additional Dependencies you need to add cudnn.lib;cublas.lib;cudart.lib;.

The libtorch configuration can follow the earlier post "pytorch 4 libtorch configuration and usage record (with cuda support)" on the author's CSDN blog.
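After configuration, a quick sanity check (a small sketch using standard libtorch calls) confirms that CUDA is visible to libtorch:

#include <torch/torch.h>
#include <iostream>

int main() {
    // If the CUDA libraries above are linked correctly, this prints true.
    std::cout << "CUDA available: " << std::boolalpha
              << torch::cuda::is_available() << "\n";
    std::cout << "CUDA device count: " << torch::cuda::device_count() << "\n";
    return 0;
}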
2、 Library import and basic function implementation
The following code imports the cuda and libtorch headers and implements a helper that queries the current GPU memory usage.
#include <torch/script.h>
#include <torch/torch.h>
#include <c10/cuda/CUDAStream.h>
#include <ATen/cuda/CUDAEvent.h>
#include <iostream>
#include <memory>
#include <string>
#include <cuda_runtime_api.h>

using namespace std;

// Print how much GPU memory is currently in use, as seen by the CUDA runtime.
static void print_cuda_use()
{
    size_t free_byte;
    size_t total_byte;
    cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);
    if (cudaSuccess != cuda_status) {
        printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status));
        exit(1);
    }
    double free_db = (double)free_byte;
    double total_db = (double)total_byte;
    double used_db_1 = (total_db - free_db) / 1024.0 / 1024.0;
    std::cout << "Now used GPU memory " << used_db_1 << " MB\n";
}

3、libtorch GPU memory management
The model d_in_out.pt used in the code comes from the earlier post "pytorch 6 libtorch deploying a multi-input multi-output model (with batch support)" on the author's CSDN blog, with one difference: it has more model parameters (the author scaled the parameters up 100x, essentially by increasing the filter_num of every kernel).
The comments in the code note whether each operation is effective and what its impact is.
int main() {
    string path = "d_in_out.pt";
    // No gradient information needs to be stored during inference.
    at::NoGradGuard nograd;
    int gpu_id = 0;

    // Load the model.
    torch::jit::Module model = torch::jit::load(path);
    model.to(at::kCUDA);
    model.eval(); // set evaluation mode
    std::cout << "GPU memory usage after loading the model\n";
    print_cuda_use();

    // Build the two inputs.
    // For a single-input model, pass only one tensor to forward().
    at::Tensor x1_tensor = torch::ones({ 1,3,512,512 }).to(at::kCUDA);
    at::Tensor x2_tensor = torch::ones({ 1,3,512,512 }).to(at::kCUDA);
    at::Tensor result1, result2;
    std::cout << "\nGPU memory usage after initializing the input tensors\n";
    print_cuda_use();

    std::cout << "\nRunning forward 5 times\n";
    for (int i = 0; i < 5; i++) {
        //result = model.forward({ x1_tensor,x2_tensor }).toTensor(); // single-output case
        auto out = model.forward({ x1_tensor,x2_tensor });
        auto tpl = out.toTuple(); // or out.toTensorList();
        result1 = tpl->elements()[0].toTensor();
        result2 = tpl->elements()[1].toTensor();
        print_cuda_use();
    }

    std::cout << "\nFreeing the GPU memory occupied by the tensors (does not help much)\n";
    cudaFree(x1_tensor.data_ptr());
    cudaFree(x2_tensor.data_ptr());
    cudaFree(result1.data_ptr());
    cudaFree(result2.data_ptr());
    print_cuda_use();

    std::cout << "\nCUDACachingAllocator::emptyCache (has some effect)\n";
    c10::cuda::CUDACachingAllocator::emptyCache();
    print_cuda_use();

    std::cout << "\nFreeing the GPU memory occupied by the model parameters (pointless)\n";
    for (auto p : model.parameters())
        cudaFree(p.data_ptr()); // cuda becomes unusable afterwards: model.to(at::kCUDA) reports errors
    print_cuda_use();

    //torch::jit::Module::Module::freeze(model);
    std::cout << "\nCalling the model's destructor (no effect)\n";
    model.~Module(); // does not actually change GPU memory usage
    print_cuda_use();

    std::cout << "\nResetting the cuda allocator state (no effect)\n";
    c10::cuda::CUDACachingAllocator::init(gpu_id);
    c10::cuda::CUDACachingAllocator::resetAccumulatedStats(gpu_id); // does not actually change GPU memory usage
    c10::cuda::CUDACachingAllocator::resetPeakStats(gpu_id);
    print_cuda_use();

    std::cout << "\ncudaDeviceReset (works, but subsequent models cannot use cuda)\n";
    // Requires the cuda runtime to be configured.
    cudaDeviceReset(); // fully releases GPU resources; cuda is then unusable and model.to(at::kCUDA)
                       // reports errors, unless libtorch's cuda environment can be reinitialized
    print_cuda_use();

    torch::cuda::synchronize();
    model = torch::jit::load(path);
    model.to(at::kCUDA);
    print_cuda_use();
    return 0;
}

The execution result of the above code is shown in the figure below.
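Beyond the experiments above, one pattern that does reclaim the parameter memory without breaking CUDA (a sketch based on how the caching allocator behaves, not part of the test above) is to let the module be destroyed normally and then empty the cache: destruction returns the parameter blocks to the caching allocator, and emptyCache() hands them back to the driver, after which a new model can still use CUDA. A minimal sketch, assuming the same d_in_out.pt model and that nothing else holds references to the module or its outputs:

#include <torch/script.h>
#include <torch/torch.h>
#include <c10/cuda/CUDACachingAllocator.h>

int main() {
    {
        // Own the module inside its own scope.
        torch::jit::Module model = torch::jit::load("d_in_out.pt");
        model.to(at::kCUDA);
        model.eval();
        at::NoGradGuard nograd;
        at::Tensor x = torch::ones({ 1, 3, 512, 512 }).to(at::kCUDA);
        auto out = model.forward({ x, x });
    } // model, input and outputs are destroyed here; their blocks return to the cache

    // Hand the cached blocks back to the driver. Unlike cudaDeviceReset,
    // CUDA stays usable for the next model.
    c10::cuda::CUDACachingAllocator::emptyCache();

    torch::jit::Module model2 = torch::jit::load("d_in_out.pt");
    model2.to(at::kCUDA); // still works
    return 0;
}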
