Libtorch GPU memory management example
2022-06-21 21:17:00 【Ten thousand miles' journey to】
When deploying with libtorch, you can run out of GPU memory, so it is necessary to manage how memory is used. This post studies the relevant libtorch APIs and experiments with GPU memory management. When a libtorch program runs, its GPU memory usage falls into three blocks: memory occupied by model parameters, memory occupied by input and output tensors, and memory occupied by temporary variables created during the model's forward pass.
Calling cudaFree(tensor.data_ptr()) releases the memory occupied by a tensor, and the same call can also release the memory occupied by the model parameters. The CUDACachingAllocator::emptyCache function releases some of the memory used during the model's forward pass.
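As a minimal sketch of these two calls (assuming a CUDA + libtorch environment configured as in section 1 below), mirroring the sequence tested later in this post:

#include <torch/torch.h>
#include <c10/cuda/CUDACachingAllocator.h>
#include <cuda_runtime_api.h>

int main() {
    // Allocate a tensor on the GPU through libtorch's caching allocator.
    at::Tensor t = torch::ones({ 1, 3, 512, 512 }).to(at::kCUDA);

    // Release the raw device buffer behind the tensor. The tensor object
    // itself still exists and must not be touched afterwards.
    cudaFree(t.data_ptr());

    // Return cached but currently unused blocks to the driver, so that
    // cudaMemGetInfo sees them as free again.
    c10::cuda::CUDACachingAllocator::emptyCache();
    return 0;
}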
To explore GPU memory usage and management at each stage of model deployment, the following code tests were performed.
1、 Environment configuration
GPU memory management requires CUDA to be configured; refer to Figure 1 for the CUDA setup. In addition, under Linker → Input → Additional Dependencies you need to add cudnn.lib;cublas.lib;cudart.lib;.

The libtorch configuration can follow the earlier post "pytorch 4 libtorch configuration and usage record (with cuda support)" on the author's CSDN blog.
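After configuration, a quick sanity check (a small sketch using standard libtorch calls) confirms that CUDA is visible to libtorch:

#include <torch/torch.h>
#include <iostream>

int main() {
    // If the CUDA libraries above are linked correctly, this prints true.
    std::cout << "CUDA available: " << std::boolalpha
              << torch::cuda::is_available() << "\n";
    std::cout << "CUDA device count: " << torch::cuda::device_count() << "\n";
    return 0;
}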
2、 Library import and basic function implementation
The following code imports the cuda and libtorch headers and implements a helper that queries the current GPU memory usage.
#include <torch/script.h>
#include <torch/torch.h>
#include <c10/cuda/CUDAStream.h>
#include <ATen/cuda/CUDAEvent.h>
#include <iostream>
#include <memory>
#include <string>
#include <cuda_runtime_api.h>

using namespace std;

// Print how much GPU memory is currently in use, as seen by the CUDA runtime.
static void print_cuda_use()
{
    size_t free_byte;
    size_t total_byte;
    cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);
    if (cudaSuccess != cuda_status) {
        printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status));
        exit(1);
    }
    double free_db = (double)free_byte;
    double total_db = (double)total_byte;
    double used_db_1 = (total_db - free_db) / 1024.0 / 1024.0;
    std::cout << "Now used GPU memory " << used_db_1 << " MB\n";
}

3、libtorch GPU memory management
The model d_in_out.pt used in the code comes from the earlier post "pytorch 6 libtorch deploying a multi-input multi-output model (with batch support)" on the author's CSDN blog, with one difference: it has more model parameters (the author scaled the parameters up 100x, essentially by increasing the filter_num of every kernel).
The comments in the code note whether each operation is effective and what its impact is.
int main() {
    string path = "d_in_out.pt";
    // No gradient information needs to be stored during inference.
    at::NoGradGuard nograd;
    int gpu_id = 0;

    // Load the model.
    torch::jit::Module model = torch::jit::load(path);
    model.to(at::kCUDA);
    model.eval(); // set evaluation mode
    std::cout << "GPU memory usage after loading the model\n";
    print_cuda_use();

    // Build the two inputs.
    // For a single-input model, pass only one tensor to forward().
    at::Tensor x1_tensor = torch::ones({ 1,3,512,512 }).to(at::kCUDA);
    at::Tensor x2_tensor = torch::ones({ 1,3,512,512 }).to(at::kCUDA);
    at::Tensor result1, result2;
    std::cout << "\nGPU memory usage after initializing the input tensors\n";
    print_cuda_use();

    std::cout << "\nRunning forward 5 times\n";
    for (int i = 0; i < 5; i++) {
        //result = model.forward({ x1_tensor,x2_tensor }).toTensor(); // single-output case
        auto out = model.forward({ x1_tensor,x2_tensor });
        auto tpl = out.toTuple(); // or out.toTensorList();
        result1 = tpl->elements()[0].toTensor();
        result2 = tpl->elements()[1].toTensor();
        print_cuda_use();
    }

    std::cout << "\nFreeing the GPU memory occupied by the tensors (does not help much)\n";
    cudaFree(x1_tensor.data_ptr());
    cudaFree(x2_tensor.data_ptr());
    cudaFree(result1.data_ptr());
    cudaFree(result2.data_ptr());
    print_cuda_use();

    std::cout << "\nCUDACachingAllocator::emptyCache (has some effect)\n";
    c10::cuda::CUDACachingAllocator::emptyCache();
    print_cuda_use();

    std::cout << "\nFreeing the GPU memory occupied by the model parameters (pointless)\n";
    for (auto p : model.parameters())
        cudaFree(p.data_ptr()); // cuda becomes unusable afterwards: model.to(at::kCUDA) reports errors
    print_cuda_use();

    //torch::jit::Module::Module::freeze(model);
    std::cout << "\nCalling the model's destructor (no effect)\n";
    model.~Module(); // does not actually change GPU memory usage
    print_cuda_use();

    std::cout << "\nResetting the cuda allocator state (no effect)\n";
    c10::cuda::CUDACachingAllocator::init(gpu_id);
    c10::cuda::CUDACachingAllocator::resetAccumulatedStats(gpu_id); // does not actually change GPU memory usage
    c10::cuda::CUDACachingAllocator::resetPeakStats(gpu_id);
    print_cuda_use();

    std::cout << "\ncudaDeviceReset (works, but subsequent models cannot use cuda)\n";
    // Requires the cuda runtime to be configured.
    cudaDeviceReset(); // fully releases GPU resources; cuda is then unusable and model.to(at::kCUDA)
                       // reports errors, unless libtorch's cuda environment can be reinitialized
    print_cuda_use();

    torch::cuda::synchronize();
    model = torch::jit::load(path);
    model.to(at::kCUDA);
    print_cuda_use();
    return 0;
}

The execution result of the above code is shown in the figure below.
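Beyond the experiments above, one pattern that does reclaim the parameter memory without breaking CUDA (a sketch based on how the caching allocator behaves, not part of the test above) is to let the module be destroyed normally and then empty the cache: destruction returns the parameter blocks to the caching allocator, and emptyCache() hands them back to the driver, after which a new model can still use CUDA. A minimal sketch, assuming the same d_in_out.pt model and that nothing else holds references to the module or its outputs:

#include <torch/script.h>
#include <torch/torch.h>
#include <c10/cuda/CUDACachingAllocator.h>

int main() {
    {
        // Own the module inside its own scope.
        torch::jit::Module model = torch::jit::load("d_in_out.pt");
        model.to(at::kCUDA);
        model.eval();
        at::NoGradGuard nograd;
        at::Tensor x = torch::ones({ 1, 3, 512, 512 }).to(at::kCUDA);
        auto out = model.forward({ x, x });
    } // model, input and outputs are destroyed here; their blocks return to the cache

    // Hand the cached blocks back to the driver. Unlike cudaDeviceReset,
    // CUDA stays usable for the next model.
    c10::cuda::CUDACachingAllocator::emptyCache();

    torch::jit::Module model2 = torch::jit::load("d_in_out.pt");
    model2.to(at::kCUDA); // still works
    return 0;
}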
