Faster-ILOD and maskrcnn_benchmark: installation process and problems encountered

2022-07-02 07:34:00 【chenf0】

The paper

Faster ILOD: Incremental learning for object detectors based on faster RCNN 2020

Paper: https://arxiv.org/abs/2003.03901
Code: https://github.com/CanPeng123/Faster-ILOD

Code

I. Requirements:

PyTorch 1.0 from a nightly release. It will not work with 1.0 nor
1.0.1. Installation instructions can be found in https://pytorch.org/get-started/locally/
torchvision from master
cocoapi
yacs
matplotlib
GCC >= 4.9
OpenCV
CUDA >= 9.0

II. Step-by-step installation

# first, make sure that your conda is setup properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` points to the
# right path. From a clean conda env, this is what you need to do

conda create --name maskrcnn_benchmark -y
conda activate maskrcnn_benchmark

# this installs the right pip and dependencies for the fresh python
conda install ipython pip

# maskrcnn_benchmark and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 9.0
conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0

export INSTALL_DIR=$PWD

# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

# install cityscapesScripts
cd $INSTALL_DIR
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark

# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop


unset INSTALL_DIR

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
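
Once the steps above have finished, a quick way to see whether everything landed in the active conda environment is to ask Python which packages it can find. The snippet below is only a sketch: the import names in REQUIRED_MODULES are my assumption of how the packages installed above are imported (for example, opencv-python is imported as cv2), and missing_modules is a helper written for this post, not part of maskrcnn_benchmark.

```python
import importlib.util

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch", "torchvision", "yacs", "cv2",
    "pycocotools", "apex", "maskrcnn_benchmark",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-installed packages.
            found = False
        if not found:
            missing.append(name)
    return missing


print("missing:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result only means the packages are importable; it does not prove that the compiled extensions were built against the right CUDA version.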

III. Faster-ILOD

After the maskrcnn_benchmark environment is installed, copy the Faster-ILOD code over the corresponding folders in maskrcnn-benchmark and run python setup.py build develop to recompile. Alternatively, download the Faster-ILOD code directly.

IV. Running Faster-ILOD

Using the 15+5 setting as an example:

1. Modify the dataset path

Edit Faster-ILOD/maskrcnn_benchmark/config/paths_catalog.py: find the VOC entries and change the paths to your own.
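
Before training, it can save time to confirm that the directory you entered really looks like a PASCAL VOC dataset. The sketch below checks for the standard VOC sub-folders (Annotations, JPEGImages, ImageSets/Main). The helper missing_voc_subdirs and the example path are hypothetical; use whatever path you put in paths_catalog.py.

```python
import os

# Standard PASCAL VOC sub-folders read by a detection data loader.
VOC_SUBDIRS = ["Annotations", "JPEGImages", os.path.join("ImageSets", "Main")]


def missing_voc_subdirs(voc_root):
    """Return the expected sub-folders that are absent under voc_root."""
    return [d for d in VOC_SUBDIRS
            if not os.path.isdir(os.path.join(voc_root, d))]


# Hypothetical location; replace it with your own dataset path.
print(missing_voc_subdirs("/path/to/VOCdevkit/VOC2007"))
```

An empty list means all three folders were found.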

2. Modify the configuration file

/configs/e2e_faster_rcnn_R_50_C4_1x.yaml
The parameters can be changed as needed; I left this file unchanged for now.

3. Train the base network

Run python tools/train_first_step.py --config-file="./configs/e2e_faster_rcnn_R_50_C4_1x.yaml"
After a successful run, the training output can be found in /home/incremental_learning_ResNet50_C4/RPN_15_classes_40k_steps.

4. Incremental training

(1) Edit e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml and e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml: set the class lists in these files (all classes, new classes, old classes), the path to the final model produced by the previous training stage, and the output path. Then run python tools/train_incremental.py; the final training results are written to the configured output directory.

e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml
e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml
In tools/train_incremental.py, point the two config file variables to your own copies:

source_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml"
target_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml"
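
To make the class bookkeeping for 15+5 concrete, here is a small sketch that splits the 20 PASCAL VOC categories into old and new classes. It assumes the classes are taken in alphabetical order, and split_classes is a helper written for this post rather than anything from the Faster-ILOD code. The class counts include one extra slot for the background class.

```python
# The 20 PASCAL VOC categories in alphabetical order (assumed ordering).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def split_classes(classes, num_old, num_new):
    """Split an ordered class list into old, new and not-yet-used classes."""
    old = classes[:num_old]
    new = classes[num_old:num_old + num_new]
    excluded = classes[num_old + num_new:]
    return old, new, excluded


old, new, excluded = split_classes(VOC_CLASSES, 15, 5)

# The detection head also predicts a background class, hence the +1.
source_num_classes = len(old) + 1
target_num_classes = len(old) + len(new) + 1

print("old:", old)
print("new:", new)
print("source classes:", source_num_classes, "target classes:", target_num_classes)
```

The two counts (16 for the source model, 21 for the target model) are the same numbers that show up in the tensor size error described in problem 7 below.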

V. Problems encountered

1. git clone fails because of network problems

The repository can be downloaded on a local machine and then uploaded to the server.
Installation packages can likewise be downloaded locally, uploaded to the server, and installed with pip install <file path>.

2. RuntimeError: Error compiling objects for extension

The PyTorch version is not compatible.
I was using CUDA 10.1 with PyTorch 1.7.1.
After looking at the suggested solutions, I downgraded PyTorch to 1.5 and the build succeeded.

CUDA 10.1
Pytorch 1.4.0
torchvision 0.5.0

For more solutions, please refer to https://github.com/facebookresearch/maskrcnn-benchmark/issues/1236
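
Since this error usually comes down to a version mismatch, it helps to print what is actually installed before changing anything. A minimal sketch, assuming nothing beyond the standard __version__ attribute and torch.version.cuda; the two helper functions are my own and return None when a package is not installed.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def cuda_version_of_torch():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


print("torch:", installed_version("torch"))
print("torchvision:", installed_version("torchvision"))
print("CUDA used to build torch:", cuda_version_of_torch())
```

Compare the reported CUDA version with the output of nvcc -V; the extension is compiled with the local nvcc, so the two should match.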

3. RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace.

RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

Reference: https://blog.csdn.net/Ginomica_xyx/article/details/120491859

The cause is that self.bbox is modified in place more than once. On the second modification, PyTorch cannot tell whether the operation applies to the original self.bbox or to the already modified one.
Knowing the cause, I tried to work around it in the code: copying self.bbox into another variable and operating on that did not work, and making a deep copy did not work either.
After checking related issues, this turned out to be a bug in PyTorch 1.7.0.
Downgrading PyTorch to 1.6.0 solved the problem.

4. unable to execute ‘usr/local/cuda-10.0/bin/nvcc‘: No such file or directory

References:
How to view and modify the PATH environment variable on Linux
https://blog.csdn.net/qq_41251963/article/details/110120386
https://blog.csdn.net/tailonh/article/details/120322932
https://blog.csdn.net/G_inkk/article/details/124584873
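
The message means the build is looking for nvcc in a CUDA directory that does not exist on the machine. A minimal sketch to check where nvcc really is, assuming the build takes the toolkit location from the CUDA_HOME environment variable and falls back to /usr/local/cuda; nvcc_under is a helper written for this post.

```python
import os
import shutil


def nvcc_under(cuda_home):
    """Return <cuda_home>/bin/nvcc if that file exists, otherwise None."""
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    return candidate if os.path.isfile(candidate) else None


# Fall back to the conventional default location when CUDA_HOME is unset.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
print("CUDA_HOME:", cuda_home)
print("nvcc under CUDA_HOME:", nvcc_under(cuda_home))
print("nvcc found on PATH:", shutil.which("nvcc"))
```

If the first lookup prints None while the second finds nvcc, CUDA_HOME points to the wrong directory and should be corrected in ~/.bashrc.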

5. error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::

Recompiling with python setup.py build develop fails with
RuntimeError: Error compiling objects for extension. The underlying error is
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep
Solution

References: https://blog.csdn.net/weixin_45328592/article/details/114646355
https://blog.csdn.net/qq_29695701/article/details/118548238

sudo gedit /usr/include/c++/7/bits/basic_string.tcc

Replace

__p->_M_set_sharable()

with

(*__p)._M_set_sharable()

That is all that is needed.
If saving the file fails with:
‘readonly’ option is set (add ! to override)
the current user does not have write permission. Either switch to root first with sudo -i and then edit the file, or open it directly with sudo vim.

Reference
https://blog.csdn.net/cheng_feng_xiao_zhan/article/details/53391474
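
Before editing a system header as root, it can be reassuring to preview exactly what the substitution does. The sketch below applies the same replacement to a string only; it does not touch any file. patch_source and the sample line are illustrative, not taken from the actual header.

```python
OLD = "__p->_M_set_sharable()"
NEW = "(*__p)._M_set_sharable()"


def patch_source(text):
    """Return (patched_text, number_of_replacements) without touching disk."""
    return text.replace(OLD, NEW), text.count(OLD)


# Illustrative input line; the real file is basic_string.tcc.
sample = "    __p->_M_set_sharable();\n"
patched, count = patch_source(sample)
print(count, "replacement(s):", patched.strip())
```

If you do script the change on the real file, keep a backup copy of basic_string.tcc first so the original can be restored.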

RuntimeError: Error compiling objects for extension
can also have the following causes:

PyTorch version mismatch
Problems switching between multiple CUDA versions

Fix: if the path in the error message contains a colon, the environment variable is set incorrectly.

vim ~/.bashrc
Change
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
to
export CUDA_HOME=/usr/local/cuda

source ~/.bashrc
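
The reason the colon matters: CUDA_HOME must name exactly one directory, unlike PATH, which is a colon-separated list. If CUDA_HOME is unset when the first export runs, the value becomes ":/usr/local/cuda", and a path built from it no longer points to a real directory. The following sketch checks a value for this mistake; cuda_home_problems is a helper written for this post.

```python
import os


def cuda_home_problems(value):
    """Return a list of problems found in a CUDA_HOME value (empty if fine)."""
    if not value:
        return ["CUDA_HOME is empty or unset"]
    if ":" in value:
        return ["contains a colon; CUDA_HOME must be a single directory"]
    if not os.path.isdir(value):
        return ["directory does not exist"]
    return []


print(cuda_home_problems(os.environ.get("CUDA_HOME")))
```

Run it in a new shell after source ~/.bashrc; an empty list means the variable is a single existing directory.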

References:
https://blog.csdn.net/loovelj/article/details/110490986
https://www.codeleading.com/article/95735054818/
https://blog.csdn.net/zt1091574181/article/details/113611468

6. AttributeError: ‘tuple’ object has no attribute ‘values’

Change loss_dict to loss_dict[0].
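
The error appears because the code calls .values() on a tuple instead of on the dictionary of losses. The sketch below reproduces the situation with plain Python values, assuming (as the fix above implies) that the first element of the tuple is the loss dictionary. total_loss and the sample numbers are illustrative only.

```python
def total_loss(model_output):
    """Sum the losses from a dict, or from a tuple whose first item is the dict."""
    if isinstance(model_output, tuple):
        loss_dict = model_output[0]
    else:
        loss_dict = model_output
    return sum(loss for loss in loss_dict.values())


# Illustrative values standing in for the loss tensors.
losses = {"loss_classifier": 0.5, "loss_box_reg": 0.25}

print(total_loss(losses))                  # model returned a dict
print(total_loss((losses, "other data")))  # model returned a tuple
```

Both calls give the same total, which is why indexing with [0] is enough to restore the original behaviour.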

7. RuntimeError: The size of tensor a (16) must match the size of tensor b (21) at non-singleton dimension 0

This error occurs during incremental learning. The problem appears to arise when loading the checkpoint from base training; setting optimizer to None fixes it.

checkpointer_target = DetectronCheckpointer(
    cfg_target, model_target, optimizer=None, scheduler=scheduler,
    save_dir=output_dir_target, save_to_disk=save_to_disk, logger=logger_target)

8. Incremental learning skips training and goes straight to testing

The base model has already run 40000 iterations, so arguments_target["iteration"] starts at 40000. Incremental training was also configured for 40000 iterations, so the trainer considers training finished and goes straight to testing. Changing MAX_ITER to 80000 in e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml fixes this.
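
The arithmetic behind this is simple: the number of steps actually trained is MAX_ITER minus the iteration restored from the checkpoint. A tiny sketch (remaining_iterations is a helper written for this post, not a function from the code base):

```python
def remaining_iterations(start_iter, max_iter):
    """Number of training steps left when resuming at start_iter."""
    return max(0, max_iter - start_iter)


print(remaining_iterations(40000, 40000))  # MAX_ITER left at 40000
print(remaining_iterations(40000, 80000))  # MAX_ITER raised to 80000
```

With MAX_ITER at 40000 nothing is left to train; with 80000 the incremental stage gets its own 40000 steps.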

PS: switching between multiple CUDA versions

Check which CUDA versions are installed in the /usr/local/ directory

cd /usr/local 
ls

bin  cuda       cuda-10.2  etc    include  man   share
cud  cuda-10.1  cuda-11.0  games  lib      sbin  src

Check the current CUDA version

nvcc  -V

Or use stat cuda to see where the current cuda symbolic link points

  File: cuda -> /usr/local/cuda-10.1
  Size: 20              Blocks: 0          IO Block: 4096   symbolic link
Device: 812h/2066d      Inode: 2757665     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-06-06 21:34:32.342489356 +0800
Modify: 2022-05-22 15:11:26.498549390 +0800
Change: 2022-05-22 15:11:26.498549390 +0800
 Birth: -

To switch to version 10.2, delete the current link and re-create it pointing to 10.2. It only takes two commands

sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

Now check the CUDA version again

nvcc -V

You can see that the version has been switched

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
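
The same check can be done from Python, which is handy inside a setup or training script. This is a sketch assuming the layout shown above (versioned cuda-* directories plus a cuda symbolic link under /usr/local); both function names are my own.

```python
import glob
import os


def installed_cuda_dirs(prefix="/usr/local"):
    """List versioned CUDA directories such as /usr/local/cuda-10.2."""
    pattern = os.path.join(prefix, "cuda-*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def active_cuda_dir(prefix="/usr/local"):
    """Return the real directory behind <prefix>/cuda, or None if it is missing."""
    link = os.path.join(prefix, "cuda")
    return os.path.realpath(link) if os.path.exists(link) else None


print("installed:", installed_cuda_dirs())
print("active:   ", active_cuda_dir())
```

After re-creating the link, the active directory should end in cuda-10.2.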

Copyright notice
This article was written by [chenf0]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/183/202207020622519940.html