Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Overview

Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

  • It is powered by the PyTorch deep learning framework.
  • It includes more features, such as panoptic segmentation, DensePose, Cascade R-CNN, rotated bounding boxes, PointRend, and DeepLab.
  • It can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
  • It trains much faster.
  • Its models can be exported to TorchScript or Caffe2 format for deployment.

See our blog post for more demos and to learn more about detectron2.

Installation

See INSTALL.md.

Getting Started

Follow the installation instructions to install detectron2.

See Getting Started with Detectron2, and the Colab Notebook to learn about basic usage.

Learn more in our documentation, and see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron2

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
Comments
  • Add support for ONNX-only and Caffe2 ONNX export

    Summary of changes

    This PR fixes both ONNX-only and Caffe2 ONNX exporters for the latest versions of this repo and PyTorch.

    For ONNX-only, the main issue is that add_export_config(cfg) is not exposed when Caffe2 is not compiled along with PyTorch, but for ONNX-only scenarios, such dependency is not needed. Therefore, add_export_config is moved from detectron2/export/api.py to detectron2/export/__init__.py

    A second contribution is a new test_export_onnx.py test file that exports almost the same models as the test_export_tracing.py tests.

    For the Caffe2-ONNX, the main issue was a dependency on ONNX optimizer pass which is deprecated in newer ONNX versions. This PR removes such dependency because fuse_bn_into_conv optimization pass is already performed by torch.onnx.export anyway.

    Fixes https://github.com/facebookresearch/detectron2/issues/3488 Fixes https://github.com/pytorch/pytorch/issues/69674 (PyTorch repo)

    CLA Signed 
    opened by thiagocrepaldi 75
  • How do I compute validation loss during training?

    I'm trying to compute the loss on a validation dataset for each iteration during training. To do so, I've created my own hook:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_test_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                loss = self.trainer.model(batch)
                log.debug(f"validation loss: {loss}")
    

    ... which I register with a DefaultTrainer. The hook code is called during training, but fails with the following:

    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    ERROR:detectron2.engine.train_loop:Exception during training:
    Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 133, in train
        self.after_step()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 153, in after_step
        h.after_step()
      File "<ipython-input-6-63b308743b7d>", line 8, in after_step
        loss = self.trainer.model(batch)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 123, in forward
        proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 164, in forward
        losses = {k: v * self.loss_weight for k, v in outputs.losses().items()}
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 322, in losses
        gt_objectness_logits, gt_anchor_deltas = self._get_ground_truth()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 262, in _get_ground_truth
        for image_size_i, anchors_i, gt_boxes_i in zip(self.image_sizes, anchors, self.gt_boxes):
    TypeError: zip argument #3 must support iteration
    INFO:detectron2.engine.hooks:Total training time: 0:00:00 (0:00:00 on hooks)
    

    The traceback seems to imply that ground truth data is missing, which made me think that the data loader was the problem. However, switching to a training loader produces a different error:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_train_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                loss = self.trainer.model(batch)
                log.debug(f"validation loss: {loss}")
    
    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    ERROR:detectron2.engine.train_loop:Exception during training:
    Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 133, in train
        self.after_step()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 153, in after_step
        h.after_step()
      File "<ipython-input-6-e0d2c509cc72>", line 7, in after_step
        for batch in self._loader:
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/data/common.py", line 109, in __iter__
        for d in self.dataset:
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/data/common.py", line 39, in __getitem__
        data = self._map_func(self._dataset[cur_idx])
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/utils/serialize.py", line 23, in __call__
        return self._obj(*args, **kwargs)
    TypeError: 'str' object is not callable
    
    INFO:detectron2.engine.hooks:Total training time: 0:00:00 (0:00:00 on hooks)
    

    As a sanity check, inference works just fine:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_test_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                with detectron2.evaluation.inference_context(self.trainer.model):
                    loss = self.trainer.model(batch)
                    log.debug(f"validation loss: {loss}")
    
    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    DEBUG:root:validation loss: [{'instances': Instances(num_instances=100, image_height=720, image_width=720, fields=[pred_boxes = Boxes(tensor([[4.4867e+02, 1.9488e+02, 5.1496e+02, 3.9878e+02],
            [4.2163e+02, 1.1204e+02, 6.1118e+02, 5.5378e+02],
            [8.7323e-01, 3.0374e+02, 9.2917e+01, 3.8698e+02],
            [4.3202e+02, 2.0296e+02, 5.7938e+02, 3.6817e+02],
            ...
    

    ... but that isn't what I want, of course. Any thoughts?

    Thanks in advance, Tim

    opened by tshead2 35
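
One possible reading of the two tracebacks above: the test loader's default mapper drops the annotations, so the model (still in training mode) finds no ground truth; and in the second attempt the dataset name seems to have been taken as the mapper argument, hence "'str' object is not callable". A minimal sketch of a hook that avoids both, assuming a Detectron2 version in which build_detection_test_loader accepts a mapper argument and DatasetMapper(cfg, is_train=True) keeps annotations. The names make_validation_loss_hook, total_loss, and period are hypothetical, and detectron2 and torch are imported lazily so the helper can be read on its own:

```python
def total_loss(loss_dict):
    """Sum the per-component losses that a model returns in training mode."""
    return sum(float(v) for v in loss_dict.values())


def make_validation_loss_hook(cfg, dataset_name, period=100):
    """Build a hook that logs the mean validation loss every `period` iterations.

    Hypothetical helper, not part of detectron2.
    """
    import torch
    from detectron2.data import DatasetMapper, build_detection_test_loader
    from detectron2.engine import HookBase

    class ValidationLoss(HookBase):
        def __init__(self):
            super().__init__()
            # is_train=True keeps the ground-truth annotations in each sample.
            # Assumption: it also applies the training-time augmentations.
            mapper = DatasetMapper(cfg, is_train=True)
            self._loader = build_detection_test_loader(
                cfg, dataset_name, mapper=mapper
            )

        def after_step(self):
            # A full pass over the validation set is expensive, so only
            # run it every `period` iterations.
            if (self.trainer.iter + 1) % period != 0:
                return
            losses = []
            # The model stays in training mode so that it returns a loss
            # dict; no_grad() avoids building the autograd graph.
            with torch.no_grad():
                for batch in self._loader:
                    losses.append(total_loss(self.trainer.model(batch)))
            if losses:
                self.trainer.storage.put_scalar(
                    "validation_loss", sum(losses) / len(losses)
                )

    return ValidationLoss()
```

The hook would be attached with trainer.register_hooks([make_validation_loss_hook(cfg, "my_val")]) before trainer.train(), where "my_val" stands for whatever name the validation set was registered under.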
  • How to detect only one class (person) with the COCO pre-trained model

    Can anyone tell me how to select only one class ('person' in my case) from the COCO classes when running instance segmentation with the pre-trained model (mask_rcnn_R_50_FPN_3x.yaml)?

    I only want to detect 'person' in the given image.

    opened by anki92 34
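
The usual approach to this kind of request is to run the full COCO model and discard every prediction whose class is not 'person'. A minimal sketch of that filtering step in plain Python, assuming 'person' has contiguous class index 0 in the COCO model and using a hypothetical list-of-dicts format for the detections. With real Detectron2 outputs, the equivalent would likely be boolean-mask indexing on the predicted Instances, along the lines of instances[instances.pred_classes == 0]:

```python
# Assumption: 'person' is the first of the 80 COCO "thing" classes,
# so its contiguous class index in the pre-trained model is 0.
PERSON_CLASS_ID = 0


def keep_class(detections, class_id=PERSON_CLASS_ID):
    """Return only the detections that belong to `class_id`.

    `detections` is a hypothetical list of dicts with a "class_id" key,
    standing in for the model's per-instance predictions.
    """
    return [d for d in detections if d["class_id"] == class_id]


# Illustrative (made-up) detections: two people and one other object.
sample = [
    {"class_id": 0, "score": 0.98, "box": [10, 20, 110, 220]},
    {"class_id": 17, "score": 0.91, "box": [200, 40, 320, 180]},
    {"class_id": 0, "score": 0.75, "box": [300, 30, 380, 210]},
]
people = keep_class(sample)
```

Filtering after inference leaves the model untouched; it does not make inference faster, it only hides the other classes.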
  • ONNX model export support

    Since ONNX provides almost all the ops needed by Mask R-CNN, it would be great if models could be exported to ONNX so that these large models can benefit from TensorRT acceleration.

    enhancement 
    opened by jinfagang 34
  • Improve documentation concerning the new config files

    📚 Documentation Improvements

    In short

    Concerning: https://detectron2.readthedocs.io/en/latest/tutorials/configs.html
    Problem: The documentation does not seem to have been updated to reflect the new config files (.py rather than .yaml).
    Solution: Update the documentation.

    Problem description

    FAIR recently published new Mask R-CNN baselines and this was my first introduction to the new config file that no longer relies on YAML files but on 'raw' .py files. I am trying to load the new baselines using the config files mentioned in the MODEL_ZOO (see this table). For example:

    from detectron2 import model_zoo
    model = model_zoo.get("new_baselines/mask_rcnn_regnetx_4gf_dds_FPN_400ep_LSJ.py", trained=True)
    

    This gives

    RuntimeError: new_baselines/mask_rcnn_regnetx_4gf_dds_FPN_400ep_LSJ not available in Model Zoo!
    

    I have installed Detectron2 using the installation instructions. When looking up the documentation on configs, it seems that this has not been updated to reflect the new configs and still solely mentions YAML files.

    Proposed solution

    It could be that the CONFIG_PATH_TO_URL_SUFFIX dictionary in the _ModelZooUrls class still has to be updated and that this is actually a bug (see here), but with my limited understanding of the new config files I find it hard to judge whether this is intended behavior (i.e. the new config files should be loaded differently) or a bug. Either way, I feel the documentation on readthedocs should be updated to reflect the change from .yaml to .py.

    documentation 
    opened by orbiskcw 33
  • Add support for ONNX-only

    This PR is composed of several fixes that enable end-to-end ONNX export for detectron2 models:

    • The add_export_config API is publicly exposed even when Caffe2 is not compiled along with PyTorch (the new default behavior in the latest PyTorch). A warning informing users that it will be deprecated in future versions is also added.

    • tensor.shape[0] replaces len(tensor), and for idx, img in enumerate(tensors) replaces for tmp_var1, tmp_var2 in zip(tensors, batched_imgs), so that the tracer does not lose the reference to the user input in the graph.

      • Before these changes, the exported graph has no actual input; instead, the input is exported as a model weight.
      • After the fix, the user images are properly recognized as the model's input during ONNX export.
    • Added unit tests (tests/torch_export_onnx.py) for detectron2 models

    • ONNX is added as dependency for the CI to be able to run the aforementioned tests

    • Added custom symbolic functions to allow CI pipelines to succeed. The symbolics are needed because PyTorch 1.8, 1.9 and 1.10 adopted by detectron2 have several bugs. They can be removed when 1.11+ is adopted by detectron2's CI infra

    Fixes https://github.com/facebookresearch/detectron2/issues/3488 Fixes https://github.com/pytorch/pytorch/issues/69674 (PyTorch repo)

    CLA Signed 
    opened by thiagocrepaldi 32
  • Added docker compose file with useful tweaks.

    From my perspective, a docker-compose file has several benefits: it is more convenient, and it is a way to supply users with useful tweaks. The committed docker-compose file addresses several issues and tweaks:

    1. It fixes potential problems with dataloaders (see #384).
    2. It includes Multi-GPU and performance tweaks as suggested by NVIDIA (see https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#caffeovr).
    3. It adds GUI support (see #379).
    4. It enables caching of downloaded models (see #382).
    5. It makes docker run with the UID of the host user.

    Of course, all this can be accomplished with a long docker command. However, a docker-compose file gives a central place to gather all recommendations for running detectron2 with docker, without bloating the Dockerfile with comments.

    CLA Signed 
    opened by maxfrei750 32
  • Properly convert a Detectron2 model to ONNX for Deployment

    Hello,

    I am trying to convert a Detectron2 model to ONNX format and run inference without depending on detectron2 at inference time.

    Some information about this is available at https://detectron2.readthedocs.io/en/latest/tutorials/deployment.html. However, the implementation is constantly being updated, and the documentation is not clear enough to carry out this task.

    Can someone help me with a demo or tutorial showing how to do it?

    @thiagocrepaldi

    Some information:

    My model was trained using pre-trained weight from:

    'faster_rcnn_50': { 'model_path': 'COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', 'weights_path': 'model_final_280758.pkl' },

    I have 4 classes.

    Of course, I now have my own weights. My model was saved in .pth format.

    I used my own dataset, with .png images.

    Code in Python

    documentation 
    opened by vitorbds 28
  • How to apply Mask R-CNN segmentation to a balloon video?

    Hi, I am going through the Google Colab example tutorial.

    I am trying to apply Mask R-CNN segmentation to a random YouTube balloon video, instead of a balloon image, to detect balloons only (one class).

    How can I apply the .yaml and .pkl files that were generated from images earlier in the tutorial to a random video? Thanks.

    I tried the following, but it didn't work. I think I am having trouble assigning the trained config and model files.

    !cd detectron2_repo && python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input ../video-clip_b.mp4 --confidence-threshold 0.6 --output ../video-clip_b_testing1.mkv \
      --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
    
    opened by gireeshkbogu 28
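
The command in the question passes the COCO model-zoo weights through --opts, so it would run the stock COCO model rather than the balloon model. A minimal sketch of assembling the same command with fine-tuned weights instead, assuming the checkpoint was saved as output/model_final.pth and that the model was trained with a single class. The checkpoint path and the build_demo_command helper are hypothetical; the flags are the ones used in the command above:

```python
import shlex


def build_demo_command(config_file, weights, video_in, video_out,
                       num_classes=1, threshold=0.6):
    """Assemble a demo/demo.py command line that uses fine-tuned weights."""
    return [
        "python", "demo/demo.py",
        "--config-file", config_file,
        "--video-input", video_in,
        "--confidence-threshold", str(threshold),
        "--output", video_out,
        # --opts takes KEY VALUE pairs that override the config file,
        # so it has to come last.
        "--opts",
        "MODEL.WEIGHTS", weights,
        "MODEL.ROI_HEADS.NUM_CLASSES", str(num_classes),
    ]


cmd = build_demo_command(
    config_file="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    weights="../output/model_final.pth",  # hypothetical fine-tuned checkpoint
    video_in="../video-clip_b.mp4",
    video_out="../video-clip_b_testing1.mkv",
)
# Print a shell-ready version of the command (requires Python 3.8+).
print(shlex.join(cmd))
```

The NUM_CLASSES override matters because the head shapes in the checkpoint have to match the config; with the default of 80 classes, the fine-tuned weights would not load cleanly.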
  • Add support for Caffe2 ONNX export

    Currently all Caffe2 export tests (under tests/test_export_caffe2.py) fail because the latest onnx releases no longer have the onnx.optimizer submodule (instead, a new module, onnxoptimizer, was created from it).

    However, the fuse_bn_into_conv optimization previously implemented within onnx.optimizer is already performed by torch.onnx.export during ONNX export. Therefore, the onnx.optimizer dependency can be safely removed from detectron2 code.

    Depends on pytorch/pytorch#75718 Fixes #3488 Fixes pytorch/pytorch#69674 (PyTorch repo)

    ps: Although Caffe2 support is/will be deprecated, this PR relies on the fact that contributions are welcome as stated at docs/tutorials/deployment.md

    CLA Signed 
    opened by thiagocrepaldi 27
  • AssertionError: Attribute 'thing_classes' in the metadata of 'coco_2017_train' cannot be set to a different value!

    ❓ Questions and Help

    Hi, I am trying to train on my dataset, which has just 4 classes. When I run it, I get the AssertionError quoted in the title. The scripts are heavily interlinked and therefore a bit difficult to debug. How can I resolve this?

    Thanks.

    opened by akshaygadipatil 24
  • Loading pre-trained model configuration from Python file

    📚 Documentation Issue

    I'm struggling to load the pre-trained model defined by new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py. I've found relevant documentation here, here and issue #3225. However, none of these clearly explains my error.

    I'm trying to load the configuration with:

    cfg = LazyConfig.load("detectron2/configs/new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py")
    cfg = setup_cfg(args)
    

    This produces the following traceback:

    Traceback (most recent call last):
      File "quality_test.py", line 97, in <module>
        results_ls = get_person_seg_masks(img_path, model_family, model)
      File "detectron2_wrapper.py", line 107, in get_person_seg_masks
        cfg = setup_cfg(args)
      File "detectron2/demo/demo.py", line 29, in setup_cfg
        cfg.merge_from_file(args.config_file)
      File "/home/appuser/detectron2_repo/detectron2/config/config.py", line 46, in merge_from_file
        loaded_cfg = self.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
      File "/home/appuser/.local/lib/python3.8/site-packages/fvcore/common/config.py", line 61, in load_yaml_with_base
        cfg = yaml.safe_load(f)
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/__init__.py", line 125, in safe_load
        return load(stream, SafeLoader)
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
        return loader.get_single_data()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
        node = self.get_single_node()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/composer.py", line 39, in get_single_node
        if not self.check_event(StreamEndEvent):
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/parser.py", line 98, in check_event
        self.current_event = self.state()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/parser.py", line 171, in parse_document_start
        raise ParserError(None, None,
    yaml.parser.ParserError: expected '<document start>', but found '<scalar>'
      in "detectron2/configs/new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py", line 11, column 1
    
    documentation 
    opened by buckeye17 2
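
The traceback above shows merge_from_file handing the .py file to the YAML parser, which suggests that the demo's setup_cfg only understands YAML configs. A minimal sketch of dispatching on the file extension instead, assuming LazyConfig.load for .py configs and get_cfg().merge_from_file for YAML ones. The helpers config_kind and load_config are hypothetical, and detectron2 is imported lazily:

```python
from pathlib import Path


def config_kind(path):
    """Classify a config file as 'lazy' (.py) or 'yacs' (.yaml/.yml)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".py":
        return "lazy"
    if suffix in {".yaml", ".yml"}:
        return "yacs"
    raise ValueError(f"Unsupported config file type: {path}")


def load_config(path):
    """Load a config with the loader that matches its file type.

    Hypothetical helper; the two branches return different kinds of
    objects, so downstream code has to handle each kind separately.
    """
    if config_kind(path) == "lazy":
        from detectron2.config import LazyConfig

        return LazyConfig.load(path)

    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(path)
    return cfg
```

A lazy config is not a drop-in replacement for a YAML one: code written for the YAML config (such as the demo's setup_cfg and DefaultPredictor) would probably need to be adapted, for example by building the model with detectron2.config.instantiate(cfg.model) and loading the weights separately.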
  • A problem with DeepLab when visualizing semantic segmentation

    I am trying to run semantic segmentation on Google Colab by following the instructions of Detectron2's DeepLab project, but when I try to visualize the segments on an image, I hit a problem that I cannot solve.

    "Instructions To Reproduce the Issue and Full Logs":

    !pip install pyyaml==5.1
    !pip install exif==1.3.5
    !pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
    !git clone --branch v0.6 https://github.com/facebookresearch/detectron2.git detectron2_repo
    !pip install -e detectron2_repo
    import detectron2
    from detectron2.utils.logger import setup_logger
    setup_logger()
    import numpy as np
    import cv2
    import torch
    from google.colab.patches import cv2_imshow
    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.config import get_cfg
    from detectron2.utils.visualizer import Visualizer, ColorMode
    from detectron2.data import MetadataCatalog
    coco_metadata = MetadataCatalog.get("coco_2017_val")
    from detectron2.projects import point_rend
    from detectron2.projects import deeplab
    from detectron2.projects.deeplab import add_deeplab_config
    !pip install 'git+https://github.com/facebookresearch/[email protected]'
    im=cv2.imread("./aachen_000005_000019_leftImg8bit.png")
    cv2_imshow(im)
    from detectron2.projects.deeplab.build_solver import build_lr_scheduler
    from detectron2 import checkpoint
    from fvcore.common.checkpoint import Checkpointer
    cfg = get_cfg()
    deeplab.add_deeplab_config(cfg)
    cfg.load_yaml_with_base("detectron2_repo/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml")
    cfg.merge_from_file("detectron2_repo/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml")
    cfg.MODEL.WEIGHTS = "https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16/28054032/model_final_a8a355.pkl"
    predictor = DefaultPredictor(cfg)
    outputs = predictor(im)
    viz1=Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0, instance_mode=ColorMode.SEGMENTATION)
    output=viz1.draw_sem_seg(outputs["sem_seg"].to("cpu"))
    image2 = output.get_image()[:,:,::-1]
    cv2_imshow(image2)

    The error that I faced is:


    TypeError Traceback (most recent call last)
          1 viz1=Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0, instance_mode=ColorMode.SEGMENTATION)
    ----> 2 output=viz1.draw_sem_seg(outputs["sem_seg"].to("cpu"))

    /content/detectron2_repo/detectron2/utils/visualizer.py in draw_sem_seg(self, sem_seg, area_threshold, alpha)
        449 if isinstance(sem_seg, torch.Tensor):
        450     sem_seg = sem_seg.numpy()
    --> 451 labels, areas = np.unique(sem_seg, return_counts=True)
        452 sorted_idxs = np.argsort(-areas).tolist()
        453 labels = labels[sorted_idxs]

    TypeError: list indices must be integers or slices, not numpy.float32

    Expected behavior: I expected that I could draw semantic segmentation on the image.

    Environment

    2023-01-03 21:56:28 URL:https://raw.githubusercontent.com/facebookresearch/detectron2/main/detectron2/utils/collect_env.py [8391/8391] -> "collect_env.py" [1]


    sys.platform            linux
    Python                  3.8.16 (default, Dec 7 2022, 01:12:13) [GCC 7.5.0]
    numpy                   1.21.6
    detectron2              failed to import
    detectron2._C           not built correctly: No module named 'detectron2'
    Compiler ($CXX)         c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    CUDA compiler           Build cuda_11.2.r11.2/compiler.29618528_0
    DETECTRON2_ENV_MODULE
    PyTorch                 1.13.0+cu116 @/usr/local/lib/python3.8/dist-packages/torch
    PyTorch debug build     False
    GPU available           Yes
    GPU 0                   Tesla T4 (arch=7.5)
    Driver version          460.32.03
    CUDA_HOME               /usr/local/cuda
    Pillow                  7.1.2
    torchvision             0.14.0+cu116 @/usr/local/lib/python3.8/dist-packages/torchvision
    torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
    cv2                     4.6.0


    PyTorch built with:

    • GCC 9.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • LAPACK is enabled (usually provided by MKL)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.6
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    • CuDNN 8.3.2 (built against CUDA 11.5)
    • Magma 2.6.1
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
    opened by SAAZI4 1
  • I'm trying to train custom keypoint detection model and running into some errors

    This is my program so far:

    register_coco_instances("mesh_train", {}, "../mesh_coco_Train.json", "../Data/Train/Images")
    register_coco_instances("mesh_test", {}, "../mesh_coco_Test.json", "../Data/Test/Images")

    MetadataCatalog.get("mesh_train").keypoint_names = ["joints"]
    MetadataCatalog.get("mesh_train").keypoint_flip_map = []
    train_dicts = DatasetCatalog.get("mesh_train")
    test_dicts = DatasetCatalog.get("mesh_test")
    mesh_metadata = MetadataCatalog.get("mesh_train")

    def cv2_imshow(im):
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        plt.figure(), plt.imshow(im), plt.axis('off')
        plt.show()

    for d in random.sample(train_dicts, 5):
        print(d["file_name"])
        img = cv2.imread(d["file_name"])
        visualizer = Visualizer(img[:, :, ::-1], metadata=mesh_metadata, scale=0.5)
        vis = visualizer.draw_dataset_dict(d)
        cv2_imshow(vis.get_image()[:, :, ::-1])

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("mesh_train",)
    cfg.DATASETS.TEST = ("mesh_test",)
    cfg.DATALOADER.NUM_WORKERS = 4
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.001
    cfg.SOLVER.MAX_ITER = 300
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)

    print("training....")
    trainer.train()

    But when I run it, I'm getting the following error:

    [01/02 13:47:51 d2.engine.train_loop]: Starting training from iteration 0
    ERROR [01/02 13:47:51 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\train_loop.py", line 149, in train
        self.run_step()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\train_loop.py", line 268, in run_step
        data = next(self._data_loader_iter)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\data\common.py", line 283, in __iter__
        for d in self.dataset:
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __iter__
        return self._get_iterator()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 381, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1034, in __init__
        w.start()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
        return Popen(process_obj)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
    
            if __name__ == '__main__':
                freeze_support()
                ...
    
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
    

    [01/02 13:47:51 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
    [01/02 13:47:51 d2.utils.events]: iter: 0 lr: N/A max_mem: 228M

    opened by Codeveen 1
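
The RuntimeError above is raised by Python's multiprocessing on Windows, where data loader workers are started with spawn and re-import the main script. The idiom the message asks for can be sketched as follows; run_training is a hypothetical stand-in for the dataset registration, config, and trainer.train() calls in the program above, so the sketch runs without detectron2:

```python
import multiprocessing


def run_training(num_workers=4):
    """Placeholder for the registration, config, and training code.

    Hypothetical stand-in: in the real script this would register the
    datasets, build cfg, create DefaultTrainer(cfg), and call
    trainer.train(). Here it only echoes its settings.
    """
    return {"num_workers": num_workers}


def main():
    # Everything that may start worker processes lives inside main(),
    # so importing this module has no side effects.
    return run_training(num_workers=4)


if __name__ == "__main__":
    # Only needed when the program is frozen into an executable,
    # as the error message notes; it is harmless otherwise.
    multiprocessing.freeze_support()
    main()
```

An alternative that avoids worker processes altogether is setting cfg.DATALOADER.NUM_WORKERS = 0, at the cost of slower data loading.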
  • [Multi node Training] Training time is very longer than a single node

    [Multi node Training] Training time is very longer than a single node

    Hello,

    Training is very slow when I train a model with detectron2 across two machines.

    Each node has 4 RTX A6000 GPUs, and I train my models on the two nodes. Both nodes run Ubuntu 20.04. Training runs normally and the log.txt file is generated correctly.

    I set the environment variables as follows

    Node 1 settings (189):
    export NCCL_DEBUG="INFO"
    export NCCL_SOCKET_IFNAME="enp36s0f1"
    export GLOO_SOCKET_IFNAME="enp36s0f1"

    Node 2 settings:
    export NCCL_DEBUG="INFO"
    export NCCL_SOCKET_IFNAME="enp4s0"
    export GLOO_SOCKET_IFNAME="enp4s0"
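The same settings could also be applied from Python before the process group is created. The sketch below is only an illustration: the `node1` / `node2` keys and both function names are hypothetical, and the interface names are the ones from this report. Deriving the NCCL and Gloo interface variables from one value keeps them from drifting apart or being left unset.

```python
import os

# Interface names are taken from the report; the node labels are hypothetical.
NODE_INTERFACES = {"node1": "enp36s0f1", "node2": "enp4s0"}


def distributed_env(node: str) -> dict:
    """Return the NCCL and Gloo environment settings for one node."""
    ifname = NODE_INTERFACES[node]
    return {
        "NCCL_DEBUG": "INFO",
        "NCCL_SOCKET_IFNAME": ifname,
        "GLOO_SOCKET_IFNAME": ifname,
    }


def apply_env(node: str) -> None:
    """Export the settings into the current process environment."""
    os.environ.update(distributed_env(node))
```

`apply_env` would have to run on each node before any distributed initialization, since both backends read these variables when they set up their connections.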

    At first, when I set only the NCCL environment variables (and not the GLOO one), I got this error:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
    File "/root/xgy/experiments/detectron2/detectron2/engine/launch.py", line 125, in _distributed_worker
    main_func(*args)
    File "/root/xgy/experiments/distributed-pytorch/MaskRCNN/train/train_net.py", line 141, in main
    trainer = Trainer(cfg)
    File "/root/xgy/experiments/detectron2/detectron2/engine/defaults.py", line 383, in init
    data_loader = self.build_train_loader(cfg)
    File "/root/xgy/experiments/detectron2/detectron2/engine/defaults.py", line 543, in build_train_loader
    return build_detection_train_loader(cfg)
    File "/root/xgy/experiments/detectron2/detectron2/config/config.py", line 192, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
    File "/root/xgy/experiments/detectron2/detectron2/config/config.py", line 229, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
    File "/root/xgy/experiments/detectron2/detectron2/data/build.py", line 328, in _train_loader_from_config
    sampler = TrainingSampler(len(dataset))
    File "/root/xgy/experiments/detectron2/detectron2/data/samplers/distributed_sampler.py", line 37, in init
    seed = comm.shared_random_seed()
    File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 230, in shared_random_seed
    all_ints = all_gather(ints)
    File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 154, in all_gather
    group = _get_global_gloo_group()
    File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 89, in _get_global_gloo_group
    return dist.new_group(backend="gloo")
    File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2019, in new_group
    pg = _new_process_group_helper(group_world_size,
    File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 504, in _new_process_group_helper
    pg = ProcessGroupGloo(
    RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:769] connect [127.0.0.1]:7602: Connection refused
    

    After I set GLOO_SOCKET_IFNAME="enp36s0f1" on node 1 and GLOO_SOCKET_IFNAME="enp4s0" on node 2, training worked, but it is too slow. This is my NCCL debug output:

    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379562:1379562 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Using network Socket
    NCCL version 2.10.3+cuda11.3
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379564:1379564 [2] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379565:1379565 [3] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379563:1379563 [1] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00/02 :    0   1   2   3   4   5   6   7
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01/02 :    0   1   2   3   4   5   6   7
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Trees [0] 1/4/-1->0->-1 [1] 1/-1/-1->0->4
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 00 : 2[41000] -> 3[61000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 00 : 1[2c000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 01 : 2[41000] -> 3[61000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 01 : 1[2c000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 7[68000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 00 : 3[61000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 7[68000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 0[1000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 0[1000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 01 : 3[61000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 00 : 3[61000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 01 : 3[61000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 00 : 2[41000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 00 : 1[2c000] -> 0[1000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 01 : 2[41000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 01 : 1[2c000] -> 0[1000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 4[19000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 4[19000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 0[1000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 0[1000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO comm 0x7f7590002fb0 rank 0 nranks 8 cudaDev 0 busId 1000 - Init COMPLETE
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO comm 0x7f3928002fb0 rank 1 nranks 8 cudaDev 1 busId 2c000 - Init COMPLETE
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO comm 0x7f9f00002fb0 rank 3 nranks 8 cudaDev 3 busId 61000 - Init COMPLETE
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO comm 0x7fe414002fb0 rank 2 nranks 8 cudaDev 2 busId 41000 - Init COMPLETE
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Launch mode Parallel
    

    For the record, according to the guide at https://pytorch.org/docs/stable/distributed.html: "If you encounter any problem with NCCL, use Gloo as the fallback option. (Note that Gloo currently runs slower than NCCL for GPUs.)" The distributed_sampler in detectron2 uses the gloo backend.

    When I check the NCCL version in my Conda virtual environment with python -c "import torch;print(torch.cuda.nccl.version())", I get (2, 10, 3) on both machines. I did not install NCCL separately (I only installed PyTorch). What should I do?

    opened by daebakk 1
  • A simple trick for a fully deterministic ROIAlign, and thus MaskRCNN training and inference


    Non-determinism of MaskRCNN

    There have been many discussions and inquiries in this repo about a fully deterministic MaskRCNN, e.g. #4260, #3203, #2615, #2480, and also in other detection repositories (e.g. MMDetection here and here, and torchvision here). Unfortunately, even after seeding everything and setting PyTorch's deterministic flags, results are still not repeatable.

    It boils down to the fact that some of the PyTorch / torchvision ops used do not have a deterministic GPU implementation (most notably because they use atomicAdd in the backward pass). So the only option so far has been to train for as long as possible to reduce the variance in the results. It is worth noting that not only training but also evaluation (see #2480) of MaskRCNN (and in fact most detectron2 models) is non-deterministic.

    Based on the minimal example in #4260, I analyzed the ops used by MaskRCNN and found that the main source of non-determinism is the backward pass of ROIAlign (see here).

    Proposed solution

    I am proposing a simple trick that makes ROIAlign practically fully reproducible without touching the CUDA kernel. It adds only trivial memory and computation overhead. It can be summarized as:

    • Truncate the input to a smaller data type, which gives a starting point with very few significand bits in use
    • Then cast to a larger data type just before doing the computations that involve atomicAdd

    In code, this amounts to changing this function call to:

    return roi_align(
        input.half().double(),
        rois.half().double(),
        self.output_size,
        self.spatial_scale,
        self.sampling_ratio,
        self.aligned,
    ).to(dtype=input.dtype)
    

    Test

    The conversion to double adds only a trivial amount of memory and computation, but performing it after the truncation significantly increases reproducibility.
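The numerical idea behind the trick can be illustrated on the CPU with the standard library alone. This is a sketch of the arithmetic, not of the ROIAlign kernel: `to_half` and `accumulate` are hypothetical helpers, Python floats (doubles) stand in for the wide accumulator, and `struct`'s `"e"` format rounds (rather than strictly truncates) to IEEE 754 half precision. Because every half-precision value in [-1, 1] is an integer multiple of 2^-24, the sum of a few thousand of them fits exactly in a double's 53-bit significand, so the order of the additions cannot change the result.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(values) -> float:
    """Add values one at a time, like a chain of atomicAdd calls in some order."""
    total = 0.0
    for v in values:
        total += v
    return total


rng = random.Random(0)
raw = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
truncated = [to_half(x) for x in raw]

# Simulate a different arrival order of the atomic additions.
shuffled_raw = raw[:]
rng.shuffle(shuffled_raw)
shuffled_truncated = truncated[:]
rng.shuffle(shuffled_truncated)

# Full-precision inputs: rounding happens at each step, so the two orders
# may disagree in the last bits.
diff_raw = accumulate(raw) - accumulate(shuffled_raw)

# Half-precision inputs accumulated in double: every partial sum is exact,
# so both orders give the same result.
diff_truncated = accumulate(truncated) - accumulate(shuffled_truncated)

print("order difference, full precision:", diff_raw)
print("order difference, half then double:", diff_truncated)
```

The guarantee only holds while the partial sums stay exactly representable, which matches the note in this proposal that it is a practical mitigation rather than a general solution.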

    This solution was tested and found to be fully deterministic (loss values and evaluation results on COCO) for up to tens of thousands of steps (using the same code as in #4260) for:

    • MaskRCNN with a ResNet-50 backbone
    • MaskRCNN with a ResNeXt-101 backbone
    • A wide range of batch sizes
    • Mixed-precision training
    • Single- and multi-GPU training
    • A100s and V100s

    Note on A100

    Ampere uses the TF32 format for tensor-core computations by default, which means that the truncation above is done implicitly. So on Ampere-based devices it is enough to cast to double, i.e.

    return roi_align(
        input.double(),
        rois.double(),
        self.output_size,
        self.spatial_scale,
        self.sampling_ratio,
        self.aligned,
    ).to(dtype=input.dtype)
    

    Note: This is the default mode for PyTorch, but if TF32 is disabled for some reason (i.e. torch.backends.cudnn.allow_tf32 = False), then the truncation with .half() above is still necessary.

    Note

    • This solution was tested and found to work well for other non-deterministic PyTorch ops, including F.interpolate and F.grid_sample
    • This is not a general solution to the problem of random-order reproducible floating point summation, but a practical mitigation that works well for this setup / scenario
    • At least in theory, this should work even better if applied inside the kernel right before atomicAdd
    • The only alternative currently is training each experiment for very long, which isn't practical in many setups, and still isn't fully reproducible

    Would love to hear what people think about this! @ppwwyyxx @fmassa

    enhancement 
    opened by ASDen 0
  • How to save all logs in a file?


    Hi,

    During training, I see that only loss- and epoch-related information is stored in the log file. If an error occurs during training, it is not saved in the log file.

    Could you please tell me whether this feature already exists, or whether there is another way to achieve it?
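One possible approach, sketched here with Python's standard `logging` module only: wrap the training call and write any exception, with its traceback, to the same log file. This is an assumption about what would help, not a confirmed detectron2 feature; `run_with_error_logging`, `failing_train`, and the `train_errors` logger name are hypothetical.

```python
import logging
import os
import tempfile


def run_with_error_logging(train_fn, log_path):
    """Run train_fn and record any exception, with traceback, in log_path."""
    logger = logging.getLogger("train_errors")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("[%(asctime)s %(name)s]: %(message)s"))
    logger.addHandler(handler)
    try:
        return train_fn()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Exception during training")
        raise
    finally:
        logger.removeHandler(handler)
        handler.close()


def failing_train():
    """Stand-in for a training run that crashes."""
    raise RuntimeError("simulated training failure")


log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
try:
    run_with_error_logging(failing_train, log_path)
except RuntimeError:
    pass

with open(log_path) as f:
    log_text = f.read()
```

The exception is re-raised after it is logged, so the process still fails as before; only the record in the file is new.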

    enhancement 
    opened by dsbyprateekg 0
Releases(v0.6)
  • v0.6(Nov 15, 2021)

    Pre-built Linux binaries are available for the following environment:

    CUDA | torch 1.10 | torch 1.9 | torch 1.8
    11.3
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
    
    11.1
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
    
    10.2
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
    
    10.1
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
    
    cpu
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.6 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
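The install commands in these release notes all follow one URL pattern, which can be captured in a small helper. This is a sketch: `wheel_install_command` is a hypothetical function, and it only formats the string. It does not check that a wheel was actually published for a given combination, so only the CUDA / torch pairs listed for each release should be passed in.

```python
def wheel_install_command(version: str, cuda: str, torch: str) -> str:
    """Build the pip command for a pre-built detectron2 wheel.

    cuda is a tag such as "cu113", "cu102", or "cpu"; torch is e.g. "1.10".
    """
    index = (
        "https://dl.fbaipublicfiles.com/detectron2/wheels/"
        f"{cuda}/torch{torch}/index.html"
    )
    return f"python -m pip install detectron2=={version} -f {index}"


cmd = wheel_install_command("0.6", "cu113", "1.10")
print(cmd)
```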
    
  • v0.5(Jul 23, 2021)

    Pre-built Linux binaries are available for the following environment:

    CUDA | torch 1.9 | torch 1.8 | torch 1.7
    11.1
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
    
    11.0
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
    
    10.2
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
    
    10.1
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
    
    9.2
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html
    
    cpu
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.5 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
    
  • v0.4(Mar 13, 2021)

    New Features

    • All common models can be converted to TorchScript format by tracing or scripting (tutorial). Requires pytorch≥1.8.
    • Support fvcore parameter schedulers (originally from ClassyVision) that are composable, scale-invariant, and can be used on parameters other than learning rate.
    • Refactor PointRend as a mask head (instead of an ROIHead).
    • New export and C++ deployment examples.
    • Release d2go, which provides an end-to-end production pipeline.

    New Features in DensePose:

    Release DensePose CSE (a framework to extend DensePose to various categories using 3D models) and DensePose Evolution (a framework to bootstrap DensePose on unlabeled data). See here for more details.

    Deprecations:

    • Deprecate cfg argument from COCO/LVIS evaluator; Deprecate num_classes and ignore_label argument from SemSegEvaluator
    • Deprecate WarmupMultiStepLR, WarmupCosineLR in favor of fvcore schedulers
    • Deprecated features will be removed in future releases

    Pre-built Linux binaries are available for the following environment:

    CUDA | torch 1.8 | torch 1.7 | torch 1.6
    11.1
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
    
    11.0
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
    
    10.2
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
    
    10.1
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
    
    9.2
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html
    
    cpu
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.4 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
    
  • v0.3(Nov 6, 2020)

    Features & Improvements:

    • Support constructing RetinaNet, data loader, optimizer, COCOEvaluator without configs, in addition to Mask R-CNN.
    • Add DeepLab & PanopticDeepLab in projects/.
    • Support importing 3 projects (point_rend, deeplab, panoptic_deeplab) directly with import detectron2.projects.xxx.
    • Support mixed precision in training (using cfg.SOLVER.AMP.ENABLED) and inference.
    • Support ADE20k semantic segmentation dataset (named ade20k_sem_seg_train, ade20k_sem_seg_val).
    • Continuous build on Windows.

    Pre-built Linux binaries are provided for the following environment:

    CUDA | torch 1.7 | torch 1.6 | torch 1.5
    11.0
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
    
    10.2
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html
    
    10.1
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
    
    9.2
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html
    
    cpu
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html
    
  • v0.2.1(Aug 4, 2020)

    • Added pre-built binary for PyTorch 1.6
    CUDA | torch 1.6 | torch 1.5 | torch 1.4
    10.2
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html
    
    10.1
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html
    
    10.0
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html
    
    9.2
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html
    
    cpu
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html
    
  • v0.2(Jul 8, 2020)

    Features & Improvements:

    • Support constructing objects with either configs or explicit arguments. As an example, the entire Mask R-CNN can be built without using configs
    • Rename TransformGen to Augmentation and keep TransformGen as an alias. Design the interface of Augmentation so that it can access arbitrary custom data types. See augmentation tutorial for details.
    • Improve speed of COCOEvaluator by about 3x
    • Support LVIS v1 dataset
    • Support GIoU loss in RPN and R-CNN
    • Support auto-scaling of batch size and learning rate in DefaultTrainer. See cfg.SOLVER.REFERENCE_WORLD_SIZE
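The auto-scaling in the last item can be pictured as a linear scaling rule. The sketch below is an illustration under stated assumptions, not detectron2's implementation: `SolverSettings` and `auto_scale` are hypothetical names, the fields are not the real config keys, and treating a reference world size of 0 as "disabled" is an assumption of this sketch. Batch size and learning rate grow with the number of workers, and the iteration count shrinks so that the total number of images seen stays the same.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SolverSettings:
    # Hypothetical container; field names are illustrative only.
    reference_world_size: int
    ims_per_batch: int
    base_lr: float
    max_iter: int


def auto_scale(settings: SolverSettings, world_size: int) -> SolverSettings:
    """Scale batch size and learning rate linearly with the worker count."""
    ref = settings.reference_world_size
    if ref == 0 or ref == world_size:
        return settings  # nothing to scale
    scale = world_size / ref
    return replace(
        settings,
        reference_world_size=world_size,
        ims_per_batch=int(round(settings.ims_per_batch * scale)),
        base_lr=settings.base_lr * scale,
        max_iter=int(round(settings.max_iter / scale)),
    )


base = SolverSettings(reference_world_size=8, ims_per_batch=16,
                      base_lr=0.02, max_iter=90000)
scaled = auto_scale(base, world_size=16)
print(scaled)
```

Doubling the workers from 8 to 16 here doubles the batch size and learning rate and halves the iteration count.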

    Pre-built Linux binaries are provided for the following environment:

    CUDA | torch 1.5 | torch 1.4
    10.2
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html
    
    10.1
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html
    
    10.0
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html
    
    9.2
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html
    
    cpu
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html
    
  • v0.1.3(May 17, 2020)

    Bugfix version.

    We started to release pre-built wheels for multiple PyTorch versions:

    CUDA | torch 1.5 | torch 1.4
    10.2
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html
    
    10.1
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html
    
    10.0
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html
    
    9.2
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html
    
    cpu
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html
    
    install
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html
    

    Incompatible changes about internal interface:

    • _init_{box,mask,keypoint}_head of StandardROIHeads was changed from instance method to class method.
  • v0.1.2(May 4, 2020)

    The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.5.

    Improvements:

    Incompatible changes:

    • When loading a checkpoint with resume_or_load(), training states like optimizer, start_iter will only be loaded when resume is True and the last checkpoint is found. This matches users’ expectations better
    • .output_size in custom box head is renamed to .output_shape
    • anchor_generator no longer duplicates the anchors for each image
    • feature_strides and feature_channels attributes are removed from ROIHeads. Use the input argument input_shape instead.

    New in DensePose:

  • v0.1.1(Mar 5, 2020)

    Incompatible changes about head design:

    • Mask head and keypoint head now include logic for losses & inference. Custom heads should override the feature computation in the layers() method.
    • _forward_{box,mask,keypoint} methods of StandardROIHeads now accept dict of features.

    This release is made to be compatible with such changes in projects (Mesh R-CNN, PointRend, etc)

    Other additional features:

    The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.4.

  • v0.1(Feb 7, 2020)

    Some major additional features since open source:

    We start to provide pre-built binary wheels at https://dl.fbaipublicfiles.com/detectron2/wheels/index.html. The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.4.

Owner
Facebook Research